Please recognize U+200B (ZERO WIDTH SPACE) as a space character for org-reader syntax. #6070

@kawabata

(Checked with Pandoc 2.9.1, Windows version, to reproduce this problem.)

In org-mode, characters matched by `org-emphasis-regexp-components' are treated as space characters for emphasis markup (cf. the docstring of this variable).

For example, in the markup text " *emph* ", "emph" is emphasized because it is surrounded by "*" and the space character " ". Among space characters, U+200B (ZERO WIDTH SPACE) is important because it lets org-mode emphasize part of a word (e.g. " *emph*asize"). However, the pandoc org reader does not recognize it as a space character, so in "\u200b*emph*\u200b", "emph" is not emphasized.
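To make the rule above concrete, here is a minimal Python sketch of a simplified boundary check. It is only an illustration: the function name `strong_spans` and its `extra_space` parameter are hypothetical, and the regular expression is a rough stand-in, not the actual pattern used by org-mode or by pandoc.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def strong_spans(text, extra_space=""):
    """Return the contents of *...* spans that sit between boundary characters.

    Simplified model (an assumption for illustration): a span counts only if
    it is preceded by the start of the text or a boundary character, and
    followed by the end of the text or a boundary character. Boundary
    characters are whitespace plus anything passed in `extra_space`.
    """
    boundary = r"\s" + re.escape(extra_space)
    pattern = re.compile(
        rf"(?:^|(?<=[{boundary}]))\*(\S(?:.*?\S)?)\*(?=$|[{boundary}])"
    )
    return [m.group(1) for m in pattern.finditer(text)]


sample = f"{ZWSP}*emph*{ZWSP}asize"

# U+200B is not whitespace for Python's \s, which mirrors the reported behavior.
print(strong_spans(" *emph* "))                 # plain spaces around the span
print(strong_spans(sample))                     # U+200B not treated as a boundary
print(strong_spans(sample, extra_space=ZWSP))   # U+200B treated as a boundary
```

In this model the second call finds nothing, while the third call finds "emph", which is the behavior requested in this report.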

I hope this can be fixed, since it would greatly extend the versatility of pandoc with org-mode.

Regards,

Reproduction.

For text "​*abc*" (first character is U+200B),

% pandoc -f org -t json -o test.json --standalone test.org

will produce:

{"blocks":[{"t":"Para",
            "c":[{"t":"Str",
                  "c":"​\*abc*"}]}],
 "pandoc-api-version":[1,20],"meta":{}}

The desired output is something like

{"blocks": [{"t":"Para",
             "c": [{"t":"Str",
                    "c":"​"}, // U+200B
                   {"t":"Strong",
                    "c":[{"t":"Str",
                          "c":"abc"}]}]}],
 "pandoc-api-version":[1,20],"meta":{}}

instead.
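Because U+200B is invisible and easy to lose when copying text, the reproduction input can also be generated programmatically. The following is a small sketch, assuming Python 3 is available; the helper name `write_sample` is hypothetical, and the file name `test.org` matches the command shown earlier.

```python
from pathlib import Path

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def write_sample(path="test.org"):
    """Write the one-line org file from this report: U+200B followed by *abc*."""
    content = f"{ZWSP}*abc*\n"
    # UTF-8 is assumed here so that U+200B is stored as the bytes E2 80 8B.
    Path(path).write_text(content, encoding="utf-8")
    return content


if __name__ == "__main__":
    write_sample()
```

After running the script, the pandoc command from the reproduction steps can be applied to the generated `test.org`.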
