
[Proposal] more permissive definition of allowed symbol syntax #296

@terefang


Currently the definition is the one in https://htmlpreview.github.io/?https://github.com/jgm/djot/blob/master/doc/syntax.html#symbols, which says:

Surrounding a word with : signs creates a “symbol,” which by default is just rendered literally ...

Since the spec does not say which characters a symbol name may contain, the accepted syntax is left to each parser and renderer:

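Haskell implementation:

```haskell
-- A symbol is ':' followed by one or more of '+', '-' or an ASCII
-- letter/digit, followed by ':'.
pSymbol = do
  asciiChar ':'
  bs <- byteStringOf $ skipSome (skipSatisfyByte
                                    (\c -> c == '+' || c == '-' ||
                                         (isAscii c && isAlphaNum c)))
  asciiChar ':'
```

Lua implementation:

```lua
-- 58 = :
-- The Lua pattern class [%w_+-] accepts alphanumerics, '_', '+' and '-'.
[58] = function(self, pos, endpos)
  local sp, ep = bounded_find(self.subject, "^%:[%w_+-]+%:", pos, endpos)
  if sp then
    self:add_match(sp, ep, "symbol")
    return ep + 1
  else
    -- no closing ':' found: treat the colon as ordinary text
    self:add_match(pos, pos, "str")
    return pos + 1
  end
end,
```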

So, at least between the Haskell and Lua implementations, there is some disagreement: the Lua pattern also accepts "_", which the Haskell parser rejects.
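To make the disagreement concrete, the two character classes can be restated as standalone predicates. This is a minimal sketch, not code from either implementation. The names haskellSymbolChar, luaSymbolChar and disagreements are hypothetical. It also assumes that Lua's %w matches only ASCII letters and digits, as it does in the default C locale.

```haskell
import Data.Char (isAscii, isAlphaNum)

-- Characters accepted between the colons by the Haskell parser (pSymbol):
-- ASCII letters and digits, '+' and '-'.
haskellSymbolChar :: Char -> Bool
haskellSymbolChar c = c == '+' || c == '-' || (isAscii c && isAlphaNum c)

-- Characters matched by the Lua class [%w_+-], ASSUMING %w is ASCII-only
-- (C locale): letters, digits, '_', '+' and '-'.
luaSymbolChar :: Char -> Bool
luaSymbolChar c = c `elem` "_+-" || (isAscii c && isAlphaNum c)

-- Printable ASCII characters (space .. '~') on which the two disagree.
disagreements :: [Char]
disagreements =
  [ c | c <- [' ' .. '~'], haskellSymbolChar c /= luaSymbolChar c ]
```

Loaded into GHCi, disagreements evaluates to "_". A span like :foo_bar: would therefore be matched as a symbol by the Lua pattern but not by the Haskell parser.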

Proposal

Use-Cases

XML/HTML Renderer

HTML Entity

  • :apos: to be rendered as &apos;
  • :Euro: to be rendered as &Euro;
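For the named-entity case, an HTML renderer could pass the symbol name through as a character reference. This is a minimal sketch that assumes a hypothetical hook, renderEntity, which receives the text between the colons. It only checks the shape of the name (ASCII letters and digits). It does not check whether the entity is actually defined in HTML.

```haskell
import Data.Char (isAscii, isAlphaNum)

-- Hypothetical HTML-renderer hook: turn the text between the colons of a
-- symbol such as :apos: into a named character reference (&apos;).
-- Only the shape of the name is checked; anything that is not a non-empty
-- run of ASCII letters/digits yields Nothing so the caller can fall back.
renderEntity :: String -> Maybe String
renderEntity name
  | not (null name) && all isAsciiAlnum name = Just ("&" ++ name ++ ";")
  | otherwise = Nothing
  where
    isAsciiAlnum c = isAscii c && isAlphaNum c
```

Returning Nothing for other names, for example fa-bars, leaves room for the other use cases and for the fallback described below.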

HTML Entity Code Point

  • :#60: to be rendered as &#60; or &lt;
  • :#x2014: to be rendered as &#x2014; or &#8212; or &mdash; or "—"
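Numeric references need decoding if the renderer should emit the character itself rather than the reference. The following is a minimal sketch with a hypothetical decodeCodePoint that accepts the decimal (#60) and hexadecimal (#x2014) forms. Rejecting surrogates and values above U+10FFFF is a design choice of this sketch.

```haskell
import Data.Char (chr, isDigit, isHexDigit, toLower)
import Numeric (readHex)

-- Hypothetical decoder for symbol names of the form #60 (decimal) or
-- #x2014 / #X2014 (hexadecimal), mirroring HTML numeric references.
-- Returns the character, or Nothing if the text is malformed.
decodeCodePoint :: String -> Maybe Char
decodeCodePoint ('#' : x : hex)
  | toLower x == 'x', not (null hex), all isHexDigit hex =
      case readHex hex of
        [(n, "")] -> toChar n
        _         -> Nothing
decodeCodePoint ('#' : dec)
  | not (null dec), all isDigit dec = toChar (read dec)
decodeCodePoint _ = Nothing

-- Reject values beyond U+10FFFF (chr would throw) and UTF-16 surrogates.
toChar :: Integer -> Maybe Char
toChar n
  | n < 0 || n > 0x10FFFF      = Nothing
  | n >= 0xD800 && n <= 0xDFFF = Nothing
  | otherwise                  = Just (chr (fromInteger n))
```

With this sketch, decodeCodePoint "#x2014" yields U+2014 EM DASH and decodeCodePoint "#60" yields '<'. Emitting the reference verbatim (&#x2014;) needs no decoding at all.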

Icon Font Glyph Name

  • :fa-bars: to be rendered as <i class="fa fa-bars"></i>
  • :fa+fa-bars: to be rendered as <i class="fa fa-bars"></i>

The exact output may depend on the actual implementation and/or configuration of the HTML renderer backend.
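The proposal does not say how an icon name maps to CSS classes. One hypothetical rule reproduces both examples. A '+' separates explicit class names. Otherwise, the text before the first '-' is taken as the font's base class. The names iconClasses, splitOn and renderIcon are illustrative. The sketch assumes the name has already been validated, so it does no attribute escaping.

```haskell
-- Hypothetical mapping from a symbol name to icon-font CSS classes.
-- ASSUMED rules (not part of the proposal):
--   "fa+fa-bars": '+' separates explicit classes   -> ["fa", "fa-bars"]
--   "fa-bars":    text before the first '-' is the
--                 font's base class                -> ["fa", "fa-bars"]
iconClasses :: String -> [String]
iconClasses name
  | '+' `elem` name = filter (not . null) (splitOn '+' name)
  | otherwise =
      case break (== '-') name of
        (base, '-' : _) | not (null base) -> [base, name]
        _                                 -> [name]

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Render as an <i> element; assumes the name was validated beforehand.
renderIcon :: String -> String
renderIcon name = "<i class=\"" ++ unwords (iconClasses name) ++ "\"></i>"
```

With this rule, both :fa-bars: and :fa+fa-bars: render as <i class="fa fa-bars"></i>. The '+' form remains available for fonts whose base class is not a prefix of the glyph class.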

Icon or Symbol Font Glyph Name for PDF, Image, or Unicode Text Renderer

  • :a19: to be rendered as "✓" (from Zapf Dingbats font) -- (U+2713 CHECK MARK &check;, &checkmark;)
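For PDF, image, or plain-Unicode output, glyph names of a symbol font can be resolved through a lookup table. This is a minimal sketch. The table is hypothetical and holds only the a19 entry from the proposal; a real renderer would generate it from the font's glyph list.

```haskell
-- Hypothetical glyph table for a plain-Unicode renderer, mapping glyph
-- names of a symbol font to Unicode characters. Only the entry from the
-- proposal is listed here.
zapfDingbats :: [(String, Char)]
zapfDingbats =
  [ ("a19", '\x2713') -- U+2713 CHECK MARK
  ]

-- Look up a glyph name; Nothing means the renderer should fall back.
glyphToUnicode :: String -> Maybe Char
glyphToUnicode name = lookup name zapfDingbats
```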

Possible Variation

It might be desirable to clearly separate verbatim HTML entities from glyph names by using a prefix as an indicator.

  • :*a19: to be rendered as "✓" (from Zapf Dingbats font) -- (U+2713 CHECK MARK &check;, &checkmark;)

Possible candidates would be: ^, !, &, $, %, /, =, ?, +, ~, *, #

A possible syntax, expressed as a Perl-style regular expression, could be:

/^[\^\!\&\$\%\/\=\?\+\~\*\#]?[[:alnum:]\.\_\-\+]+$/
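Haskell's base library has no regular-expression engine, so the pattern above can be hand-written as a small validator. The validator also separates the optional prefix, which a renderer could then use to dispatch between entities and glyph names. This sketch rests on two assumptions: it is applied to the text between the colons, and [[:alnum:]] is taken as ASCII-only. The names parseSymbol, prefixChars and bodyChar are hypothetical.

```haskell
import Data.Char (isAscii, isAlphaNum)

-- The proposed prefix characters: ^ ! & $ % / = ? + ~ * #
prefixChars :: String
prefixChars = "^!&$%/=?+~*#"

-- Body characters [[:alnum:]._+-], ASSUMED here to be ASCII-only.
bodyChar :: Char -> Bool
bodyChar c = (isAscii c && isAlphaNum c) || c `elem` "._-+"

-- Hand-written equivalent of
--   /^[\^\!\&\$\%\/\=\?\+\~\*\#]?[[:alnum:]\.\_\-\+]+$/
-- applied to the text between the colons. Returns the optional prefix
-- and the body, or Nothing if the text does not match. Like the greedy
-- '?' in the regex, a leading prefix character is taken as the prefix
-- whenever the rest is still a valid body.
parseSymbol :: String -> Maybe (Maybe Char, String)
parseSymbol s = case s of
  (p : rest) | p `elem` prefixChars, validBody rest -> Just (Just p, rest)
  _ | validBody s -> Just (Nothing, s)
  _ -> Nothing
  where
    validBody b = not (null b) && all bodyChar b
```

For example, parseSymbol "*a19" gives Just (Just '*', "a19") and parseSymbol "fa+fa-bars" gives Just (Nothing, "fa+fa-bars"). Because + is both a prefix and a body character, "+" alone is still accepted, as a body without a prefix.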

Graceful Fallback Mechanism

Where a renderer backend cannot recognize which symbol or glyph to render, it might fall back in the following ways:

Pure Text Style

  • :apos: to be rendered as :apos:
  • :Euro: to be rendered as :Euro:
  • :#60: to be rendered as :#60:
  • :#x2014: to be rendered as :#x2014:
  • :fa-bars: to be rendered as :fa-bars:
  • :fa+fa-bars: to be rendered as :fa+fa-bars:
  • :a19: to be rendered as :a19:
  • :*a19: to be rendered as :*a19:
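The pure-text fallback amounts to trying a resolver and, if it does not recognize the name, emitting the symbol unchanged. This is a minimal sketch in which the resolver is passed in as a function. The resolver exampleResolver, which knows only a19, is hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical plain-text renderer: ask the resolver for a rendering and,
-- if it returns Nothing, emit the symbol literally including its colons.
renderSymbolText :: (String -> Maybe String) -> String -> String
renderSymbolText resolve name =
  fromMaybe (":" ++ name ++ ":") (resolve name)

-- Example resolver (hypothetical) that knows a single glyph name.
exampleResolver :: String -> Maybe String
exampleResolver "a19" = Just "\x2713"
exampleResolver _     = Nothing
```

With exampleResolver, :a19: becomes the check mark while :Euro: and :*a19: come out unchanged, as in the list above.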

HTML Style

  • :fa-bars: to be rendered as <code>:fa-bars:</code>
  • :fa+fa-bars: to be rendered as <code>:fa+fa-bars:</code>
  • :a19: to be rendered as <code>:a19:</code>
  • :*a19: to be rendered as <code>:*a19:</code>
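The HTML-style fallback is the same idea with the literal wrapped in <code>. The prefix candidates include &, so the sketch escapes the name before inserting it. The names renderSymbolHtml and escapeHtml are hypothetical, and the resolver's output is assumed to already be valid HTML.

```haskell
import Data.Maybe (fromMaybe)

-- Escape characters that are special in HTML text content.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Hypothetical HTML renderer: use the resolver's output (assumed to be
-- valid HTML) or fall back to the escaped literal inside <code>.
renderSymbolHtml :: (String -> Maybe String) -> String -> String
renderSymbolHtml resolve name =
  fromMaybe ("<code>:" ++ escapeHtml name ++ ":</code>") (resolve name)
```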

Entities that should always be recognized
