Implement pandoc output (and thus indirectly Markdown, HTML, ePub, docx output) #461
Naive question, but wouldn't this be better suited as a reader in pandoc? It already has a typst writer.
@Enivex A fair question from a user perspective, but having looked at how this is accomplished I can say no, this cannot be done from the Pandoc end. Pandoc could implement a reader that would convert the input syntax from one form to another, but it would not evaluate it. This method is basically using the typst internals to iterate through the document and evaluate/expand/run everything, then at the last second, rather than outputting shapes to the PDF, it dumps strings to a JSON object. The exported result will not be equivalent to the input; it will be closer to an equivalent of the output. Rather clever actually, and something that has to be done from inside Typst (if at all), not from an outside reader.
As I implemented it, it's not exactly "at the last second", but after the evaluation and before the layouting phase (after step 2 in ARCHITECTURE.md). The layouting phase is what defines the exact page breaks and locations within the pages, which doesn't really make sense for digitally viewed documents. I've found two things from the layouting phase that would make sense to have in this output: the ListBuilder and the SmartQuotes handling. Not sure why those are part of layouting. The ListBuilder is especially weird because you can both explicitly declare a
Other than that I'd say @alerque is pretty correct: if pandoc wanted to have an integrated reader for typst, it would either have to reimplement all of typst (including the language interpreter) in Haskell, or it would work on a syntax level and ignore any custom functions (as the LaTeX reader in pandoc does).
Thanks for the work! From a product perspective, however, I think HTML (and other formats) eventually generated this way will not have the same look as the PDF, and this will put Typst in an inferior position compared with KaTeX and MathJax, because the math equations/expressions rendered by those two look almost exactly the same as a PDF from LaTeX (of course, KaTeX and MathJax are not strictly the same thing as LaTeX, but they suffice for the most part). The public impression of this inferiority could stick, negatively impacting the goal of being an alternative to LaTeX. I think if Typst wants to do HTML or any other formats, it would be well advised to do them "the right way", instead of having a not-there-yet-but-kinda-works interim solution.
Thanks for this work! Not to mention the ability to support all other Pandoc output formats.
Would MathML be an option here?
I'm not sure what exactly you mean by "inferior output", but Pandoc does actually support rendering math as Unicode, KaTeX, MathJax, or MathML when outputting HTML. What's missing in this PR is parsing the math expressions at all, but that's not really related to outputting pandoc AST instead of HTML directly - you'll still need to figure out what to render them to regardless. Even if you want to render them as PNG or SVG you could still do that with the pandoc intermediary format. The advantage of outputting that intermediate AST is just that it allows many output formats for the same effort that supporting HTML directly would take. I actually wrote my master's thesis in markdown with pandoc (plus my blog, many other papers, presentations, ...). Here's a screenshot from the HTML version (which uses MathJax for the math) and here's a screenshot from the PDF output via LaTeX
Hi @phiresky, thanks for the response.
By "inferior" I meant this in my original post
My concern boils down to: given any modestly complex math equation, is this Typst+Pandoc route able to generate an output that can, with appropriate configs, eventually be rendered into HTML of the same quality as the PDF? By "same quality as the PDF", I have KaTeX and MathJax in mind as examples (ignoring edge cases). If the answer to the above question is yes, i.e. Typst+Pandoc+(appropriate configs) is on par with KaTeX/MathJax, then my concern is resolved, and please forgive me for my unawareness of Pandoc's prowess :) Example: I wish a pretty equation like [1] in PDF could look just like [1] in HTML (with appropriate configs), instead of something like [2] at best.
I've posted some thoughts in jgm/pandoc#8740.
The plan of action is to rework the styling implementation a bit, which will make it easier to create a good native HTML export. That HTML can then be used as input to pandoc for generating things like docx. In contrast to the approach used in this PR, this will also correctly export lists and things affected by show rules. Still, thanks for your work on this!
This (kinda dirty) PR implements outputting a pandoc JSON AST from the CLI instead of PDF.
That means it indirectly allows outputting all kinds of formats such as Markdown, HTML, ePub, docx.
The pandoc AST is not infinitely powerful so this conversion has a fair bit of information loss, but in exchange it gives access to many different output formats without specific implementations.
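To give a feel for the target format, here is a minimal Python sketch that builds a tiny pandoc JSON AST by hand. It is not the code from this PR. The helper names (`text_to_inlines`, `header`, `para`, `document`) are made up for illustration, and the `pandoc-api-version` value is an assumption that has to be compatible with the installed pandoc.

```python
import json


def text_to_inlines(text):
    """Split plain text into pandoc Str nodes separated by Space nodes."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i > 0:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def header(level, text):
    # Header content: [level, [identifier, classes, key-value pairs], inlines]
    return {"t": "Header", "c": [level, ["", [], []], text_to_inlines(text)]}


def para(text):
    # Para content is just a list of inline nodes
    return {"t": "Para", "c": text_to_inlines(text)}


def document(blocks, api_version=(1, 23, 1)):
    # api_version is an assumption; pandoc rejects incompatible versions
    return {
        "pandoc-api-version": list(api_version),
        "meta": {},
        "blocks": blocks,
    }


doc = document([header(1, "Hello"), para("Evaluated content, no layout.")])
ast_json = json.dumps(doc)
print(ast_json)
```

Only block and inline structure survives in this representation. Anything that depends on page geometry has no place in it, which is where the information loss mentioned above comes from.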
For example this typst document:
run via these commands:
results in this:
markdown
The exact output format can be controlled within pandoc. For example, to get standard CommonMark output (without fenced divs etc.), use
pandoc -t commonmark
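The same JSON AST can be fed to pandoc for each target format. The Python sketch below only assembles and runs the pandoc command line. The file names `doc.json`, `out.md` and `out.docx` are hypothetical, and the helpers `pandoc_command` and `convert` are not part of this PR. It uses pandoc's `--from json`, `--to` and `--output` options; when `--to` is omitted, pandoc picks the writer from the output file extension.

```python
import subprocess


def pandoc_command(json_path, output_path, to_format=None):
    """Build a pandoc invocation that reads a JSON AST file."""
    cmd = ["pandoc", "--from", "json", json_path, "--output", output_path]
    if to_format is not None:
        # Explicit writer, e.g. "commonmark" instead of pandoc's default markdown
        cmd += ["--to", to_format]
    return cmd


def convert(json_path, output_path, to_format=None):
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(pandoc_command(json_path, output_path, to_format), check=True)


# Example commands (hypothetical file names); printed here rather than executed
print(pandoc_command("doc.json", "out.md", "commonmark"))
print(pandoc_command("doc.json", "out.docx"))
```

Calling `convert("doc.json", "out.docx")` would then write a docx file, assuming pandoc is installed and on the `PATH`.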
HTML
docx
screenshot:

Issues
fixable:
probably unfixable: