Description
I have an HTML file (pasted at https://pastebin.com/PxCE56km) that I want to convert to Markdown, stripping all HTML. It is an external file, and I have no control over how it was created. The HTML was clearly generated by MS Word, but I don't know why each word is in a separate span; this is a very strange format.
When I run
pandoc -f html-raw_html-native_divs-native_spans -t markdown --wrap none pastebinfile.html
Pandoc crashes with the message:
pandoc: renderList encountered Empty
CallStack (from HasCallStack):
  error, called at src/Text/DocLayout.hs:453:20 in doclayout-0.4-f7fb32dda74e7b589442abd36f03761b96d0a38d97150025d0aa5f3d7a4731b4:Text.DocLayout
My version of Pandoc is
/usr/bin/pandoc -vv
pandoc 3.1
Compiled with pandoc-types 1.23, texmath 0.12.6, skylighting 0.13.2.1, citeproc 0.8.1, ipynb 0.2
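
In case the pastebin link expires, here is a rough Python sketch that builds an input of a similar shape (every word wrapped in its own span) and runs pandoc with the same flags as above. It is not confirmed that this synthetic file triggers the same crash; the real file contains Word-specific markup that is not reproduced here. The helper names (`build_span_per_word_html`, `pandoc_command`, `run_pandoc`) and the sample sentence are made up for illustration.

```python
import html
import shutil
import subprocess
import tempfile
from pathlib import Path

# Flags copied from the command in this report.
PANDOC_ARGS = [
    "-f", "html-raw_html-native_divs-native_spans",
    "-t", "markdown",
    "--wrap", "none",
]


def build_span_per_word_html(text: str) -> str:
    """Wrap every whitespace-separated word in its own <span> (hypothetical helper)."""
    spans = " ".join(f"<span>{html.escape(word)}</span>" for word in text.split())
    return f"<html><body><p>{spans}</p></body></html>"


def pandoc_command(path: str) -> list[str]:
    """Return the pandoc invocation from the report, applied to the given file."""
    return ["pandoc", *PANDOC_ARGS, path]


def run_pandoc(path: str) -> subprocess.CompletedProcess:
    """Run pandoc on the file and capture stdout/stderr as text."""
    return subprocess.run(pandoc_command(path), capture_output=True, text=True)


if __name__ == "__main__":
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            sample = Path(tmp) / "sample.html"
            sample.write_text(
                build_span_per_word_html("each word in a separate span"),
                encoding="utf-8",
            )
            result = run_pandoc(str(sample))
            print("exit code:", result.returncode)
            # On a crash the error message is expected on stderr.
            print(result.stderr or result.stdout)
```

If the synthetic input converts cleanly, the crash probably depends on something more specific in the Word-generated file, and the original pastebin file is still needed to reproduce it.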