poc: typst output properties #9623

Closed
wants to merge 1 commit into from

gordonwoodhull
Contributor

@gordonwoodhull gordonwoodhull commented Apr 2, 2024

We at Quarto would like to contribute a mechanism to Pandoc for Typst property output, allowing translation of CSS attributes/properties to Typst properties using Lua filters.

This is a proof of concept, partial implementation.

The Typst writer searches for attributes with names of the form typst:target:attr, where target is element if the attribute should go to the current element, or text if the content should be wrapped in a text element with this attribute.

It is assumed that the value is raw Typst code suitable for insertion as a property value, e.g. strings should be quoted and markup should be bracketed.
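
To make the naming scheme concrete, here is a minimal sketch of how a key of the form typst:target:attr could be split and filtered. It is not the code in this PR: the names Target, parseTypstKey, and pickTypstAttrs are made up for illustration, and it uses plain String rather than the Text type the writer works with.

```haskell
import Data.List (stripPrefix)

-- | Where a Typst property should be applied (illustrative type, not from the PR).
data Target = OnElement | OnText
  deriving (Eq, Show)

-- | Split an attribute key of the form typst:target:attr.
-- Returns Nothing for keys outside the typst: namespace or with an
-- unknown target, so ordinary attributes are left alone.
parseTypstKey :: String -> Maybe (Target, String)
parseTypstKey key = do
  rest <- stripPrefix "typst:" key
  case break (== ':') rest of
    ("element", ':' : attr) | not (null attr) -> Just (OnElement, attr)
    ("text",    ':' : attr) | not (null attr) -> Just (OnText, attr)
    _                                         -> Nothing

-- | Keep only the properties for one target, as (property, raw Typst value).
pickTypstAttrs :: Target -> [(String, String)] -> [(String, String)]
pickTypstAttrs target kvs =
  [ (attr, val)
  | (key, val) <- kvs
  , Just (t, attr) <- [parseTypstKey key]
  , t == target
  ]
```

With attributes such as typst:element:fill and typst:text:fill on the same cell, pickTypstAttrs OnText would return only the second one, with the key shortened to fill.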

The following cases are implemented:

  • cell element
  • table text
  • block element
  • span text

To be complete, each element which receives attributes would need to process both kinds of attribute, and generate the text element if needed. So a complete implementation might include a couple dozen more such cases.

An example Lua filter offering partial translation of <table> font-family and font-size, and of <td>/<th> color, background-color, and padding-*, is available here:

https://gist.github.com/gordonwoodhull/f6ea7d4b8a462da83ad90504a65bf3fe

There are various design possibilities, but this seems the simplest and most general, as it would allow round-tripping of Typst properties, and translation from Typst properties to CSS (if anyone wants those features in the future).

All suggestions are welcome!

cc @cscheid @tarleb

@cscheid
Contributor

cscheid commented Apr 2, 2024

Just to add a bit more context:

In Quarto, we use the HTML reader to parse table elements and convert them to native Pandoc nodes. Now that Pandoc has support for "fancy" table attributes like rowspans and colspans, we've found that HTML is an attractive format for specifying tables in general, not only for HTML input/output. This is particularly true for tables emitted by code. As a result, we can leverage Pandoc's reader to allow libraries to emit rich HTML table input, and produce output in Docx, PDF, HTML, etc.

The feature in the PR would provide a path for table styling to be retargeted to Typst from HTML input in Quarto, but we think it would work equally well in pure Pandoc.

We've given a fair amount of thought to how the styles should be specified, and our decision to use attributes and special names was based on the way that the HTML writer treats HTML5 attributes differently from regular attributes:

    | html5
        = if x `Set.member` (html5Attributes <> rdfaAttributes)
             || T.any (== ':') x -- e.g. epub: namespace
             || "data-" `T.isPrefixOf` x
             || "aria-" `T.isPrefixOf` x
             then (customAttribute (textTag x) (toValue y) :)
             else (customAttribute (textTag ("data-" <> x)) (toValue y) :)

We felt that a simple prefix like typst: was a clean way to indicate that these attributes have special behavior in the Typst writer, without adding any fundamental concepts to the rest of the Pandoc codebase. In addition, it allows (a future version of) the Typst reader to parse Typst source with these attributes and produce an AST that preserves styling on a Typst reader -> Typst writer round trip.

@gordonwoodhull
Contributor Author

gordonwoodhull commented Apr 3, 2024

I think the implementation can be cleaned up with helper functions to conditionally generate #text(...)[...] wrappers, etc.
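
As a rough idea of what such a helper might look like, here is a sketch that wraps already-rendered content in #text(...)[...] only when there are text properties to apply. The names formatProps and wrapInText are hypothetical, and the sketch works on plain String; the actual writer builds doclayout Doc values, so a real helper would differ in its types.

```haskell
import Data.List (intercalate)

-- | Render (property, raw Typst value) pairs as "key: value" arguments.
formatProps :: [(String, String)] -> String
formatProps kvs = intercalate ", " [ k ++ ": " ++ v | (k, v) <- kvs ]

-- | Wrap rendered Typst content in #text(..)[..] when there are text
-- properties; return the content unchanged when there are none, so
-- output stays identical for documents without typst: attributes.
wrapInText :: [(String, String)] -> String -> String
wrapInText []  contents = contents
wrapInText kvs contents =
  "#text(" ++ formatProps kvs ++ ")[" ++ contents ++ "]"
```

A single function like this would replace the separate start and end strings at each call site.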

I counted 15 elements with attributes in the Pandoc documentation.

Glad to iterate on this.

@jgm
Owner

jgm commented Apr 3, 2024

I'm not really understanding the motivation for this: can you explain more fully? Is the idea to allow non-structural features of tables (e.g. cell coloring) to be transmitted from HTML to typst? Can you give an example or two?

@gordonwoodhull
Contributor Author

gordonwoodhull commented Apr 3, 2024

Yes, that is exactly the purpose.

I specifically targeted this "test pattern" example of gt in the Lua code and cases:

https://gt.rstudio.com/reference/data_color.html?q=color#foreground-text-and-background-fill

Here is the Typst output, pretty close except it chose a different font from the list:
[image: screenshot of the Typst output]

@jgm
Owner

jgm commented Apr 3, 2024

So let me see if I understand correctly. The pandoc reader will parse an HTML table and include attributes, including style attributes, since the AST allows attributes on table cells. The Lua filter looks at these style attributes and adds corresponding typst: attributes. The typst writer adds these to table cells, if present.

@cscheid
Contributor

cscheid commented Apr 3, 2024

Your understanding looks accurate to me.

Just to add a bit: it's not only Lua filters that would benefit. This would work well with the following scenarios in addition to Lua filters:

  • JSON filters
  • pandoc -f native
  • echo "[span]{typst:key=value}" | pandoc -f markdown

Anyone targeting Pandoc's AST can emit attributes in a way that the Typst writer knows how to emit accurately, in the same way that the HTML writer knows how to differentiate HTML5 attributes from other attributes when it emits data-$ATTR="value" vs $ATTR="value".
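
For illustration, the translation step that a filter performs could be sketched as follows. This is not the Lua filter from the gist: the CSS-to-Typst mapping in typstKeyFor is a hypothetical subset, the keys use the three-part form from the PR description, and values are passed through unchanged, which is only correct where the CSS and Typst spellings happen to coincide (e.g. red or 12pt).

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)

-- | Trim leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse "prop: value; prop: value" into pairs, skipping malformed pieces.
parseStyle :: String -> [(String, String)]
parseStyle = mapMaybe declaration . splitOn ';'
  where
    declaration decl = case break (== ':') decl of
      (prop, ':' : val) | not (null (trim prop)) -> Just (trim prop, trim val)
      _                                          -> Nothing

-- | Hypothetical mapping from a CSS property to a typst: attribute key.
typstKeyFor :: String -> Maybe String
typstKeyFor "color"            = Just "typst:text:fill"
typstKeyFor "background-color" = Just "typst:element:fill"
typstKeyFor "font-size"        = Just "typst:text:size"
typstKeyFor _                  = Nothing

-- | Translate a style attribute value into typst: attributes,
-- dropping CSS properties that have no mapping.
styleToTypstAttrs :: String -> [(String, String)]
styleToTypstAttrs style =
  [ (key, val)
  | (prop, val) <- parseStyle style
  , Just key <- [typstKeyFor prop]
  ]
```

The resulting pairs would be appended to the element's existing attributes, leaving the original style attribute in place for other writers.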

@jgm
Owner

jgm commented Apr 4, 2024

I see the utility of this, and it doesn't seem too harmful. It wouldn't really affect anyone who didn't explicitly add typst:key attributes.

@gordonwoodhull
Contributor Author

gordonwoodhull commented Apr 4, 2024

Great! I will work up a PR for the whole feature, with tests.

The only side effect currently is that if you add these attributes and then render to html, they are emitted. I checked and these double-colon attributes are valid XML/HTML, and of course the browser ignores them. But you wouldn't add them unless you meant to output to Typst.

Typst output should be identical if no typst:key attrs are set.

@jgm
Owner

jgm commented Apr 4, 2024

I don't think it's bad that they are emitted in HTML/XML, as long as pandoc doesn't itself add them. (They will be irrelevant to anyone who doesn't go to the trouble of adding them explicitly with a filter.)

I'm envisioning for now that this is just affecting the typst writer, not the reader. I'd have more qualms about it on the reader (partly because these weird attributes would be transmitted to HTML).

Comment on lines +240 to +243
    let (textstart, textend) =
          (case formatAttrs $ pickTypstTextAttrs tabkvs of
             [] -> ("", "")
             tkvs -> ("#text" <> parens (literal (T.intercalate ", " tkvs)) <> "[", "]"))
Owner

More idiomatic Haskell would define a function, rather than textstart and textend.

Owner

also I don't really understand this part: why is a new #text being added? Shouldn't any typst attributes attached to the table go in table()?

      contents <- blocksToTypst blocks
    - return $ "#block[" $$ contents $$ ("]" <+> lab)
    + return $ "#block" <> props <> "[" $$ contents $$ ("]" <+> lab)
Owner

here it's more idiomatic to use the doclayout function that puts things in brackets.

@jgm
Owner

jgm commented Apr 4, 2024

I'm not sure I understand the motivation for the typst:element:key vs typst:text:key distinction, which seems a bit of a wart; it would be better if we could have something simpler.

@gordonwoodhull
Contributor Author

gordonwoodhull commented Apr 4, 2024

Thanks for the feedback!

Yes, I would also prefer not to have the text(…)[…] cases and the two namespaces. But as far as I can tell text is a pseudo element with no effect on layout, and it is the only way to set text properties.

For example, fill on text specifies the color of the text, but fill on a block element sets the background.

Each kind of element has its own namespace of properties? Vs CSS having a global namespace? I’m not entirely sure, tbh - wish they had a white paper which explained how it all works.

@jgm
Owner

jgm commented Apr 4, 2024

I see. You could probably handle this by adding a set rule for text inside the element in question. That might be more elegant than enclosing the whole thing in a text.

#table(table.cell[#set text(stroke: red);red text],table.cell[back to black])
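
On the writer side, emitting that form could look roughly like the sketch below, which places a set rule at the start of the cell content instead of enclosing the content in a text element. The function name cellWithTextProps is hypothetical, and plain String stands in for the Doc type the writer uses.

```haskell
import Data.List (intercalate)

-- | Render a table cell whose text properties are applied with a set
-- rule at the start of the cell content, rather than with an enclosing
-- #text(..)[..]. With no text properties the cell is emitted as usual.
cellWithTextProps :: [(String, String)] -> String -> String
cellWithTextProps []  contents = "table.cell[" ++ contents ++ "]"
cellWithTextProps kvs contents =
  "table.cell[#set text(" ++ props ++ ");" ++ contents ++ "]"
  where
    props = intercalate ", " [ k ++ ": " ++ v | (k, v) <- kvs ]
```

Because a set rule is scoped to the content block it appears in, the properties would not leak into the following cell.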

@jgm
Owner

jgm commented Apr 4, 2024

Might be worth considering something like

typst:cell:color=red
typst:cell:text:color=red

instead of

typst:element:cell:color=red
typst:text:cell:color=red

That makes more sense to me (particularly if it's implemented as suggested above, with a set inside the affected element's content, rather than outside the element).

@gordonwoodhull
Contributor Author

I like the set-rule and less nesting. I agree that is more idiomatic.

There are a couple of places where this isn't possible:

  • A span will not be explicitly emitted (only its contents) unless it has typst:text, so I think that still needs to be a text element and not a set-rule.
  • For elements with a lot of children, like a table, it is more terse and therefore I think preferable to wrap the table with a text element, rather than putting a set-rule inside each cell.

I would always naturally prefer the terse and less repetitive option, even at the cost of inconsistency. But I'm open to ideas and glad to iterate.

@gordonwoodhull
Contributor Author

gordonwoodhull commented Apr 5, 2024

I'm very flexible about the naming of attributes. I do want to clarify that the current scheme is always 3-part and doesn't include the specific element name.

E.g.
typst:element:color
typst:text:color

So element stands in for "whatever element this html element translates to". And text means "put this attribute in a text element/set-rule at this element".

But using the specific element name is a possibility.

I think I'll know better whether there are other naming considerations when I've worked through a few more cases. The separate namespace for text properties was a surprise and I wonder if there are other surprises.

@jgm
Owner

jgm commented Apr 5, 2024

typst:element:color
typst:text:color

OK, thanks for the clarification. In that case what about using typst:color instead of typst:element:color?

@gordonwoodhull
Contributor Author

Sure!

@gordonwoodhull
Contributor Author

Current draft in #9648

3 participants