Skip to content

Conversation

jchavarri
Copy link
Member

The melange ppx transforms js quoted strings into *j for some reason. This is very confusing for any ppxs authors that want to build upon these quoted strings (like melange-react-intl), because they have to target the *j one. cc @ixzzd

@anmonteiro anmonteiro force-pushed the ppx-strange-quoted-strs-transform branch from 71694d7 to 96d0d2c Compare December 18, 2023 19:35
@anmonteiro
Copy link
Member

Why is this happening?

  • when processing strings with the js delimiter, the Melange PPX escapes the string.
  • when processing strings with the j delimiter, the Melange PPX escapes the string and processes the interpolation expressions, if any.

In both cases, it replaces the original string delimiters with the private *j delimiter, as you report in this PR. This signals, to Melange, that the string has been escaped and doesn't need further processing.

Discussion

  • Previously, the Melange PPX would run last; other PPXes would be able to hook on j or js delimiters to apply their own processing.
  • After we split the builtin PPX from the core compiler in [melange.ppx]: move the entire PPX frontend to melange.ppx #583, we rely on ppxlib's order, which is unspecified. I believe that explains what you're seeing in melange-react-intl.

Solutions

A few alternatives that come to mind:

  1. move just the utf8 string preprocessing back to the compiler core. This guarantees that utf8 strings are processed last, allowing PPXes to hook on j and js again
  2. Document that Melange processes delimiters into escaped strings which get the *j delimiter to mark the escaping
    • This has a downside: it doesn't allow PPXes to produce strings to be escaped if the Melange PPX has already run
  3. Remove the *j delimiter entirely. Assume that the Melange PPX only runs once, and successive runs of the PPX further escape potentially already escaped strings
    • This has the same downside as 2. and, additionally, doesn't indicate to the external PPX that a string has already been escaped
  4. Introduce a second PPX just for string interpolation that users have to opt into, and make force it to run during another preprocessing phase.

@jchavarri
Copy link
Member Author

This signals, to Melange, that the string has been escaped and doesn't need further processing.

That ^ sentence suggests the Melange PPX could run twice for the same AST node. How could that happen? What does "further processing" mean?

we rely on ppxlib's order, which is unspecified.

From what I understand, the melange PPX can ran after some other PPXs (like it did in the past) or before other PPXs (as the situation with melange-react-intl shows). So assuming ordering doesn't matter, and assuming the Melange PPX only runs once per AST node, I would vote for option 3: remove the *j delimiter and let things flow naturally (I guess until we find some unknown issues down the line 😅 )

@ixzzd
Copy link

ixzzd commented Dec 19, 2023

thanks for the detailed explanation @anmonteiro

Assume that the Melange PPX only runs once, and successive runs of the PPX further escape potentially already escaped strings

I am wondering in which scenarios the Melange PPX could be run multiple times?

@anmonteiro
Copy link
Member

I would vote for option 3: remove the *j delimiter and let things flow naturally

I don't think this is a good option – re-reading my message, the biggest downside isn't immediately obvious: that strings generated by a PPX in the middle won't be escaped if the melange ppx has already run. This is already currently a problem in any case.

If not too invasive, I believe option 1. might actually be the way forward here.

@anmonteiro
Copy link
Member

I am wondering in which scenarios the Melange PPX could be run multiple times?

I don't know if it will in normal situations, but you can always run melc -ppx melppx -ppx melppx x.ml and the PPX will be applied twice. I believe it's beneficial for the PPX to be idempotent, though I'm unsure that we preserve such invariant for all the transformations.

@jchavarri
Copy link
Member Author

but you can always run melc -ppx melppx -ppx melppx x.ml and the PPX will be applied twice

imo we should not make downstream PPX author's life harder because of such an edgy case.

strings generated by a PPX in the middle won't be escaped if the melange ppx has already run

Another option (I guess 5?) would be to expose the functionality of escaping these strings as a library, so that PPXs don't need to rely on ordering if they somehow generate this kind of elements, and can apply that transformation directly.

@anmonteiro
Copy link
Member

We also use the *j internal marker to decide whether we need to escape (OCaml) strings or simply emit unicode JS strings.

#995 moves the transformation to the compiler core, which a) implements 1. and b) doesn't expose *j to users of Melange or PPX developers anymore.

@anmonteiro anmonteiro force-pushed the ppx-strange-quoted-strs-transform branch from 96d0d2c to 224bfa7 Compare December 21, 2023 19:43
@anmonteiro anmonteiro merged commit 9a1e64d into main Dec 21, 2023
@anmonteiro anmonteiro deleted the ppx-strange-quoted-strs-transform branch December 21, 2023 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants