
mkoeppe
Contributor

@mkoeppe mkoeppe commented Apr 13, 2023

📚 Description

Docstrings formatted for the terminal, in uses like CliffordAlgebra?, are difficult to read when heavy LaTeX markup is used.
We add some mappings that replace LaTeX commands by Unicode characters.

Resolves #35491

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@tscrim
Collaborator

tscrim commented Apr 13, 2023

I agree with this, but I forget what our policy is on assuming that users have a terminal that supports unicode.

Also, what about the other (more common) Greek letters? It would look quite weird to only have some of them changed to unicode.

@tobiasdiez
Contributor

At JabRef we have had good experiences with the small latex2unicode library. It's written in Scala, so it is not reusable here, but the encoding map is rather complete: https://github.com/tomtung/latex2unicode/blob/master/src/main/scala/com/github/tomtung/latex2unicode/helper/Escape.scala

@mkoeppe
Contributor Author

mkoeppe commented Apr 13, 2023

> I forget what our policy is on assuming that users have a terminal that supports unicode.

I don't think we have a policy for this formulated anywhere. The sage.tensor and sage.manifolds packages already use Unicode characters, and I don't think we have heard complaints about that.

I wouldn't be concerned about terminal capabilities. I'd think the most plausible compatibility issue would be (1) workflows in which users copy-paste terminal output into a LaTeX document, without loading the necessary packages for input and font configuration; or (2) programs that run Sage as a subprocess.
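
For the subprocess case (2), one conceivable safeguard is to check whether the output stream's encoding can represent the substituted characters and to fall back to the LaTeX source otherwise. This is a minimal sketch, not code from this PR; `can_encode` and `render` are hypothetical helper names.

```python
import sys


def can_encode(text, encoding=None):
    """Return True if ``text`` is representable in ``encoding``.

    With no explicit encoding, use the one reported by ``sys.stdout``
    (which may be None when stdout is replaced), falling back to ASCII.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def render(latex_source, unicode_text, encoding=None):
    """Hypothetical helper: prefer the Unicode form when it is safe."""
    return unicode_text if can_encode(unicode_text, encoding) else latex_source
```

A parent process that reads Sage's output through an ASCII pipe would then still receive `\otimes` rather than an encoding error.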

@mkoeppe
Contributor Author

mkoeppe commented Apr 13, 2023

> At JabRef we have had good experiences with the small latex2unicode library. It's written in Scala, so it is not reusable here, but the encoding map is rather complete: https://github.com/tomtung/latex2unicode/blob/master/src/main/scala/com/github/tomtung/latex2unicode/helper/Escape.scala

Thanks for the pointer! Looking great. I think somewhere in IPython/Jupyter there also must be a package that contains such mappings already, for offering tab-completion with the latex names.

@mkoeppe
Contributor Author

mkoeppe commented Apr 13, 2023

> what about the other (more common) Greek letters? It would look quite weird to only have some of them changed to unicode.

Yes, it's not complete, of course; this is just a mockup that makes CliffordAlgebra?, ExteriorAlgebra? and some tensor docstrings look good.

Also, because the substitutions are regex-based, there is a potential for breakage from unintended matches. For example, so far I have shied away from handling the \ (explicit space) command. So I think we need some kind of systematic testing to avoid unwelcome surprises.
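
To illustrate the kind of unintended match meant here, the sketch below contrasts a plain prefix replacement with a pattern that only matches whole control words. It is an illustration under assumed names (`TABLE`, `naive_substitute`, `substitute`), not the substitution code used in this PR.

```python
import re

# Illustrative table only; keys are command names without the backslash.
TABLE = {"in": "∈", "infty": "∞", "otimes": "⊗"}


def naive_substitute(text):
    """Replace each command wherever its characters occur (unsafe)."""
    for name, char in TABLE.items():
        text = text.replace("\\" + name, char)
    return text


# A control word is a backslash followed by a maximal run of letters.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)")


def substitute(text):
    """Replace whole control words only; unknown commands stay as-is."""
    return CONTROL_WORD.sub(lambda m: TABLE.get(m.group(1), m.group(0)), text)


sample = r"x \in [0, \infty) \quad"
print(naive_substitute(sample))  # x ∈ [0, ∈fty) \quad
print(substitute(sample))        # x ∈ [0, ∞) \quad
```

Matching the maximal run of letters avoids prefix collisions such as `\in` inside `\infty`. It does not help with `\ `, which is not a control word and so still needs separate treatment.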

@mkoeppe
Contributor Author

mkoeppe commented Apr 13, 2023

And finally, it seems to me that this is something that should be taken care of in a more reusable way, perhaps a Sphinx extension. Mildly related:

@tscrim
Collaborator

tscrim commented Apr 14, 2023

> > what about the other (more common) Greek letters? It would look quite weird to only have some of them changed to unicode.
>
> Yes, it's not complete, of course; this is just a mockup that makes CliffordAlgebra?, ExteriorAlgebra? and some tensor docstrings look good.

The unfortunate side effect is that any doc in which only part of the markup is converted looks very broken. This is a good proof of concept right now, but I think it would be hard to convince people that our doc formatter is not horribly broken unless we cover certain subsets completely (e.g., all Greek letters).
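
As a rough sketch of what covering the Greek letters as a complete subset could look like, the table can be generated from Unicode character names instead of being typed by hand. The names `LOWER`, `UPPER`, and `greek_map` are assumptions for illustration, and mapping `\epsilon` and `\phi` to the plain letters (rather than the variant glyphs LaTeX uses by default) is a simplification.

```python
import unicodedata

# LaTeX defines commands for these letters only; the rest (e.g. omicron,
# capital Alpha) are typeset with Latin letters and have no command.
LOWER = ("alpha beta gamma delta epsilon zeta eta theta iota kappa lambda "
         "mu nu xi pi rho sigma tau upsilon phi chi psi omega").split()
UPPER = "Gamma Delta Theta Lambda Xi Pi Sigma Upsilon Phi Psi Omega".split()


def greek_map():
    """Build a {LaTeX command: Unicode character} table for Greek letters."""
    table = {}
    for name in LOWER + UPPER:
        case = "CAPITAL" if name[0].isupper() else "SMALL"
        # The Unicode standard spells this letter "LAMDA", not "LAMBDA".
        uname = "LAMDA" if name.lower() == "lambda" else name.upper()
        table["\\" + name] = unicodedata.lookup(f"GREEK {case} LETTER {uname}")
    return table


GREEK = greek_map()
print(GREEK[r"\alpha"], GREEK[r"\Omega"])  # α Ω
```

Generating the table this way makes it harder to end up with a docstring in which some Greek letters are converted and others are left as LaTeX.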

> Also, because the substitutions are regex-based, there is a potential for breakage from unintended matches. For example, so far I have shied away from handling the \ (explicit space) command. So I think we need some kind of systematic testing to avoid unwelcome surprises.

Indeed, that might be a hard one to deal with. I think we would be better off replacing it in our docstrings with something like \quad, which is easier to identify (or at least far less likely to be misidentified).

@jhpalmieri
Member

> > At JabRef we have had good experiences with the small latex2unicode library. It's written in Scala, so it is not reusable here, but the encoding map is rather complete: https://github.com/tomtung/latex2unicode/blob/master/src/main/scala/com/github/tomtung/latex2unicode/helper/Escape.scala
>
> Thanks for the pointer! Looking great. I think somewhere in IPython/Jupyter there also must be a package that contains such mappings already, for offering tab-completion with the latex names.

I found these but haven't looked at them in any detail:


github-actions bot commented Mar 6, 2024

Documentation preview for this PR (built with commit e17b688; changes) is ready! 🎉

Development

Successfully merging this pull request may close these issues.

sage.misc.sagedoc: Change math_substitutes to Unicode
4 participants