-
-
Notifications
You must be signed in to change notification settings - Fork 115
fix: Load Sphinx inventories whose dispname spans multiple lines #748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if len(line) == 0 and len(items) == 0: | ||
# Skip empty lines at start of inventory section | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit just allows inventory files without any items to be loaded, which is used in the round trip test; it can be removed without any dramas, but the empty inventory will also need to be removed from the round trip test (but not the Sphinx loading test).
Hi @Yoshanuikabundi, thanks for the PR and detailed explanation! Looking at Sphobjinv's docs (from @bskinn):
...it seems newlines are indeed valid inside Out of curiosity @Yoshanuikabundi, could you show some example |
My first reaction: I believe this is >incorrect<, and in fact newlines are invalid anywhere in any field in a v2 I would have to dig back into the Sphinx code to be sure, but I'm pretty sure that Sphinx processes the decoded
I wrote all of of the higher-level components of the My opinion here is that This seems to me to be an XY problem. IMO, In any event, I would guess that the best solution to this documentation need is a Sphinx extension of some kind. I found the mismatching objects in the OpenEye Toolkits >>> import sphobjinv as soi
>>> from pathlib import Path
>>> cbdata = Path('objects.inv').read_bytes()
>>> bdata = soi.decompress(cbdata)
>>> tdata = bdata.decode()
>>> [l for l in tdata.splitlines()[4:] if not soi.re.p_data.match(l)] # [4:] skips the header
... list of syntax-mismatching lines ... I then exported the decompressed contents to disk with: >>> Path('objects-manual.txt').write_text(tdata) And searched for the mismatching lines. Here are some examples of full entries with internal newlines from that same
|
One thing this discussion points to for The invalidity of newlines in any field is implicit in the use of "data line" in this sentence on that page (emphasis added):
But the description could use a more direct statement of the strict limitation of each data item being constrained to a single line of the decompressed inventory source. |
Thank you @bskinn! My initial reaction was similar: even if newlines were valid, how they are rendered depend on the output format (HTML, Markdown, PDF, etc.) and so I wasn't sure it made sense to support them. Thanks a lot for searching for and finding examples in the mentioned inventories. These "display names" definitely look like descriptions, not names, so I'd be inclined to say they are invalid. In short, I agree with @bskinn's reasoning and suggestions, and will probably not merge this PR as it is. Instead we should log warnings on invalid lines. |
In: https://github.com/sphinx-doc/sphinx/blob/master/sphinx/util/inventory.py#L113-L121 ...non-matching lines are indeed skipped. |
That sounds very reasonable to me! The offending lines are mostly automatically generated from figure/image captions and are very unlikely to be desired in any cross reference. I'll revise this PR to simply discard lines that can't be parsed rather than append them to the previous entry, but I won't be able to get to this immediately. I'd be against issuing a warning - the warning would usually not be actionable by the users as by definition it comes from another documentation site. In fact, in this case even the docs that the inventory was from wouldn't be able to action the warning as its more or less a misjudgement in a Sphinx plugin. I'd worry it'd almost always just be noise, which mkdocs is usually good about minimising. I don't believe Sphinx warns. |
Ah, you're completely right @Yoshanuikabundi, good point. No warning log then! And no worries, I'll take it from here if you don't have the time 🙂 |
Great, thanks @pawamoy! |
Ruff 0.10.0 introduces the S704 lint, which triggers when a non- literal string is passed to `markupsafe.Markup()`. This triggered 5 times in the codebase. Only one of these errors was trivially fixable, and the fix caused tests to fail because the "fix" introduced escapes to already correct markup. This commit therefore configures Ruff to ignore this lint and does not fix any code that triggers it. Other changes are due to other formatting and linting changes from recent releases of Ruff. Ruff 0.10.0: https://github.com/astral-sh/ruff/releases/tag/0.10.0 Lint S704: https://docs.astral.sh/ruff/rules/unsafe-markup-use/
Previously, inventory items whose `dispname` value contains multiple lines would prevent mkdocstrings from loading the whole inventory file. This change makes mkdocstrings ignore invalid lines in inventories so that the rest of the inventory can still be loaded. This continuation line behavior can be seen in the wild in the OpenEye toolkits inventory file and a few Open Force Field inventory files. These projects' inventories cannot be used with mkdocstrings because of the raised error. Note that in Sphinx too, these inventory files are read succesfully, and the continuation lines are discarded, truncating the display name. In theory, a continuation line that by chance did parse correctly would be interpreted by both packages as a new inventory item. OpenEye Toolkits inventory file: https://docs.eyesopen.com/toolkits/python/objects.inv Open Force Field Toolkit inventory file: https://docs.openforcefield.org/projects/toolkit/en/stable/objects.inv BespokeFit inventory file: https://docs.openforcefield.org/projects/bespokefit/en/stable/objects.inv
Hi! I'm Josh, I'm in charge of documentation at Open Force Field. We produce a particular kind of mathematical model for computational chemistry. I haven't been using mkdocs/mkdocstrings for long, but I'm enjoying getting to know it, and I'm very excited to write docstrings in markdown!
This is my first contribution here. Thanks for making such clear contributor docs! If you'd rather I do anything another way I'm very happy to fix things. Sorry I haven't raised an issue, but by the time I'd figured out what was going on I basically had the fix written, so it's all here.
The issue
When a display name in a Sphinx inventory file spans multiple lines, both mkdocstrings and Sphinx simply print the newline character and then the continuation. This is because a display name spanning multiple lines is a rare edge case that was not accounted for in either software. However, while Sphinx discards any inventory lines that do not parse (such as the continuation line), mkdocstrings raises an error, causing the entire inventory file to fail to load.
Additional context
This continuation line behavior can be seen in the wild in the OpenEye Toolkits inventory file and a few Open Force Field inventory files. These projects' inventories cannot be used with mkdocstrings because of the raised error. You can see this for yourself by uncommenting the relevant lines in OpenFF Pablo's
mkdocs.yml
, but it might be a pain to collect the dependencies unless you're familiar with Conda (environment file is indevtools/conda-envs/docs_env.yaml
).Note that in Sphinx, these inventory files are read successfully, but the continuation lines are discarded, truncating the display name. In theory, a continuation line that by chance did parse correctly would be interpreted by both packages as a new inventory item.
Changes in this PR
This PR appends the continuation line to the previous inventory item display name when it fails to parse. This allows these inventory files to be loaded while preserving the original display name. Note that the theoretical continuation line that parses by chance is not addressed. Round-trip tests and additional Sphinx tests are added to check the new behavior. I've run these tests locally with Sphinx added to the dependencies and they do actually succeed, and I've also confirmed that they fail without the new code to confirm the tests are doing something.
I also found that
make format
andmake check
introduced several formatting changes and new lint failures outside the files I changed; I determined that they were due to the recent new version of Ruff and corrected them in a separate commit within this PR. One judgement I made there that deserves particular review is that a new Ruff lint was triggered, apparently over-eagerly; applying the suggested fix where it made sense caused the tests to fail. I therefore set the lint to be ignored. There are some more details in the commit message. Happy to tweak this (and everything else), but it seemed like the existing code was usingmarkupsafe.Markup
as intended to declare that a certain bit of dynamically generated markup was correct, and the lint was applying too strict a standard.Other stuff
Thanks for all your work on mkdocstrings!