Skip to content

Conversation

Yoshanuikabundi
Copy link
Contributor

@Yoshanuikabundi Yoshanuikabundi commented Mar 27, 2025

Hi! I'm Josh, I'm in charge of documentation at Open Force Field. We produce a particular kind of mathematical model for computational chemistry. I haven't been using mkdocs/mkdocstrings for long, but I'm enjoying getting to know it, and I'm very excited to write docstrings in markdown!

This is my first contribution here. Thanks for making such clear contributor docs! If you'd rather I do anything another way I'm very happy to fix things. Sorry I haven't raised an issue, but by the time I'd figured out what was going on I basically had the fix written, so it's all here.

The issue

When a display name in a Sphinx inventory file spans multiple lines, both mkdocstrings and Sphinx simply print the newline character and then the continuation. This is because a display name spanning multiple lines is a rare edge case that was not accounted for in either software. However, while Sphinx discards any inventory lines that do not parse (such as the continuation line), mkdocstrings raises an error, causing the entire inventory file to fail to load.

Additional context

This continuation line behavior can be seen in the wild in the OpenEye Toolkits inventory file and a few Open Force Field inventory files. These projects' inventories cannot be used with mkdocstrings because of the raised error. You can see this for yourself by uncommenting the relevant lines in OpenFF Pablo's mkdocs.yml, but it might be a pain to collect the dependencies unless you're familiar with Conda (environment file is in devtools/conda-envs/docs_env.yaml).

Note that in Sphinx, these inventory files are read successfully, but the continuation lines are discarded, truncating the display name. In theory, a continuation line that by chance did parse correctly would be interpreted by both packages as a new inventory item.

Changes in this PR

This PR appends the continuation line to the previous inventory item display name when it fails to parse. This allows these inventory files to be loaded while preserving the original display name. Note that the theoretical continuation line that parses by chance is not addressed. Round-trip tests and additional Sphinx tests are added to check the new behavior. I've run these tests locally with Sphinx added to the dependencies and they do actually succeed, and I've also confirmed that they fail without the new code to confirm the tests are doing something.

I also found that make format and make check introduced several formatting changes and new lint failures outside the files I changed; I determined that they were due to the recent new version of Ruff and corrected them in a separate commit within this PR. One judgement I made there that deserves particular review is that a new Ruff lint was triggered, apparently over-eagerly; applying the suggested fix where it made sense caused the tests to fail. I therefore set the lint to be ignored. There are some more details in the commit message. Happy to tweak this (and everything else), but it seemed like the existing code was using markupsafe.Markup as intended to declare that a certain bit of dynamically generated markup was correct, and the lint was applying too strict a standard.

Other stuff

Thanks for all your work on mkdocstrings!

Comment on lines 172 to 174
if len(line) == 0 and len(items) == 0:
# Skip empty lines at start of inventory section
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit just allows inventory files without any items to be loaded, which is used in the round trip test; it can be removed without any dramas, but the empty inventory will also need to be removed from the round trip test (but not the Sphinx loading test).

@pawamoy
Copy link
Member

pawamoy commented Mar 28, 2025

Hi @Yoshanuikabundi, thanks for the PR and detailed explanation!

Looking at Sphobjinv's docs (from @bskinn):

{dispname}
Default cross-reference text to be displayed in compiled documentation.
Constraints:

  • MUST have nonzero length
  • MAY contain internal whitespace (leading/trailing whitespace is ignored)

...it seems newlines are indeed valid inside dispname. @bskinn could you confirm? 🙏

Out of curiosity @Yoshanuikabundi, could you show some example dispname values that are present in the mentioned inventories?

@bskinn
Copy link

bskinn commented Mar 28, 2025

...it seems newlines are indeed valid inside dispname.

My first reaction: I believe this is >incorrect<, and in fact newlines are invalid anywhere in any field in a v2 objects.inv file.

I would have to dig back into the Sphinx code to be sure, but I'm pretty sure that Sphinx processes the decoded objects.inv payload line by line, always treating each line separately as a candidate inventory item. Any line that doesn't parse as a valid inventory object is silently ignored, leading to the behavior noted by OP:

However, while Sphinx discards any inventory lines that do not parse (such as the continuation line) ...

I wrote all of of the higher-level components of the sphobjinv API to mimic this behavior, for maximum compatibility. I don't have a ticket for it at the moment, but a good feature for sphobjinv (and for Sphinx too, for that matter) would be to emit a warning if any lines in an inventory were ignored due to syntax mismatch. (No ticket at the moment on sphobjinv; I opened sphinx-doc/sphinx#8825 some time ago, but neither I nor anyone else has worked on it, AFAIK.)

My opinion here is that mkdocstrings revise to match Sphinx's behavior, and not add handling for newlines in dispname.

This seems to me to be an XY problem. IMO, dispnames should be short, at most a small phrase; and, thus any text substantial enough to need a newline is not appropriate for a dispname, whether or not the dispname machinery happens to work for the desired use-case. In the specific examples I found below, the majority appear to be figure captions -- perhaps the intention for these is to create a Table of Figures? If so, I would think these newlines within the dispnames would be problematic if included in that ToF.

In any event, I would guess that the best solution to this documentation need is a Sphinx extension of some kind.


I found the mismatching objects in the OpenEye Toolkits objects.inv with the following:

>>> import sphobjinv as soi
>>> from pathlib import Path
>>> cbdata = Path('objects.inv').read_bytes()
>>> bdata = soi.decompress(cbdata)
>>> tdata = bdata.decode()
>>> [l for l in tdata.splitlines()[4:] if not soi.re.p_data.match(l)]  # [4:] skips the header

... list of syntax-mismatching lines ...

I then exported the decompressed contents to disk with:

>>> Path('objects-manual.txt').write_text(tdata)

And searched for the mismatching lines. Here are some examples of full entries with internal newlines from that same objects.inv:

figure_fragmentation std:label -1 medchemtk/fragmentation.html#figure-fragmentation Example of fragmentation (A) input molecule
(B) fragments returned by the OEGetRingChainFragments function

figure_oe2datomdisplay_getpropertyoffset std:label -1 depicttk/OEDepictClasses/OE2DAtomDisplay.html#figure-oe2datomdisplay-getpropertyoffset The property offset vector is relative to coordinates returned by
the OE2DAtomDisplay::GetCoords method

figure_oe2dbonddisplay_getpropertyoffset std:label -1 depicttk/OEDepictClasses/OE2DBondDisplay.html#figure-oe2dbonddisplay-getpropertyoffset The property offset vector is relative to the middle of the
displayed bond

figure_ring_sssr std:label -1 oechemtk/ring.html#figure-ring-sssr Example of SSSR
SSSR is not an invariant subset of all possible rings; (a), (b) and (c) depict the three equally valid SSSR of the bridged structure (on the left)

posit_cross_docking_results std:label -1 dockingtk/theory/shapefit.html#posit-cross-docking-results SHAPEFIT Cross Docking Results: Probability of finding a good pose based on bound-ligand
fit-ligand TanimotoCombo similarity.  Standard docking results
are essentially the same and follow the same trajectories
flattening out as they hit their limit of accuracy.  While SHAPEFIT
performs worse at low similarities it continually increases as
similarity increases.

posit_heat_map std:label -1 dockingtk/theory/pose_stub_tk.html#posit-heat-map POSIT Probability Map: Given a 2D similarity (in this case the MACCS 166 descriptor set [Durant-2002])
and a 3D similarity (TanimotoCombo) posit computes a
probability of finding the correct pose based on an analysis of
historical and experimental data.

@bskinn
Copy link

bskinn commented Mar 28, 2025

One thing this discussion points to for sphobjinv, is a need for additional specificity on that docs page.

The invalidity of newlines in any field is implicit in the use of "data line" in this sentence on that page (emphasis added):

The descriptions below have been updated to reflect this and to provide more detailed information on the constraints governing each field of an objects.inv data line.

But the description could use a more direct statement of the strict limitation of each data item being constrained to a single line of the decompressed inventory source.

@pawamoy
Copy link
Member

pawamoy commented Mar 28, 2025

Thank you @bskinn!

My initial reaction was similar: even if newlines were valid, how they are rendered depend on the output format (HTML, Markdown, PDF, etc.) and so I wasn't sure it made sense to support them.

Thanks a lot for searching for and finding examples in the mentioned inventories. These "display names" definitely look like descriptions, not names, so I'd be inclined to say they are invalid.

In short, I agree with @bskinn's reasoning and suggestions, and will probably not merge this PR as it is. Instead we should log warnings on invalid lines.

@pawamoy
Copy link
Member

pawamoy commented Mar 28, 2025

In:

https://github.com/sphinx-doc/sphinx/blob/master/sphinx/util/inventory.py#L113-L121

...non-matching lines are indeed skipped.

@Yoshanuikabundi
Copy link
Contributor Author

That sounds very reasonable to me! The offending lines are mostly automatically generated from figure/image captions and are very unlikely to be desired in any cross reference. I'll revise this PR to simply discard lines that can't be parsed rather than append them to the previous entry, but I won't be able to get to this immediately.

I'd be against issuing a warning - the warning would usually not be actionable by the users as by definition it comes from another documentation site. In fact, in this case even the docs that the inventory was from wouldn't be able to action the warning as its more or less a misjudgement in a Sphinx plugin. I'd worry it'd almost always just be noise, which mkdocs is usually good about minimising. I don't believe Sphinx warns.

@pawamoy
Copy link
Member

pawamoy commented Mar 29, 2025

Ah, you're completely right @Yoshanuikabundi, good point. No warning log then! And no worries, I'll take it from here if you don't have the time 🙂

@Yoshanuikabundi
Copy link
Contributor Author

Great, thanks @pawamoy!

Ruff 0.10.0 introduces the S704 lint, which triggers when a non-
literal string is passed to `markupsafe.Markup()`. This triggered
5 times in the codebase. Only one of these errors was trivially
fixable, and the fix caused tests to fail because the "fix"
introduced escapes to already correct markup. This commit therefore
configures Ruff to ignore this lint and does not fix any code that
triggers it.

Other changes are due to other formatting and linting changes from
recent releases of Ruff.

Ruff 0.10.0: https://github.com/astral-sh/ruff/releases/tag/0.10.0
Lint S704: https://docs.astral.sh/ruff/rules/unsafe-markup-use/
Previously, inventory items whose `dispname` value contains multiple lines would prevent mkdocstrings from loading the whole inventory file. This change makes mkdocstrings ignore invalid lines in inventories so that the rest of the inventory can still be loaded.

This continuation line behavior can be seen in the wild in the OpenEye toolkits inventory file and a few Open Force Field inventory files. These projects' inventories cannot be used with mkdocstrings because of the raised error.

Note that in Sphinx too, these inventory files are read succesfully, and the continuation lines are discarded, truncating the display name. In theory, a continuation line that by chance did parse correctly would be interpreted by both packages as a new inventory item.

OpenEye Toolkits inventory file: https://docs.eyesopen.com/toolkits/python/objects.inv
Open Force Field Toolkit inventory file:  https://docs.openforcefield.org/projects/toolkit/en/stable/objects.inv
BespokeFit inventory file: https://docs.openforcefield.org/projects/bespokefit/en/stable/objects.inv
@pawamoy pawamoy merged commit 81caff5 into mkdocstrings:main Mar 31, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants