API docs generation should download the original Sphinx HTML from Box #719

@Eric-Arellano

This is the original issue, which proposed using Git. See the comment below for the new plan, which uses Box.

Motivation

After a few months of API generation, we've found a few issues with the current workflow:

  1. We regenerate docs more often than expected, and each time it's annoying to track down the prior CI artifact URL. If you get it wrong, you'll introduce unintentional changes relative to the prior generation.
  2. The historical API docs don't have any CI artifacts, so the only way to get the original Sphinx is to ask @arnaucasau or me. This is bad for bus factor / sustainability.
  3. CI artifacts expire after 90 days, so we won't be able to easily get the artifacts in the future.
  4. We need to make changes to the historical Qiskit API docs to fix issues like "500 error in historical API docs file due to Katex" (#673) or broken links.
    • This is because we can't easily regenerate the original Sphinx docs due to "bit rot".
    • For new API docs, we should fix the source repository directly.
    • We could do this either by a) changing the original HTML or b) adding special-case rules in our pipeline to do string replacing. Several of us agreed that a) is more sustainable because the string replacing risks being overly general and making unintentional changes. But, if we change the original HTML, we need to make sure every other developer gets those edits.

Another framing of this: API docs generation should be deterministic. If you run the docs generation today vs in 6 months, you should get the same result, regardless of what machine you're running on. Whereas right now people can have different copies of the original HTML docs locally, or not know how to get them in the first place.

Why upload to Git?

We considered alternatives such as uploading to file storage like Box. Uploading to Git is simpler, and it gives us proper versioning.

Suggested implementation

Store artifacts by minor version, not patch

I recommend storing at paths like scripts/api-artifacts/qiskit/0.19.zip. (Feel free to suggest other paths!)

I think it's more useful to store as 0.19 rather than the patch version 0.19.1 because it reflects reality: we only ever have one patch version per minor version. When upgrading to a new patch version, the new HTML replaces the old one, so there's no need to keep the previous patch's original HTML docs.
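As a rough sketch of the path scheme, a helper could derive the minor version from whatever version string is passed in. The function name `artifactPath` is hypothetical, and the `scripts/api-artifacts` prefix is only the path suggested above, which may well change.

```typescript
// Hypothetical helper (not part of the existing scripts): maps a package and
// a full version string to the proposed minor-version artifact path.
function artifactPath(pkg: string, version: string): string {
  // Keep only the leading "major.minor" part, so 0.19.1 and 0.19.2 both
  // resolve to the same 0.19 artifact.
  const match = /^(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Cannot parse a minor version from "${version}"`);
  }
  const minor = `${match[1]}.${match[2]}`;
  return `scripts/api-artifacts/${pkg}/${minor}.zip`;
}

const patchPath = artifactPath("qiskit", "0.19.1");
const minorPath = artifactPath("qiskit", "0.19");
```

Both calls resolve to the same file, which is the point of storing by minor version: a patch upgrade overwrites the artifact instead of adding a new one.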

Store compressed artifact

I heavily lean towards us storing the HTML docs compressed, for example as a zip or tar file.

Compression reduces both the file size and the number of files. That's useful so the repo doesn't get even slower to use with Git. We could also look into Git LFS (large file storage) to make it faster.

Compression also results in a less noisy Git diff when adding a new version of the docs.

The main downside is that the diff is less useful when we make manual changes to the input HTML. But I don't expect us to do this very often; it should only happen for historical Qiskit docs where we can't change the source code anymore. We can still always inspect the compressed file before and after, and the PR description should describe what files were changed.

updateApiDocs.ts -a argument

The argument should become optional. If not given, that means to use the existing HTML; error if there is no existing HTML.

If -a is given, that means to replace the original HTML with the new one.
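The two cases could be expressed as a small decision function. This is only a sketch of the proposed behavior: `resolveHtmlSource`, the `HtmlSource` type, and the parameter names are all hypothetical, and the real script may structure its argument handling differently.

```typescript
// Hypothetical result type: either replace the stored HTML with a new
// artifact, or reuse what is already stored.
type HtmlSource =
  | { kind: "replace"; artifactUrl: string }
  | { kind: "reuse"; storedPath: string };

// `artifactUrl` stands for the value of -a (undefined when omitted).
// `storedExists` says whether stored HTML for this version already exists.
function resolveHtmlSource(
  artifactUrl: string | undefined,
  storedPath: string,
  storedExists: boolean,
): HtmlSource {
  if (artifactUrl !== undefined) {
    // -a was given: the new artifact replaces the original HTML.
    return { kind: "replace", artifactUrl };
  }
  if (!storedExists) {
    // -a was omitted and there is nothing to fall back on.
    throw new Error(`No existing HTML at ${storedPath}; pass -a to provide it`);
  }
  return { kind: "reuse", storedPath };
}

const reused = resolveHtmlSource(undefined, "scripts/api-artifacts/qiskit/0.19.zip", true);
const replaced = resolveHtmlSource("https://example.com/artifact.zip", "scripts/api-artifacts/qiskit/0.19.zip", true);
```

Keeping the decision separate from the download and file-system work makes the error case easy to cover in a unit test.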

Other updateApiDocs.ts arguments stay the same

That is, --package, --version, and --historical. We don't try to embed this info into how we store the input HTML.

Always delete the uncompressed HTML

We should always wipe the uncompressed HTML from the .out folder and re-uncompress based on whatever the input HTML from Git is. That ensures that the build is deterministic and we always use the correct version of the docs.
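A minimal sketch of the wipe-then-extract step, assuming Node's built-in `fs` module. The function name `refreshUncompressedHtml` and the `Extractor` callback are hypothetical; the callback stands in for whichever zip or tar library ends up being used, since the issue doesn't pick one. The usage part runs against a temporary folder with a stub extractor.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical signature: unpacks `archivePath` into `destDir`.
type Extractor = (archivePath: string, destDir: string) => void;

function refreshUncompressedHtml(
  archivePath: string,
  outDir: string,
  extract: Extractor,
): void {
  // Remove anything left from earlier runs. `force` means a missing folder
  // is not an error.
  rmSync(outDir, { recursive: true, force: true });
  mkdirSync(outDir, { recursive: true });
  // Re-create the folder contents purely from the stored archive.
  extract(archivePath, outDir);
}

// Usage: simulate a stale file from an earlier run, then refresh.
const workDir = mkdtempSync(join(tmpdir(), "api-docs-"));
const outDir = join(workDir, ".out");
mkdirSync(outDir);
writeFileSync(join(outDir, "stale.html"), "left over from an earlier run");

refreshUncompressedHtml(
  "scripts/api-artifacts/qiskit/0.19.zip",
  outDir,
  // Stub extractor for illustration; it ignores the archive path.
  (_archive, dest) => writeFileSync(join(dest, "index.html"), "<html></html>"),
);

const staleRemoved = !existsSync(join(outDir, "stale.html"));
const freshPresent = existsSync(join(outDir, "index.html"));
```

Because the folder is always rebuilt from the archive, local edits to the uncompressed HTML can never leak into a later generation.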

Note that we can rename the .out folder and hierarchy to whatever is useful.

Readme changes

The README should explain how to make manual changes to the input HTML and then re-compress the file. Manual changes should only be for historical API docs. When possible, it's better to fix the original source repository rather than changing the file in our repo.
