
Conversation

Eric-Arellano
Collaborator

Closes #461.

Before, the script took me about 42 seconds. Now, about 5.7 seconds:

```
❯ hyperfine 'npm run check:links'
Benchmark 1: npm run check:links
  Time (mean ± σ):      5.680 s ±  0.143 s    [User: 7.285 s, System: 0.471 s]
  Range (min … max):    5.548 s …  5.964 s    10 runs
```

This PR differentiates between files we want to check and files we only need to know exist because other files might link to them. For the latter, we still need to read in those files to determine their anchors.

The key insight of this PR is that 99% of the historical API docs can be ignored because the stable docs don't link to them. We only need to load a couple of historical API docs because our migration guides link to them.

--

I think we will probably want to add an argument to allow checking links for historical API docs: #495. This should work well with that change; we'd update the list based on the `Arguments` passed in.
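The load-vs-check split described above can be sketched roughly as follows. This is a hedged illustration, not the script's actual code: `partitionFiles`, `isHistorical`, and the path pattern are all hypothetical names.

```typescript
// Hypothetical sketch of the load-vs-check split; names and patterns
// are illustrative, not the repo's actual implementation.
type FilePartition = { toCheck: string[]; toLoadOnly: string[] };

function partitionFiles(
  allFiles: string[],
  shouldCheck: (file: string) => boolean,
): FilePartition {
  const toCheck: string[] = [];
  const toLoadOnly: string[] = [];
  for (const file of allFiles) {
    (shouldCheck(file) ? toCheck : toLoadOnly).push(file);
  }
  return { toCheck, toLoadOnly };
}

// Example predicate: treat versioned API paths (e.g. /api/qiskit/0.19/)
// as historical, so they are loadable but not link-checked.
const isHistorical = (file: string) => /\/api\/[^/]+\/0\.\d+\//.test(file);
```

Both sets still get read to collect anchors; only the `toCheck` set has its outbound links validated.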

@@ -116,7 +142,6 @@ const readArgs = (): Arguments => {
.version(false)
.option("external", {
type: "boolean",
demandOption: false,
Eric-Arellano (Collaborator, Author):

This is the default. No need to set it.
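For reference, `demandOption` defaults to `false` in the standard yargs API, so the two definitions below behave identically (a sketch assuming yargs is installed):

```typescript
import yargs from "yargs";

// `demandOption: false` is yargs's default, so these are equivalent:
yargs(process.argv.slice(2)).option("external", { type: "boolean", demandOption: false });
yargs(process.argv.slice(2)).option("external", { type: "boolean" });
```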

@@ -134,57 +159,60 @@ function markdownFromNotebook(source: string): string {
return markdown;
}

async function getMarkdownAndAnchors(
Eric-Arellano (Collaborator, Author):

Same logic as before.
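As a rough illustration of what collecting anchors involves, here is a hedged sketch that approximates GitHub-style heading slugs. This is not the repo's `getMarkdownAndAnchors` implementation; the function name and slug rules are assumptions.

```typescript
// Hypothetical sketch: derive anchor slugs from markdown headings,
// approximating GitHub-style slugification. Not the repo's exact code.
function anchorsFromMarkdown(markdown: string): Set<string> {
  const anchors = new Set<string>();
  for (const line of markdown.split("\n")) {
    const match = line.match(/^#+\s+(.+)/);
    if (match) {
      const slug = match[1]
        .toLowerCase()
        .trim()
        .replace(/[^\w\s-]/g, "") // drop punctuation
        .replace(/\s+/g, "-");    // spaces become hyphens
      anchors.add(`#${slug}`);
    }
  }
  return anchors;
}
```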

@arnaucasau arnaucasau (Collaborator) left a comment:

Thank you Eric! On my end, the execution time goes from 1 minute 40 seconds to roughly 15 seconds! Awesome work!!

Indeed, the problem with the script was the number of historical files whose anchors we were parsing. Adding an extra glob to decide which ones to load is a really good idea; on top of the speedup, we now avoid needing extra logic to ignore files.

Eric-Arellano and others added 2 commits December 8, 2023 10:17
Co-authored-by: Arnau Casau <47946624+arnaucasau@users.noreply.github.com>
@Eric-Arellano Eric-Arellano added this pull request to the merge queue Dec 8, 2023
Merged via the queue into main with commit d1279aa Dec 8, 2023
@Eric-Arellano Eric-Arellano deleted the EA/speed-up-link-checker branch December 8, 2023 16:43
github-merge-queue bot pushed a commit that referenced this pull request Dec 29, 2023
In #495, we will start
checking historical API docs. For performance, it's important that we
don't load every single file in the project at once in memory. Instead,
we want to operate in batches: e.g. check all of 0.44, then drop from
memory and move on to 0.43.

This PR is a pre-factor to add a new `FileBatch` class. For now, we only
have a single `FileBatch`, the same as before. This class still
differentiates between files "to load" vs "to check", per
#496.
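A minimal sketch of what such a `FileBatch` might look like, under the load/check split described above. Field and method names here are assumptions, not necessarily the merged class's API.

```typescript
// Hypothetical FileBatch sketch; names are assumptions, not the merged API.
class FileBatch {
  constructor(
    readonly toCheck: string[], // files whose outbound links we validate
    readonly toLoad: string[],  // files loaded only so their anchors are known
  ) {}

  // Everything must be read for anchors, but only `toCheck` files
  // have their links validated.
  allFilesToLoad(): string[] {
    return [...this.toCheck, ...this.toLoad];
  }
}

// Batching keeps memory bounded: process one historical version at a
// time, letting the previous batch be garbage-collected before the next.
async function checkBatches(
  batches: FileBatch[],
  check: (batch: FileBatch) => Promise<void>,
): Promise<void> {
  for (const batch of batches) {
    await check(batch);
  }
}
```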
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
Successfully merging this pull request may close: Profile link checker (#461).