-
Notifications
You must be signed in to change notification settings - Fork 133
Speed up link checker by not loading historical API docs #496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -116,7 +142,6 @@ const readArgs = (): Arguments => { | |||
.version(false) | |||
.option("external", { | |||
type: "boolean", | |||
demandOption: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default. No need to set it.
@@ -134,57 +159,60 @@ function markdownFromNotebook(source: string): string { | |||
return markdown; | |||
} | |||
|
|||
async function getMarkdownAndAnchors( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same logic as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Eric! On my end, the execution time goes from 1 minute 40 seconds to roughly 15 seconds! Awesome work!!
Indeed, the problem with the script was the amount of historical files we were parsing its anchors. Having an extra glob to decide which ones we want to load is a really good idea, and now we can avoid having extra logic to ignore files in addition to the speed up.
Co-authored-by: Arnau Casau <47946624+arnaucasau@users.noreply.github.com>
In #495, we will start checking historical API docs. For performance, it's important that we don't load every single file in the project at once in memory. Instead, we want to operate in batches: e.g. check all of 0.44, then drop from memory and move on to 0.43. This PR is a pre-factor to add a new `FileBatch` class. For now, we only have a single `FileBatch`, the same as before. This class still differentiates between files "to load" vs "to check", per #496.
Closes Qiskit#461. Before, the script took me about 42 seconds. Now, about 5.7: ``` ❯ hyperfine 'npm run check:links' Benchmark 1: npm run check:links Time (mean ± σ): 5.680 s ± 0.143 s [User: 7.285 s, System: 0.471 s] Range (min … max): 5.548 s … 5.964 s 10 runs ``` This PR differentiates between files we want to check vs. files that we only need to know they exist because other files might link to them. For the latter, we do still need to read in those files to determine their anchors. The key insight of this PR is that 99% of the historical API docs can be ignored because the stable docs don't link to them. We only need to load a couple of historical API docs because our migration guides link to them. -- I think we probably will want to add an argument to allow checking links for historical API docs: Qiskit#495. This should work well with that change; we'd update the list based off of the `Arguments` passed in. --------- Co-authored-by: Arnau Casau <47946624+arnaucasau@users.noreply.github.com>
In Qiskit#495, we will start checking historical API docs. For performance, it's important that we don't load every single file in the project at once in memory. Instead, we want to operate in batches: e.g. check all of 0.44, then drop from memory and move on to 0.43. This PR is a pre-factor to add a new `FileBatch` class. For now, we only have a single `FileBatch`, the same as before. This class still differentiates between files "to load" vs "to check", per Qiskit#496.
Closes #461.
Before, the script took me about 42 seconds. Now, about 5.7:
This PR differentiates between files we want to check vs. files that we only need to know they exist because other files might link to them. For the latter, we do still need to read in those files to determine their anchors.
The key insight of this PR is that 99% of the historical API docs can be ignored because the stable docs don't link to them. We only need to load a couple of historical API docs because our migration guides link to them.
--
I think we probably will want to add an argument to allow checking links for historical API docs: #495. This should work well with that change; we'd update the list based off of the
Arguments
passed in.