-
Notifications
You must be signed in to change notification settings - Fork 133
Refactor link checker to use FileBatch class #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I didn't add tests for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
async load(): Promise<[File[], Link[], Link[]]> { | ||
const files: File[] = []; | ||
for (let filePath of this.toLoad) { | ||
const [_, anchors] = await getMarkdownAndAnchors(filePath); | ||
files.push(new File(filePath, anchors)); | ||
} | ||
|
||
const linksToOriginFiles = new Map<string, string[]>(); | ||
for (const filePath of this.toCheck) { | ||
const [markdown, anchors] = await getMarkdownAndAnchors(filePath); | ||
files.push(new File(filePath, anchors)); | ||
await addLinksToMap(filePath, markdown, linksToOriginFiles); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't specific to this PR so is non-blocking, but I'm a bit confused about "to load" vs "to check"; it seems we're "loading" both (through getMarkdownAndAnchors
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is toCheck
a subset of toLoad
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key difference is await addLinksToMap(filePath, markdown, linksToOriginFiles)
, which we only do with toCheck
. For toLoad
, we record all of its links because they might be linked to from other files. For toCheck
, we also record its own links to make sure that those links are valid.
I'll add comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor looks really good! I tested it locally and all works well. Thank you Eric!
const otherFiles = [ | ||
...(await globby("{public,docs}/**/*.{png,jpg,gif,svg}")).map( | ||
(fp) => new File(fp, [], false), | ||
(fp) => new File(fp, []), | ||
), | ||
...SYNTHETIC_FILES.map((fp) => new File(fp, [], true)), | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as an idea. What do you think about having the otherFiles
in the constructor of the fileBatch
class and using the fromGlobs
for them? By doing that, we can avoid having an extra argument in the check call. I think that the argument can be confusing because we already provided files to the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that could work too. The reason I didn't do it is we'd end up recomputing otherFiles
many times in the follow up PR to add historical APIs, since each historical version is its own FileBatch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, that's a good reason. However, this list loads all the images for all the versions every time, when perhaps it's not necessary to load any image besides the ones belonging to that version, and maybe the current one. For example, Qiskit v0.33 doesn't need to have the images in public/images/api/qiskit/0.19/
load. Regardless of that, I like a lot you did the refactor, so if you think it's okay because the amount of images is not that large, I'm good with it too 😃
@@ -25,7 +25,7 @@ export class File { | |||
* path: Path to the file | |||
* anchors: Anchors available in the file | |||
*/ | |||
constructor(path: string, anchors: string[], synthetic: boolean) { | |||
constructor(path: string, anchors: string[], synthetic: boolean = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea with the default value!
In Qiskit#495, we will start checking historical API docs. For performance, it's important that we don't load every single file in the project at once in memory. Instead, we want to operate in batches: e.g. check all of 0.44, then drop from memory and move on to 0.43. This PR is a pre-factor to add a new `FileBatch` class. For now, we only have a single `FileBatch`, the same as before. This class still differentiates between files "to load" vs "to check", per Qiskit#496.
In #495, we will start checking historical API docs. For performance, it's important that we don't load every single file in the project at once in memory. Instead, we want to operate in batches: e.g. check all of 0.44, then drop from memory and move on to 0.43.
This PR is a pre-factor to add a new
FileBatch
class. For now, we only have a singleFileBatch
, the same as before. This class still differentiates between files "to load" vs "to check", per #496.