~100x perf improvement of micromatch.not on large inputs by using Set.prototype.has over Array.prototype.includes #233
Motivation for this change
I use the micromatch library to support processing include/exclude glob patterns for file search in the GitHub Repositories extension for VS Code, which integrates with github.dev to allow users to browse repos without cloning them locally. I recently received a report that file search in a github.dev repo was not working for a user, tracked as microsoft/vscode-remote-repositories-github#156. It turned out that it was working, but was just taking a very long time.
By adding some timestamped log statements to the micromatch library code, I isolated the bottleneck to this block of code in the micromatch.not function:
micromatch/index.js
Lines 160 to 164 in 34f44b4
In my case, items contained 5 million elements, and matches contained 42k elements. This resulted in the library spending almost 9 minutes in micromatch.not. 99.2% of that time was spent in the block of code above.
Here's an example of the logging I added in micromatch.not:
And here are the numbers when run on the inputs mentioned above:
Impact of this change
The change in this PR cuts the postprocessing time on my machine from 525728ms to 924ms, an approximately 500x speedup in the postprocessing step. Overall, the time spent in the micromatch.not function goes from 525728ms to 4687ms, which is approximately a 100x improvement for consumers of the micromatch.not function.
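As a rough illustration of why swapping Array.prototype.includes for Set.prototype.has helps, here is a minimal sketch of the two lookup shapes. This is not the PR's actual diff or micromatch's source: the function names excludeWithIncludes and excludeWithSet and the sample file names are hypothetical, chosen only to show the technique named in the title.

```javascript
// Hypothetical helpers for illustration only; not micromatch's actual code.

// Slow shape: Array.prototype.includes scans `matches` linearly for every
// item, so total work grows with items.length * matches.length.
function excludeWithIncludes(items, matches) {
  const result = [];
  for (const item of items) {
    if (!matches.includes(item)) {
      result.push(item);
    }
  }
  return result;
}

// Faster shape: build a Set from `matches` once, then test membership with
// Set.prototype.has, which takes sublinear time on average per lookup.
function excludeWithSet(items, matches) {
  const matchSet = new Set(matches);
  const result = [];
  for (const item of items) {
    if (!matchSet.has(item)) {
      result.push(item);
    }
  }
  return result;
}

// Made-up sample data: keep every item that is not in `matches`.
const items = ['a.js', 'b.md', 'c.js', 'd.txt'];
const matches = ['a.js', 'c.js'];

console.log(excludeWithSet(items, matches)); // [ 'b.md', 'd.txt' ]
```

Both methods compare values with the same SameValueZero semantics, so the two functions return the same result. With the sizes reported above (5 million items and 42k matches), the nested scan can perform up to roughly 5,000,000 × 42,000 = 2.1 × 10^11 comparisons in the worst case, while the Set version does about 42k insertions plus 5 million lookups.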