Skip to content

Conversation

joyceerhl
Copy link
Contributor

@joyceerhl joyceerhl commented Mar 18, 2022

Motivation for this change

I use the micromatch library to support processing include/exclude glob patterns for file search in the GitHub Repositories extension for VS Code, which integrates with github.dev to allow users to browse repos without cloning them locally. I recently received a report that file search in a github.dev repo was not working for a user, tracked as microsoft/vscode-remote-repositories-github#156. It turned out that it was working, but was just taking a very long time.

By adding some timestamped log statements to the micromatch library code, I isolated the bottleneck to this block of code in the micromatch.not function:

micromatch/index.js

Lines 160 to 164 in 34f44b4

for (let item of items) {
if (!matches.includes(item)) {
result.add(item);
}
}

In my case, items contained 5mil elements, and matches contained 42k elements. This resulted in the library spending almost 9 minutes in micromatch.not. 99.2% of that time was spent in the block of code above.

Here's an example of the logging I added in micromatch.not:

const postProcess = Date.now();
console.log(`[ MICROMATCH.NOT] Got ${matches.length} matches and ${items.length} items, postprocessing...`);

for (let item of items) {
  if (!matches.includes(item)) {
    result.add(item);
  }
}
console.log(` [ MICROMATCH.NOT] Done postprocessing in ${Date.now() - postProcess} ms, got ${result.size} results`);

And here are the numbers when run on the inputs mentioned above:

[ MICROMATCH] Done processing 42478 matches
[ MICROMATCH.NOT] Got 42478 matches and 5453280 items, postprocessing...
 [ MICROMATCH.NOT] Done postprocessing in 522035 ms, got 54902 results
[ GLOBFILTER] Done processing micromatch in 525728ms

Impact of this change

With the change in this PR, this cuts the postprocessing time on my machine from 525728ms to 924ms, yielding an approximately 500x speedup in the postprocessing step. Overall, the time spent in the micromatch.not function goes from 525728ms to 4687ms, which is approximately a 100x improvement for consumers of the micromatch.not function:

  let matches = new Set(micromatch(list, patterns, { ...options, onResult }));

  for (let item of items) {
    if (!matches.has(item)) {
      result.add(item);
    }
  }
[ MICROMATCH] Done processing 42478 matches
[ MICROMATCH.NOT] Got 42478 matches and 5453280 items, postprocessing...
 [ MICROMATCH.NOT] Done postprocessing in 924 ms, got 54902 results
[ GLOBFILTER] Done processing micromatch.not in 4687ms

@jonschlinkert
Copy link
Member

Wow, this is fantastic! This is how you do a PR. I just glanced at the code and I'm 100% on board with this change. Thank you so much for the time you spent on this, for isolating the problem, creating a solution, and taking the time to provide a detailed explanation.

I'm about to hop in the car, and won't be back online to merge this until maybe later this weekend. But I definitely approve this and would be happy with someone else merging and publishing a release, if you want to do it before I get to it.

@joyceerhl thank you!

@joyceerhl
Copy link
Contributor Author

@jonschlinkert No problem--happy to contribute back to this library! 😄

@joyceerhl
Copy link
Contributor Author

Hi @jonschlinkert, would you have time to review this PR this week, please? I'd love to include this fix in the VS Code release that's happening next week. Thanks a ton!

@jonschlinkert
Copy link
Member

@joyceerhl yes, absolutely! I'll do this today. thanks for the follow up!

@jonschlinkert jonschlinkert merged commit 3899055 into micromatch:master Mar 24, 2022
jonschlinkert added a commit that referenced this pull request Mar 24, 2022
@jonschlinkert
Copy link
Member

done! published and patch released as 4.0.5! thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants