-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
perf: improve time complexity of getLocFromIndex
#19782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: improve time complexity of getLocFromIndex
#19782
Conversation
✅ Deploy Preview for docs-eslint canceled.
|
✅ Deploy Preview for docs-eslint canceled.
|
getLocFromIndex
and getIndexFromLoc
getLocFromIndex
Is it possible for you to share some before vs after data? |
BenchmarkI've run a benchmark and would like to share the results. ResultSummary: The new version implementation is 8–9 times faster than the original one. $ node benchmark.js
Running benchmark...(1/5)
getLocFromIndexBefore: 5.067s
getLocFromIndexAfterWithoutCache: 622.719ms
getLocFromIndexAfter: 769.587ms
Running benchmark...(2/5)
getLocFromIndexBefore: 5.028s
getLocFromIndexAfterWithoutCache: 619.502ms
getLocFromIndexAfter: 720.991ms
Running benchmark...(3/5)
getLocFromIndexBefore: 5.064s
getLocFromIndexAfterWithoutCache: 620.599ms
getLocFromIndexAfter: 761.039ms
Running benchmark...(4/5)
getLocFromIndexBefore: 5.143s
getLocFromIndexAfterWithoutCache: 628.235ms
getLocFromIndexAfter: 750.742ms
Running benchmark...(5/5)
getLocFromIndexBefore: 5.099s
getLocFromIndexAfterWithoutCache: 634.295ms
getLocFromIndexAfter: 729.528ms Setup
|
@lumirlumir can you try creating a benchmark using this package? https://www.npmjs.com/package/benchmark It's optimized to ensure we're not being fooled by the JIT compiler. |
Benchmark 2@nzakas I've used ResultSummary: The new version implementation is 7–8 times faster than the original one. $ node benchmark.js
getLocFromIndexBefore x 1.87 ops/sec ±4.61% (9 runs sampled)
getLocFromIndexAfterWithoutCache x 14.97 ops/sec ±2.06% (42 runs sampled)
getLocFromIndexAfter x 12.94 ops/sec ±1.61% (36 runs sampled)
Fastest is getLocFromIndexAfterWithoutCache
Performance improvements:
Binary search optimization: 87.49% faster than original
Caching optimization: -15.63% faster than without cache
Total improvement: 85.54% faster than original Setup
|
It seems like using the version without the cache is the way to go. Nice work! |
I've added a new commit without cache in db0f55c. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization. Thanks!
Leaving it open for others to review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Leaving it open for @nzakas to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just waiting for patch release window to close before merging.
Prerequisites checklist
What is the purpose of this pull request? (put an "X" next to an item)
[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[x] Add something to the core
[ ] Other, please explain:
What changes did you make? (Give an overview)
Hello,
In this PR, I've improved the time complexity of
getLocFromIndex
.Thanks to the hints suggested by @nzakas, I found several opportunities to improve this method.
Current Problem
The$O(n)$ .
getLocFromIndex
function currently has a time complexity ofThis is because
this.lineStartIndices.findIndex(el => index < el)
performs a linear search, which takes up ton
iterations in the worst and average cases.So, if a file has 10,000 lines, this function may run up to 10,000 iterations to return a result.
Room for Improvement — 1
While reviewing how
lineStartIndices
is constructed, I discovered something useful:Since the
SourceCode
class processes lines sequentially, thelineStartIndices
array is always sorted in ascending order.A sorted array? That means we can use binary search, reducing the complexity to$O(\log n)$ !
So, I replaced
findIndex
with binary search — and it worked!Now,$O(\log n)$ time instead of $O(n)$ .
getLocFromIndex
performs inWith 10,000 lines, the function only performs about 14 operations instead of 10,000.
Room for Improvement — 2
But should we stop at$O(\log n)$ ?
Thanks again to the earlier suggestions, I was able to add caching — which means we can now achieve$O(1)$ performance on cache hits.
I reused the existing caching mechanism and introduced an
indexToLoc
cache.Result
getLocFromIndex
always ran inn
is the number of lines in the file.Is there anything you'd like reviewers to focus on?
Since$O(1)$ time, I didn’t change its current behavior.
getIndexFromLoc
already works inRefs: eslint/rewrite#166, eslint/rewrite#212, eslint/markdown#376