Skip to content

Conversation

lumirlumir
Copy link
Member

@lumirlumir lumirlumir commented May 28, 2025

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[x] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)

Hello,

In this PR, I've improved the time complexity of getLocFromIndex.

  1. We should be caching every index's location after it's calculated.
  2. We should keep track of every index that begins a new line when we find it
  3. We may be able to use the loc property on nodes to cheat when calculating some locations
  4. (Plus one I noticed) — we might be able to use binary search for faster lookups.

Thanks to the hints suggested by @nzakas, I found several opportunities to improve this method.


Current Problem

The getLocFromIndex function currently has a time complexity of $O(n)$.

This is because this.lineStartIndices.findIndex(el => index < el) performs a linear search, which takes up to n iterations in the worst and average cases.

So, if a file has 10,000 lines, this function may run up to 10,000 iterations to return a result.


Room for Improvement — 1

While reviewing how lineStartIndices is constructed, I discovered something useful:

Since the SourceCode class processes lines sequentially, the lineStartIndices array is always sorted in ascending order.

A sorted array? That means we can use binary search, reducing the complexity to $O(\log n)$!

So, I replaced findIndex with binary search — and it worked!

Now, getLocFromIndex performs in $O(\log n)$ time instead of $O(n)$.

With 10,000 lines, the function only performs about 14 operations instead of 10,000.


Room for Improvement — 2

But should we stop at $O(\log n)$?

Thanks again to the earlier suggestions, I was able to add caching — which means we can now achieve $O(1)$ performance on cache hits.

I reused the existing caching mechanism and introduced an indexToLoc cache.


Result

  • Before: getLocFromIndex always ran in $O(n)$ time, where n is the number of lines in the file.
  • After: It now runs in $O(1)$ with cache hits, and $O(\log n)$ with cache misses.

Is there anything you'd like reviewers to focus on?

Since getIndexFromLoc already works in $O(1)$ time, I didn’t change its current behavior.

Refs: eslint/rewrite#166, eslint/rewrite#212, eslint/markdown#376

@github-project-automation github-project-automation bot moved this to Needs Triage in Triage May 28, 2025
@eslint-github-bot eslint-github-bot bot added the chore This change is not user-facing label May 28, 2025
Copy link

netlify bot commented May 28, 2025

Deploy Preview for docs-eslint canceled.

Name Link
🔨 Latest commit 7710061
🔍 Latest deploy log https://app.netlify.com/projects/docs-eslint/deploys/6836f68d9de8620008921fe8

Copy link

netlify bot commented May 28, 2025

Deploy Preview for docs-eslint canceled.

Name Link
🔨 Latest commit db0f55c
🔍 Latest deploy log https://app.netlify.com/projects/docs-eslint/deploys/683a9f69f5d52c00081397e8

@lumirlumir lumirlumir changed the title perf: improve getLocFromIndex and getIndexFromLoc perf: improve time complexity of getLocFromIndex May 28, 2025
@lumirlumir lumirlumir marked this pull request as ready for review May 28, 2025 15:27
@lumirlumir lumirlumir requested a review from a team as a code owner May 28, 2025 15:27
@snitin315
Copy link
Contributor

Is it possible for you to share some before vs after data?

@lumirlumir lumirlumir marked this pull request as draft May 29, 2025 06:50
@lumirlumir
Copy link
Member Author

lumirlumir commented May 29, 2025

Benchmark

I've run a benchmark and would like to share the results.

Result

Summary: The new version implementation is 8–9 times faster than the original one.

$ node benchmark.js 
Running benchmark...(1/5)
getLocFromIndexBefore: 5.067s
getLocFromIndexAfterWithoutCache: 622.719ms
getLocFromIndexAfter: 769.587ms
Running benchmark...(2/5)
getLocFromIndexBefore: 5.028s
getLocFromIndexAfterWithoutCache: 619.502ms
getLocFromIndexAfter: 720.991ms
Running benchmark...(3/5)
getLocFromIndexBefore: 5.064s
getLocFromIndexAfterWithoutCache: 620.599ms
getLocFromIndexAfter: 761.039ms
Running benchmark...(4/5)
getLocFromIndexBefore: 5.143s
getLocFromIndexAfterWithoutCache: 628.235ms
getLocFromIndexAfter: 750.742ms
Running benchmark...(5/5)
getLocFromIndexBefore: 5.099s
getLocFromIndexAfterWithoutCache: 634.295ms
getLocFromIndexAfter: 729.528ms

Setup

SourceCode class

For comparison, I’ve added getLocFromIndexBefore (the original version) and getLocFromIndexWithoutCache (the new version without using cache) to the SourceCode class.

	getLocFromIndexBefore(index) {
		if (typeof index !== "number") {
			throw new TypeError("Expected `index` to be a number.");
		}

		if (index < 0 || index > this.text.length) {
			throw new RangeError(
				`Index out of range (requested index ${index}, but source text has length ${this.text.length}).`,
			);
		}

		/*
		 * For an argument of this.text.length, return the location one "spot" past the last character
		 * of the file. If the last character is a linebreak, the location will be column 0 of the next
		 * line; otherwise, the location will be in the next column on the same line.
		 *
		 * See getIndexFromLoc for the motivation for this special case.
		 */
		if (index === this.text.length) {
			return {
				line: this.lines.length,
				column: this.lines.at(-1).length,
			};
		}
		/*
		 * To figure out which line index is on, determine the last place at which index could
		 * be inserted into lineStartIndices to keep the list sorted.
		 */
		const lineNumber =
			index >= this.lineStartIndices.at(-1)
				? this.lineStartIndices.length
				: this.lineStartIndices.findIndex(el => index < el);

		return {
			line: lineNumber,
			column: index - this.lineStartIndices[lineNumber - 1],
		};
	}

	getLocFromIndexWithoutCache(index) {
		if (typeof index !== "number") {
			throw new TypeError("Expected `index` to be a number.");
		}

		if (index < 0 || index > this.text.length) {
			throw new RangeError(
				`Index out of range (requested index ${index}, but source text has length ${this.text.length}).`,
			);
		}

		/*
		 * For an argument of this.text.length, return the location one "spot" past the last character
		 * of the file. If the last character is a linebreak, the location will be column 0 of the next
		 * line; otherwise, the location will be in the next column on the same line.
		 *
		 * See getIndexFromLoc for the motivation for this special case.
		 */
		if (index === this.text.length) {
			return {
				line: this.lines.length,
				column: this.lines.at(-1).length,
			};
		}

		/*
		 * To figure out which line index is on, determine the last place at which index could
		 * be inserted into lineStartIndices to keep the list sorted.
		 */
		const lineNumber =
			index >= this.lineStartIndices.at(-1)
				? this.lineStartIndices.length
				: findLineNumberBinarySearch(this.lineStartIndices, index);

		return {
			line: lineNumber,
			column: index - this.lineStartIndices[lineNumber - 1],
		};
	}

benchmark.js

Place benchmark.js in the root directory and run it using node benchmark.js.

"use strict";

/* eslint-disable no-console, jsdoc/require-jsdoc -- Benchmarking */

//------------------------------------------------------------------------------
// Requirements
//------------------------------------------------------------------------------

const { readFileSync } = require('node:fs');
const SourceCode = require('./lib/languages/js/source-code/source-code');
const espree = require("espree");

//------------------------------------------------------------------------------
// Helpers
//------------------------------------------------------------------------------

const ITER = 1_000;
const CODE = readFileSync('./lib/eslint/eslint.js', 'utf8'); // 1,162 lines of code.
const DEFAULT_CONFIG = {
	ecmaVersion: 2024,
	comment: true,
	tokens: true,
	range: true,
	loc: true,
};

function runBefore() {
	const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

	console.time('getLocFromIndexBefore');

	for(let i = 0; i < ITER; i++) {
		for( let j = 0; j < CODE.length; j++) {
			sourceCode.getLocFromIndexBefore(j);
		}
	}

	console.timeEnd('getLocFromIndexBefore');
}

function runAfter() {
	const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

	console.time('getLocFromIndexAfter');

	for(let i = 0; i < ITER; i++) {
		for( let j = 0; j < CODE.length; j++) {
			sourceCode.getLocFromIndex(j);
		}
	}

	console.timeEnd('getLocFromIndexAfter');
}

function runAfterWithoutCache() {
	const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

	console.time('getLocFromIndexAfterWithoutCache');

	for(let i = 0; i < ITER; i++) {
		for( let j = 0; j < CODE.length; j++) {
			sourceCode.getLocFromIndexWithoutCache(j);
		}
	}

	console.timeEnd('getLocFromIndexAfterWithoutCache');
}

//------------------------------------------------------------------------------
// Benchmark
//------------------------------------------------------------------------------

console.log('Running benchmark...(1/5)');
runBefore();
runAfterWithoutCache();
runAfter();

console.log('Running benchmark...(2/5)');
runBefore();
runAfterWithoutCache();
runAfter();

console.log('Running benchmark...(3/5)');
runBefore();
runAfterWithoutCache();
runAfter();

console.log('Running benchmark...(4/5)');
runBefore();
runAfterWithoutCache();
runAfter();

console.log('Running benchmark...(5/5)');
runBefore();
runAfterWithoutCache();
runAfter();

/* eslint-enable no-console, jsdoc/require-jsdoc -- Benchmarking */

@lumirlumir lumirlumir marked this pull request as ready for review May 29, 2025 09:10
@nzakas
Copy link
Member

nzakas commented May 29, 2025

@lumirlumir can you try creating a benchmark using this package? https://www.npmjs.com/package/benchmark

It's optimized to ensure we're not being fooled by the JIT compiler.

@lumirlumir
Copy link
Member Author

lumirlumir commented May 30, 2025

Benchmark 2

@nzakas I've used benchmark package and the result is same with the previous one 👍

Result

Summary: The new version implementation is 7–8 times faster than the original one.

$ node benchmark.js 
getLocFromIndexBefore x 1.87 ops/sec ±4.61% (9 runs sampled)
getLocFromIndexAfterWithoutCache x 14.97 ops/sec ±2.06% (42 runs sampled)
getLocFromIndexAfter x 12.94 ops/sec ±1.61% (36 runs sampled)

Fastest is getLocFromIndexAfterWithoutCache

Performance improvements:

Binary search optimization: 87.49% faster than original
Caching optimization: -15.63% faster than without cache
Total improvement: 85.54% faster than original

Setup

SourceCode class

Same with above.

benchmark.js

Place benchmark.js in the root directory and run it using node benchmark.js.

"use strict";

/* eslint-disable no-console, jsdoc/require-jsdoc, prefer-arrow-callback, no-invalid-this -- Benchmarking */

//------------------------------------------------------------------------------
// Requirements
//------------------------------------------------------------------------------

const { readFileSync } = require('node:fs');
const SourceCode = require('./lib/languages/js/source-code/source-code');
const Benchmark = require('benchmark');
const espree = require("espree");

//------------------------------------------------------------------------------
// Helpers
//------------------------------------------------------------------------------

const ITER = 100;
const CODE = readFileSync('./lib/eslint/eslint.js', 'utf8'); // 1,162 lines of code.
const DEFAULT_CONFIG = {
    ecmaVersion: 2024,
    comment: true,
    tokens: true,
    range: true,
    loc: true,
};

function runBefore() {
    const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

    for(let i = 0; i < ITER; i++) {
        for( let j = 0; j < CODE.length; j++) {
            sourceCode.getLocFromIndexBefore(j);
        }
    }
}

function runAfter() {
    const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

    for(let i = 0; i < ITER; i++) {
        for( let j = 0; j < CODE.length; j++) {
            sourceCode.getLocFromIndex(j);
        }
    }
}

function runAfterWithoutCache() {
    const sourceCode = new SourceCode(CODE, espree.parse(CODE, DEFAULT_CONFIG));

    for(let i = 0; i < ITER; i++) {
        for( let j = 0; j < CODE.length; j++) {
            sourceCode.getLocFromIndexWithoutCache(j);
        }
    }
}

//------------------------------------------------------------------------------
// Benchmark
//------------------------------------------------------------------------------

new Benchmark.Suite()
    .add('getLocFromIndexBefore', function() {
        runBefore();
    })
    .add('getLocFromIndexAfterWithoutCache', function() {
        runAfterWithoutCache();
    })
    .add('getLocFromIndexAfter', function() {
        runAfter();
    })
    .on('cycle', function(event) {
        console.log(String(event.target));
    })
    .on('complete', function() {
        console.log(`\nFastest is ${this.filter('fastest').map('name')}`);

        // Calculate improvement percentages
        const before = this[0].stats;
        const withoutCache = this[1].stats;
        const withCache = this[2].stats;

        const binarySearchImprovement = ((before.mean - withoutCache.mean) / before.mean * 100).toFixed(2);
        const cachingImprovement = ((withoutCache.mean - withCache.mean) / withoutCache.mean * 100).toFixed(2);
        const totalImprovement = ((before.mean - withCache.mean) / before.mean * 100).toFixed(2);

        console.log('\nPerformance improvements:\n');
        console.log(`Binary search optimization: ${binarySearchImprovement}% faster than original`);
        console.log(`Caching optimization: ${cachingImprovement}% faster than without cache`);
        console.log(`Total improvement: ${totalImprovement}% faster than original`);
    })
	.run({ 'async': true });

/* eslint-enable no-console, jsdoc/require-jsdoc, prefer-arrow-callback, no-invalid-this -- Benchmarking */

@nzakas nzakas moved this from Needs Triage to Implementing in Triage May 30, 2025
@nzakas
Copy link
Member

nzakas commented May 30, 2025

It seems like using the version without the cache is the way to go. Nice work!

@lumirlumir lumirlumir added the accepted There is consensus among the team that this change meets the criteria for inclusion label May 31, 2025
@lumirlumir
Copy link
Member Author

I've added a new commit without cache in db0f55c.

Copy link
Contributor

@snitin315 snitin315 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization. Thanks!

Leaving it open for others to review

Copy link
Member

@mdjermanovic mdjermanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Leaving it open for @nzakas to verify.

@mdjermanovic mdjermanovic moved this from Implementing to Second Review Needed in Triage Jun 2, 2025
@nzakas nzakas moved this from Second Review Needed to Merge Candidates in Triage Jun 2, 2025
Copy link
Member

@nzakas nzakas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just waiting for patch release window to close before merging.

@fasttime fasttime merged commit e392895 into main Jun 2, 2025
30 checks passed
@fasttime fasttime deleted the perf-improve-getlocfromindex-and-getindexfromloc branch June 2, 2025 21:17
@github-project-automation github-project-automation bot moved this from Merge Candidates to Complete in Triage Jun 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion chore This change is not user-facing
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

5 participants