Improve performance of language detection #130006

TylerLeonhardt · 2021-08-02T22:49:24Z

partially addresses #129607

Broken up by commit:

moves everything into a worker
moves event handling to UntitledEditorModel (more code organization than anything else...)
tries to reuse as many interfaces and classes as possible in the worker
adds telemetry (fixes Automatic language detection telemetry #129576)

TylerLeonhardt · 2021-08-02T22:50:06Z

src/buildfile.js

@@ -17,6 +17,7 @@ exports.base = [{

 exports.workerExtensionHost = [entrypoint('vs/workbench/services/extensions/worker/extensionHostWorker')];
 exports.workerNotebook = [entrypoint('vs/workbench/contrib/notebook/common/services/notebookSimpleWorker')];
+exports.workerLanguageDetection = [entrypoint('vs/workbench/services/languageDetection/common/languageDetectionSimpleWorker')];


Is there a way I could include the @vscode/vscode-languagedetection npm package here?

TylerLeonhardt · 2021-08-02T22:50:59Z

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

+			modelService,
+			'languageDetectionWorkerService',
+			// TODO: See if it's possible to bundle vscode-languagedetection
+			FileAccess.asBrowserUri('../../../../../../node_modules/@vscode/vscode-languagedetection/dist/lib/index.js', require).toString(true),


The worker had trouble importing this module so I had to pass a Uri to it. There's gotta be a better way....

TylerLeonhardt · 2021-08-02T22:52:50Z

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

+	dispose(): void;
+}
+
+class LanguageDetectionModelManager extends Disposable {


This was lifted from here:

vscode/src/vs/editor/common/services/editorWorkerServiceImpl.ts

Line 276 in 025c9ce

class EditorModelManager extends Disposable {

can I reuse it somehow?

TylerLeonhardt · 2021-08-02T22:54:08Z

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

+	private _getOrCreateModelManager(proxy: LanguageDetectionSimpleWorker): LanguageDetectionModelManager {
+		if (!this._modelManager) {
+			this._modelManager = this._register(new LanguageDetectionModelManager(proxy, this._modelService, true));
+		}
+		return this._modelManager;
+	}
+
+	protected _withSyncedResources(resources: URI[]): Promise<LanguageDetectionSimpleWorker> {
+		return this._getProxy().then((proxy) => {
+			this._getOrCreateModelManager(proxy).ensureSyncedResources(resources);
+			return proxy;
+		});
+	}
+
+	private _getOrCreateWorker(): IWorkerClient<LanguageDetectionSimpleWorker> {
+		if (!this._worker) {
+
+			this._worker = this._register(new SimpleWorkerClient<LanguageDetectionSimpleWorker, LanguageDetectionWorkerHost>(
+				this._workerFactory,
+				'vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker',
+				new LanguageDetectionWorkerHost(
+					this.indexJsUri,
+					this.modelJsonUri,
+					this.weightsUri)
+			));
+		}
+		return this._worker;
+	}


mostly lifted from

vscode/src/vs/editor/common/services/editorWorkerServiceImpl.ts

Lines 428 to 457 in 025c9ce

	private _getOrCreateWorker(): IWorkerClient<EditorSimpleWorker> {
		if (!this._worker) {
			try {
				this._worker = this._register(new SimpleWorkerClient<EditorSimpleWorker, EditorWorkerHost>(
					this._workerFactory,
					'vs/editor/common/services/editorSimpleWorker',
					new EditorWorkerHost(this)
				));
			} catch (err) {
				logOnceWebWorkerWarning(err);
				this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null));
			}
		}
		return this._worker;
	}

	protected _getProxy(): Promise<EditorSimpleWorker> {
		return this._getOrCreateWorker().getProxyObject().then(undefined, (err) => {
			logOnceWebWorkerWarning(err);
			this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null));
			return this._getOrCreateWorker().getProxyObject();
		});
	}

	private _getOrCreateModelManager(proxy: EditorSimpleWorker): EditorModelManager {
		if (!this._modelManager) {
			this._modelManager = this._register(new EditorModelManager(proxy, this._modelService, this._keepIdleModels));
		}
		return this._modelManager;
	}

can I reuse somehow?

src/vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker.ts

TylerLeonhardt · 2021-08-03T16:31:41Z

So I noticed that this 404s:

because it's the incorrect path... it makes me believe that the import() inside of the worker isn't honoring the paths defined here:

vscode/src/vs/code/browser/workbench/workbench-dev.html

Line 45 in d7de341

    
           '@vscode/vscode-languagedetection': `${window.location.origin}/static/remote/web/node_modules/@vscode/vscode-languagedetection/dist/lib/index.js`,

(or in the prod version)

So I wonder if I could at least query that path somehow in the UI thread and then I can pass that down to the worker...
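
A minimal sketch of that idea, assuming the UI thread already knows the resolved URI and the worker only asks for it once. Every name here (`WorkerHost`, `ModuleLoader`, `DetectionWorker`) is hypothetical and not part of the VS Code worker API; the loader is injected so the sketch does not depend on a real worker, where it could be something like `uri => import(uri)`.

```typescript
// Hypothetical names throughout; this is not the VS Code worker API.

/** What the UI thread exposes to the worker: the already-resolved module URI. */
interface WorkerHost {
	getIndexJsUri(): Promise<string>;
}

/** How the worker turns a URI into a module (assumption: e.g. `uri => import(uri)`). */
type ModuleLoader<M> = (uri: string) => Promise<M>;

class DetectionWorker<M> {
	private modulePromise: Promise<M> | undefined;
	private readonly host: WorkerHost;
	private readonly load: ModuleLoader<M>;

	constructor(host: WorkerHost, load: ModuleLoader<M>) {
		this.host = host;
		this.load = load;
	}

	/** Asks the host for the URI once, loads the module once, and caches the promise. */
	getModule(): Promise<M> {
		if (!this.modulePromise) {
			this.modulePromise = this.host.getIndexJsUri().then(uri => this.load(uri));
		}
		return this.modulePromise;
	}
}
```

Caching the promise rather than the module means concurrent callers share a single load.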

TylerLeonhardt · 2021-08-03T20:47:14Z

Using the same logic as the model.js and bin file, I was able to get the vscode-languagedetection js file loaded in the worker both in Desktop and web. Marking as ready for review.

TylerLeonhardt · 2021-08-03T20:47:56Z

@alexdima, I will need to talk to you when you get back about whether this is the right idea. I think it would be better to instead bundle my npm module into the worker.

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts

bpasero · 2021-08-05T15:40:36Z

src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts

+
+		const lang = await this.languageDetectionService.detectLanguage(this.resource);
+		if (!lang) { return; }
+		this.setModeInternal(lang);


Is it possible to run into race conditions here? E.g.:

the model may be disposed already by the time this returns (e.g. editor closed)

the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?

the model may be disposed already by the time this returns (e.g. editor closed)

I added a check around setting the mode that checks if it's been disposed.
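
Roughly, such a guard could look like the sketch below. `SketchModel` is a hypothetical stand-in for the editor model, and the injected `detect` function stands in for the language detection service; only the check after the `await` is the point.

```typescript
// Hypothetical stand-in for an editor model; only the guard after `await` matters here.
class SketchModel {
	mode = 'plaintext';
	private disposed = false;
	private readonly detect: () => Promise<string | undefined>;

	constructor(detect: () => Promise<string | undefined>) {
		this.detect = detect;
	}

	dispose(): void {
		this.disposed = true;
	}

	async autoDetectLanguage(): Promise<void> {
		const lang = await this.detect();
		// The editor may have been closed while detection was running.
		if (!lang || this.disposed) {
			return;
		}
		this.mode = lang;
	}
}
```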

the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?

Possible. Unfortunately, Tensorflow doesn't support cancellation natively... the best we can do is Promise.race with a token being cancelled. Is there prior art somewhere on that? I'd like to save this for later but I'll open an issue about it if that's ok with you.
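
For illustration, racing the detection against a cancellation signal might be sketched as follows. The token class is hand-rolled so the example is self-contained; it is an assumption for this sketch, not VS Code's CancellationToken, and the helper name `raceWithCancellation` is hypothetical.

```typescript
// A tiny hand-rolled token so the sketch is self-contained; not VS Code's CancellationToken.
class SketchTokenSource {
	private cancelled = false;
	private readonly listeners: Array<() => void> = [];

	get isCancelled(): boolean {
		return this.cancelled;
	}

	onCancelled(listener: () => void): void {
		if (this.cancelled) {
			listener();
		} else {
			this.listeners.push(listener);
		}
	}

	cancel(): void {
		if (this.cancelled) {
			return;
		}
		this.cancelled = true;
		for (const listener of this.listeners.splice(0)) {
			listener();
		}
	}
}

/**
 * Resolves with the work's result, or with `undefined` as soon as the token is cancelled.
 * The underlying work is NOT stopped; its result is simply ignored.
 */
function raceWithCancellation<T>(work: Promise<T>, token: SketchTokenSource): Promise<T | undefined> {
	const cancelled = new Promise<undefined>(resolve => token.onCancelled(() => resolve(undefined)));
	return Promise.race([work, cancelled]);
}
```

The caller stops waiting, but the model run itself still finishes in the background, which matches the limitation described above.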

Yeah one ugly bit of the @debounce annotation is that we cannot clear the timeout when the model is disposed. That's why I typically prefer to manage my own RunOnceScheduler which is disposable.

I had no idea about RunOnceScheduler! that's nice!
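
To show why a disposable scheduler helps here, a hand-rolled sketch follows. It is modeled on the idea of a run-once timer that can be cleared on dispose; the class name and methods are assumptions for illustration and are not VS Code's RunOnceScheduler.

```typescript
// Hand-rolled for illustration; not VS Code's RunOnceScheduler.
class SketchRunOnceScheduler {
	private handle: ReturnType<typeof setTimeout> | undefined;
	private readonly runner: () => void;
	private readonly delayMs: number;

	constructor(runner: () => void, delayMs: number) {
		this.runner = runner;
		this.delayMs = delayMs;
	}

	/** (Re)starts the timer, so rapid calls collapse into one run after the last call. */
	schedule(): void {
		this.cancel();
		this.handle = setTimeout(() => {
			this.handle = undefined;
			this.runner();
		}, this.delayMs);
	}

	isScheduled(): boolean {
		return this.handle !== undefined;
	}

	cancel(): void {
		if (this.handle !== undefined) {
			clearTimeout(this.handle);
			this.handle = undefined;
		}
	}

	/** Unlike a decorator-based debounce, the pending timeout can be cleared on dispose. */
	dispose(): void {
		this.cancel();
	}
}
```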

src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts

isidorn · 2021-08-06T09:47:56Z

extensions/vscode-api-tests/src/singlefolder-tests/untitled.languagedetection.test.ts

+		assert.ok(result);
+
+		// language detection is debounced so we need to wait a bit
+		await sleep(2000);


Having a 2000ms unit test might be problematic since we try to keep them fast.
Would it make more sense to do this as an integration test, which we allow to run longer?

This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.

Yeah I don't know how to do this correctly then because I need to wait for the debounce to finish (600ms).

Any ideas? Also this test is failing even though it worked locally so I'm guessing this sleep is the cause.

I was originally thinking there would be some kind of event that I could set up but that doesn't seem to be the case...unless I'm wrong.

Figured it out.
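
The thread does not show the final fix. One way to replace a fixed sleep with an event, sketched under the assumption that the test can subscribe to some change notification, is a helper like the one below. `waitForEvent`, `Subscribe` and `Unsubscribe` are hypothetical names.

```typescript
// Hypothetical helper; `subscribe` stands in for whatever event the test can observe.
type Unsubscribe = () => void;
type Subscribe<T> = (listener: (value: T) => void) => Unsubscribe;

/** Resolves with the first event that satisfies `accept`, or rejects after `timeoutMs`. */
function waitForEvent<T>(subscribe: Subscribe<T>, accept: (value: T) => boolean, timeoutMs: number): Promise<T> {
	return new Promise<T>((resolve, reject) => {
		let unsubscribe: Unsubscribe = () => { };
		const timer = setTimeout(() => {
			unsubscribe();
			reject(new Error(`no matching event within ${timeoutMs}ms`));
		}, timeoutMs);
		unsubscribe = subscribe(value => {
			if (accept(value)) {
				clearTimeout(timer);
				unsubscribe();
				resolve(value);
			}
		});
	});
}
```

The test then finishes as soon as the event arrives, and the timeout only bounds the failure case.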

src/vs/editor/common/services/editorSimpleWorker.ts

src/vs/workbench/browser/parts/editor/editorStatus.ts

isidorn · 2021-08-06T09:59:27Z

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

+		return undefined;
+	}
+
+	async detectLanguages(resource: URI): Promise<string[]> {


I am not 100% convinced we need the detectLanguages method. I was hoping that the model would work in such a way that it detects some language with high probability; if that is not the case, then in my experience the model does not have enough information. I would take those results with a grain of salt, and I am not sure they are useful.

When do you actually use this? In the quick pick?

Yes in the quick pick. I think it's still useful. If a text editor does contain 2 languages (I had an example where an editor contained both Java and C# code), showing both in the quick pick is convenient.
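
As a sketch of how both concerns could be balanced, the quick pick could list only results above some confidence floor. The result shape (`languageId`, `confidence`), the function name and the threshold values are assumptions for illustration, not values taken from this PR.

```typescript
// Assumed result shape: a language id plus a confidence in [0, 1].
interface DetectionResult {
	languageId: string;
	confidence: number;
}

/** Keeps results at or above `minConfidence`, most confident first, capped at `maxResults`. */
function pickLikelyLanguages(results: DetectionResult[], minConfidence: number, maxResults: number): string[] {
	return results
		.filter(result => result.confidence >= minConfidence)
		.sort((a, b) => b.confidence - a.confidence)
		.slice(0, maxResults)
		.map(result => result.languageId);
}
```

With a floor in place, a mixed Java and C# buffer can still surface both languages, while a buffer with too little signal yields an empty list.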

src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts

isidorn · 2021-08-06T10:06:09Z

Cool PR! I did an initial review and left comments in the code.
Once you have something you are happy with, let me know so I can try this out end to end. Thanks 👏

bpasero

Found only minor things now on my end.

bpasero · 2021-08-06T11:34:30Z

extensions/vscode-api-tests/src/singlefolder-tests/untitled.languagedetection.test.ts

+		assert.ok(result);
+
+		// language detection is debounced so we need to wait a bit
+		await sleep(2000);


This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.

src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts

extensions/vscode-api-tests/src/singlefolder-tests/untitled.languagedetection.test.ts

bpasero · 2021-08-06T16:29:47Z

src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts

@@ -141,6 +143,8 @@ export class UntitledTextEditorModel extends BaseTextEditorModel implements IUnt
 			this.setModeInternal(preferredMode);
 		}

+		this._autoDetectLanguageScheduler = this._register(new RunOnceScheduler(() => this.autoDetectLanguage(), 600));


Even more concise, you can move this entire line to the top where you declare the member.
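
A small sketch of that suggestion, using hypothetical stand-in classes rather than the real Disposable and RunOnceScheduler: the member is declared and initialized in one place, so the constructor needs no separate assignment.

```typescript
// All classes here are hypothetical stand-ins, just enough to show the initializer pattern.
interface SketchDisposableItem {
	dispose(): void;
}

class SketchDisposableBase {
	private readonly items: SketchDisposableItem[] = [];

	protected register<T extends SketchDisposableItem>(item: T): T {
		this.items.push(item);
		return item;
	}

	dispose(): void {
		for (const item of this.items.splice(0)) {
			item.dispose();
		}
	}
}

class SketchScheduler implements SketchDisposableItem {
	disposed = false;
	readonly runner: () => void;
	readonly delayMs: number;

	constructor(runner: () => void, delayMs: number) {
		this.runner = runner;
		this.delayMs = delayMs;
	}

	dispose(): void {
		this.disposed = true;
	}
}

class SketchUntitledModel extends SketchDisposableBase {
	detections = 0;

	// Declared and initialized in one place; no separate assignment in the constructor.
	readonly scheduler = this.register(new SketchScheduler(() => this.autoDetectLanguage(), 600));

	private autoDetectLanguage(): void {
		this.detections++;
	}
}
```

Because derived-class field initializers run after the base constructor, `this.register` is already usable at that point.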

TylerLeonhardt · 2021-08-06T18:55:06Z

merging this in now since I addressed all the feedback and I'd like to not maintain distro

TylerLeonhardt requested review from rebornix, bpasero and isidorn August 2, 2021 22:49

TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 025c9ce to c5ab642 Compare August 2, 2021 23:05

hediet assigned TylerLeonhardt Aug 3, 2021

TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from c5ab642 to d7de341 Compare August 3, 2021 15:45

TylerLeonhardt marked this pull request as ready for review August 3, 2021 20:46

bpasero requested changes Aug 5, 2021

TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 6be3d6d to a0cf390 Compare August 6, 2021 00:04

isidorn reviewed Aug 6, 2021

bpasero requested changes Aug 6, 2021

bpasero reviewed Aug 6, 2021

TylerLeonhardt added 9 commits August 6, 2021 10:42

initial move to worker

61f7b1b

move event handling to the untitledTextEditorModel

62dcfe5

reuse simpleWorker interfaces and classes

5ece5ef

use correct path to languageDetection

6edcaca

have vscode-languagedetection be outside of the asar

73104b1

add telemetry

7e99c8f

don't unpackage anything from languagedetection because it's not needed

16ac4e7

add an integration test

0467bfb

some of Ben's feedback

87382f5

TylerLeonhardt added 7 commits August 6, 2021 10:42

rework worker code to avoid duplication

c84d5b0

add isDisposed check

4f50e39

fix test

6393570

Isi and Ben feedback part 2

8241d24

use RunOnceScheduler instead and try to fix the test using events

a93aba4

Ben feedback part 3

b13f77f

bump distro

d5b7b3e

TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 35f8a31 to d5b7b3e Compare August 6, 2021 17:43

TylerLeonhardt merged commit cfcda1c into main Aug 6, 2021

TylerLeonhardt deleted the TylerLeonhardt/improve-perf-languagedetection branch August 6, 2021 18:56

This was referenced Aug 6, 2021

Automatic classification: more aggressive debounce and perf considerations #129607

Open

Using npm modules in web workers in our code - a retrospective #130302

Closed

github-actions bot locked and limited conversation to collaborators Sep 20, 2021
