Skip to content

Conversation

TylerLeonhardt
Copy link
Member

@TylerLeonhardt TylerLeonhardt commented Aug 2, 2021

partially addresses #129607

Broken up by commit:

  1. moves everything into a worker
  2. moves event handling to UntitledEditorModel (more code organization than anything else...)
  3. trying to reuse as many interfaces and classes in the worker
  4. adds telemetry fixes Automatic language detection telemetry #129576

src/buildfile.js Outdated
@@ -17,6 +17,7 @@ exports.base = [{

exports.workerExtensionHost = [entrypoint('vs/workbench/services/extensions/worker/extensionHostWorker')];
exports.workerNotebook = [entrypoint('vs/workbench/contrib/notebook/common/services/notebookSimpleWorker')];
exports.workerLanguageDetection = [entrypoint('vs/workbench/services/languageDetection/common/languageDetectionSimpleWorker')];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way I could include the @vscode/vscode-languagedetection npm package here?

modelService,
'languageDetectionWorkerService',
// TODO: See if it's possible to bundle vscode-languagedetection
FileAccess.asBrowserUri('../../../../../../node_modules/@vscode/vscode-languagedetection/dist/lib/index.js', require).toString(true),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker had trouble importing this module so I had to pass a Uri to it. There's gotta be a better way....

dispose(): void;
}

class LanguageDetectionModelManager extends Disposable {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was lifted from here:

class EditorModelManager extends Disposable {

can I reuse it somehow?

Comment on lines 211 to 238
private _getOrCreateModelManager(proxy: LanguageDetectionSimpleWorker): LanguageDetectionModelManager {
if (!this._modelManager) {
this._modelManager = this._register(new LanguageDetectionModelManager(proxy, this._modelService, true));
}
return this._modelManager;
}

protected _withSyncedResources(resources: URI[]): Promise<LanguageDetectionSimpleWorker> {
return this._getProxy().then((proxy) => {
this._getOrCreateModelManager(proxy).ensureSyncedResources(resources);
return proxy;
});
}

private _getOrCreateWorker(): IWorkerClient<LanguageDetectionSimpleWorker> {
if (!this._worker) {

this._worker = this._register(new SimpleWorkerClient<LanguageDetectionSimpleWorker, LanguageDetectionWorkerHost>(
this._workerFactory,
'vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker',
new LanguageDetectionWorkerHost(
this.indexJsUri,
this.modelJsonUri,
this.weightsUri)
));
}
return this._worker;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lifted from

private _getOrCreateWorker(): IWorkerClient<EditorSimpleWorker> {
if (!this._worker) {
try {
this._worker = this._register(new SimpleWorkerClient<EditorSimpleWorker, EditorWorkerHost>(
this._workerFactory,
'vs/editor/common/services/editorSimpleWorker',
new EditorWorkerHost(this)
));
} catch (err) {
logOnceWebWorkerWarning(err);
this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null));
}
}
return this._worker;
}
protected _getProxy(): Promise<EditorSimpleWorker> {
return this._getOrCreateWorker().getProxyObject().then(undefined, (err) => {
logOnceWebWorkerWarning(err);
this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null));
return this._getOrCreateWorker().getProxyObject();
});
}
private _getOrCreateModelManager(proxy: EditorSimpleWorker): EditorModelManager {
if (!this._modelManager) {
this._modelManager = this._register(new EditorModelManager(proxy, this._modelService, this._keepIdleModels));
}
return this._modelManager;
}

can I reuse somehow?

@TylerLeonhardt TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 025c9ce to c5ab642 Compare August 2, 2021 23:05
@TylerLeonhardt TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from c5ab642 to d7de341 Compare August 3, 2021 15:45
@TylerLeonhardt
Copy link
Member Author

So I noticed that this 404s:
image

because it's the incorrect path... it makes be believe that the import() inside of the worker isn't honoring the paths defined here:

'@vscode/vscode-languagedetection': `${window.location.origin}/static/remote/web/node_modules/@vscode/vscode-languagedetection/dist/lib/index.js`,

(or in the prod version)

So I wonder if I could at least query that path somehow in the UI thread and then I can pass that down to the worker...

@TylerLeonhardt TylerLeonhardt marked this pull request as ready for review August 3, 2021 20:46
@TylerLeonhardt
Copy link
Member Author

Using the same logic as the model.js and bin file, I was able to get the vscode-languagedetection js file loaded in the worker both in Desktop and web. Marking as ready for review.

@TylerLeonhardt
Copy link
Member Author

@alexdima, I will need to talk to you when you get back about whether this is the right idea. I think it would be better to instead bundle my npm module into the worker.


const lang = await this.languageDetectionService.detectLanguage(this.resource);
if (!lang) { return; }
this.setModeInternal(lang);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to run into race conditions here? E.g.:

  • the model may be disposed already by the time this returns (e.g. editor closed)
  • the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the model may be disposed already by the time this returns (e.g. editor closed)

I added a check around setting the mode that checks if it's been disposed.

the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?

Possible. Unfortunately, Tensorflow doesn't support cancellation natively... the best we can do is Promise.race with a token being cancelled. Is there prior art somewhere on that? I'd like to save this for later but I'll open an issue about it if that's ok with you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah one ugly bit of the @debounce annotation is that we cannot clear the timeout when the model is disposed. That's why I typically prefer to manage my own RunOnceScheduler which is disposable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no idea about RunOnceScheduler! that's nice!

@TylerLeonhardt TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 6be3d6d to a0cf390 Compare August 6, 2021 00:04
assert.ok(result);

// language detection is debounced so we need to wait a bit
await sleep(2000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a 2000ms unit test might be problematic since we try to make them fast.
Does it make more sense to do this as an integration test? Which we allow to be more long running?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't know how to do this correctly then because I need to wait for the debounce to finish (600ms).

Any ideas? Also this test is failing even though it worked locally so I'm guessing this sleep is the cause.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking there would be some kind of event that I could set up but that doesn't seem to be the case...unless I'm wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured it out.

return undefined;
}

async detectLanguages(resource: URI): Promise<string[]> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% convinced we need the detectlanguages method. I was hoping that the model would work in such a way that with a high probability it detects some language, if that is not the case then from my experience that means the model has not enough information. And I would take those results with a grain of salt. I am not sure if they are useful.

When do you actually use this? In the quick pick?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in the quick pick. I think it's still useful. If a text editor does contain 2 languages (I had an example where an editor contained both Java and C# code), showing both in the quick pick is convenient.

@isidorn
Copy link
Collaborator

isidorn commented Aug 6, 2021

Cool PR! I did an initial review and left comments in the code.
Once you have something you are happy with let me know so I try this out end to end. Thanks 👏

Copy link
Member

@bpasero bpasero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found only minor things now on my end.

assert.ok(result);

// language detection is debounced so we need to wait a bit
await sleep(2000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.

@@ -141,6 +143,8 @@ export class UntitledTextEditorModel extends BaseTextEditorModel implements IUnt
this.setModeInternal(preferredMode);
}

this._autoDetectLanguageScheduler = this._register(new RunOnceScheduler(() => this.autoDetectLanguage(), 600));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even more concise, you can move this entire line to the top where you declare the member.

@TylerLeonhardt TylerLeonhardt force-pushed the TylerLeonhardt/improve-perf-languagedetection branch from 35f8a31 to d5b7b3e Compare August 6, 2021 17:43
@TylerLeonhardt
Copy link
Member Author

merging this in now since I addressed all the feedback and I'd like to not maintain distro

@TylerLeonhardt TylerLeonhardt merged commit cfcda1c into main Aug 6, 2021
@TylerLeonhardt TylerLeonhardt deleted the TylerLeonhardt/improve-perf-languagedetection branch August 6, 2021 18:56
@github-actions github-actions bot locked and limited conversation to collaborators Sep 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic language detection telemetry
3 participants