-
Notifications
You must be signed in to change notification settings - Fork 34.9k
Improve performance of language detection #130006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of language detection #130006
Conversation
src/buildfile.js
Outdated
@@ -17,6 +17,7 @@ exports.base = [{ | |||
|
|||
exports.workerExtensionHost = [entrypoint('vs/workbench/services/extensions/worker/extensionHostWorker')]; | |||
exports.workerNotebook = [entrypoint('vs/workbench/contrib/notebook/common/services/notebookSimpleWorker')]; | |||
exports.workerLanguageDetection = [entrypoint('vs/workbench/services/languageDetection/common/languageDetectionSimpleWorker')]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way I could include the @vscode/vscode-languagedetection
npm package here?
modelService, | ||
'languageDetectionWorkerService', | ||
// TODO: See if it's possible to bundle vscode-languagedetection | ||
FileAccess.asBrowserUri('../../../../../../node_modules/@vscode/vscode-languagedetection/dist/lib/index.js', require).toString(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worker had trouble importing this module so I had to pass a Uri to it. There's gotta be a better way....
dispose(): void; | ||
} | ||
|
||
class LanguageDetectionModelManager extends Disposable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was lifted from here:
class EditorModelManager extends Disposable { |
can I reuse it somehow?
private _getOrCreateModelManager(proxy: LanguageDetectionSimpleWorker): LanguageDetectionModelManager { | ||
if (!this._modelManager) { | ||
this._modelManager = this._register(new LanguageDetectionModelManager(proxy, this._modelService, true)); | ||
} | ||
return this._modelManager; | ||
} | ||
|
||
protected _withSyncedResources(resources: URI[]): Promise<LanguageDetectionSimpleWorker> { | ||
return this._getProxy().then((proxy) => { | ||
this._getOrCreateModelManager(proxy).ensureSyncedResources(resources); | ||
return proxy; | ||
}); | ||
} | ||
|
||
private _getOrCreateWorker(): IWorkerClient<LanguageDetectionSimpleWorker> { | ||
if (!this._worker) { | ||
|
||
this._worker = this._register(new SimpleWorkerClient<LanguageDetectionSimpleWorker, LanguageDetectionWorkerHost>( | ||
this._workerFactory, | ||
'vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker', | ||
new LanguageDetectionWorkerHost( | ||
this.indexJsUri, | ||
this.modelJsonUri, | ||
this.weightsUri) | ||
)); | ||
} | ||
return this._worker; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly lifted from
vscode/src/vs/editor/common/services/editorWorkerServiceImpl.ts
Lines 428 to 457 in 025c9ce
private _getOrCreateWorker(): IWorkerClient<EditorSimpleWorker> { | |
if (!this._worker) { | |
try { | |
this._worker = this._register(new SimpleWorkerClient<EditorSimpleWorker, EditorWorkerHost>( | |
this._workerFactory, | |
'vs/editor/common/services/editorSimpleWorker', | |
new EditorWorkerHost(this) | |
)); | |
} catch (err) { | |
logOnceWebWorkerWarning(err); | |
this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null)); | |
} | |
} | |
return this._worker; | |
} | |
protected _getProxy(): Promise<EditorSimpleWorker> { | |
return this._getOrCreateWorker().getProxyObject().then(undefined, (err) => { | |
logOnceWebWorkerWarning(err); | |
this._worker = new SynchronousWorkerClient(new EditorSimpleWorker(new EditorWorkerHost(this), null)); | |
return this._getOrCreateWorker().getProxyObject(); | |
}); | |
} | |
private _getOrCreateModelManager(proxy: EditorSimpleWorker): EditorModelManager { | |
if (!this._modelManager) { | |
this._modelManager = this._register(new EditorModelManager(proxy, this._modelService, this._keepIdleModels)); | |
} | |
return this._modelManager; | |
} |
can I reuse somehow?
src/vs/workbench/services/languageDetection/browser/languageDetectionSimpleWorker.ts
Show resolved
Hide resolved
025c9ce
to
c5ab642
Compare
c5ab642
to
d7de341
Compare
because it's the incorrect path... it makes be believe that the
(or in the prod version) So I wonder if I could at least query that path somehow in the UI thread and then I can pass that down to the worker... |
Using the same logic as the model.js and bin file, I was able to get the vscode-languagedetection js file loaded in the worker both in Desktop and web. Marking as ready for review. |
@alexdima, I will need to talk to you when you get back about whether this is the right idea. I think it would be better to instead bundle my npm module into the worker. |
src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts
Show resolved
Hide resolved
src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts
Outdated
Show resolved
Hide resolved
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
|
||
const lang = await this.languageDetectionService.detectLanguage(this.resource); | ||
if (!lang) { return; } | ||
this.setModeInternal(lang); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to run into race conditions here? E.g.:
- the model may be disposed already by the time this returns (e.g. editor closed)
- the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the model may be disposed already by the time this returns (e.g. editor closed)
I added a check around setting the mode that checks if it's been disposed.
the model might have been changed and another language detection run occurs, should we wire through some kind of cancellation support?
Possible. Unfortunately, Tensorflow doesn't support cancellation natively... the best we can do is Promise.race
with a token being cancelled. Is there prior art somewhere on that? I'd like to save this for later but I'll open an issue about it if that's ok with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah one ugly bit of the @debounce
annotation is that we cannot clear the timeout when the model is disposed. That's why I typically prefer to manage my own RunOnceScheduler
which is disposable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had no idea about RunOnceScheduler
! that's nice!
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
6be3d6d
to
a0cf390
Compare
assert.ok(result); | ||
|
||
// language detection is debounced so we need to wait a bit | ||
await sleep(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a 2000ms unit test might be problematic since we try to make them fast.
Does it make more sense to do this as an integration test? Which we allow to be more long running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't know how to do this correctly then because I need to wait for the debounce to finish (600ms).
Any ideas? Also this test is failing even though it worked locally so I'm guessing this sleep is the cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally thinking there would be some kind of event that I could set up but that doesn't seem to be the case...unless I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Figured it out.
return undefined; | ||
} | ||
|
||
async detectLanguages(resource: URI): Promise<string[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% convinced we need the detectlanguages
method. I was hoping that the model would work in such a way that with a high probability it detects some language, if that is not the case then from my experience that means the model has not enough information. And I would take those results with a grain of salt. I am not sure if they are useful.
When do you actually use this? In the quick pick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in the quick pick. I think it's still useful. If a text editor does contain 2 languages (I had an example where an editor contained both Java and C# code), showing both in the quick pick is convenient.
src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts
Outdated
Show resolved
Hide resolved
src/vs/workbench/services/languageDetection/browser/languageDetectionWorkerServiceImpl.ts
Outdated
Show resolved
Hide resolved
Cool PR! I did an initial review and left comments in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found only minor things now on my end.
assert.ok(result); | ||
|
||
// language detection is debounced so we need to wait a bit | ||
await sleep(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an integration test, but even there waiting 2s is ugly as it will make the builds slower.
src/vs/workbench/services/untitled/common/untitledTextEditorModel.ts
Outdated
Show resolved
Hide resolved
extensions/vscode-api-tests/src/singlefolder-tests/untitled.languagedetection.test.ts
Show resolved
Hide resolved
@@ -141,6 +143,8 @@ export class UntitledTextEditorModel extends BaseTextEditorModel implements IUnt | |||
this.setModeInternal(preferredMode); | |||
} | |||
|
|||
this._autoDetectLanguageScheduler = this._register(new RunOnceScheduler(() => this.autoDetectLanguage(), 600)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even more concise, you can move this entire line to the top where you declare the member.
35f8a31
to
d5b7b3e
Compare
merging this in now since I addressed all the feedback and I'd like to not maintain distro |
partially addresses #129607
Broken up by commit: