Automatic language detection plan

Continuing from https://github.com/microsoft/vscode/issues/118455. This will span multiple milestones.

### Where we are

Now that [this PR](https://github.com/microsoft/vscode/pull/128708), which brings in [vscode-languagedetection](https://github.com/Microsoft/vscode-languagedetection), has been merged, we have automatic language detection. What's it all about?
* No code leaves your VS Code instance. The model is queried locally.
* Opt-in feature
   * `"workbench.editor.untitled.languageDetection": true`
   * supports language-specific enablement: `"[plaintext]": { "workbench.editor.untitled.languageDetection": true }`
* Powered by the latest release of @yoeo's [guesslang](https://github.com/yoeo/guesslang) model, which supports 30 languages
* Very basic additional heuristics to help the model with accuracy (adding the confidence of JS and TS, and of C and C++)
* Uncompressed (~4MB package)

This provides an "ok" experience, but the model has to be very sure of its guess before the untitled file's language changes. If you enable the feature and paste in a fairly large sample of code, it should work.

### Where we wanna be

* Support as many languages as possible (JSON, YAML, and XML, for example, are not supported today)
* You should be able to open an untitled file, start typing, and have language detection kick in as fast as possible
* A nice experience to handle tie breakers, "almost confident but not confident enough" situations and when we are wrong
* Compressed as much as possible (preliminary tests seem to say we can get down to 2-3MB)

### How we'll get there

* Improved guesslang model
  * [x] https://github.com/yoeo/guesslang/pull/33 which adds support for 14 more languages (JSON, YAML, XML included)
  * [ ] Possibly help guesslang with more files to train on
* Improved heuristics
  * [x] Variable confidence acceptance (e.g. if the model is 30% confident it's Java but <1% confident it's anything else, then it's probably Java and we should set that)
  * [x] https://github.com/microsoft/vscode/issues/129596
  * [x] ~Look at the user's workspace and influence weight based on what's open~ sounds too costly of an operation
* Improved UX
  * [x] ~In the event of a tie breaker, show a notification or similar to say "I'm tied between these X languages. which one is it?"~
  * [x] Make sure the user's decision doesn't get overwritten
  * [x] If we were wrong, promote the language picker (maybe show a badge on the language status bar entry when we change the lang)
  * [x] Add detected languages to the top of the language picker
* Improved Perf
  * [ ] Compress the model as much as possible
  * [x] Handle large files (e.g. someone pasting a huge JSON payload)
  * [x] Debounce event for untitled files
* Improved feedback
  * [x] It's important to understand how the model is doing for users to make sure it's actually useful. To do that, we opened https://github.com/microsoft/vscode/issues/129576
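
To make the "variable confidence acceptance" item above more concrete, here is a minimal sketch of how such a rule could look. It is not the code that shipped in VS Code or vscode-languagedetection: the `Guess` shape, the `pickLanguage` name, and all three thresholds are assumptions chosen only for illustration.

```typescript
// Hypothetical shape of a single model guess (not the vscode-languagedetection API).
interface Guess {
  languageId: string;
  confidence: number; // assumed to be in the range 0..1
}

// Assumed thresholds, picked for illustration only.
const ABSOLUTE_THRESHOLD = 0.8; // accept outright when the model is this sure
const MIN_CONFIDENCE = 0.2;     // never accept a guess weaker than this
const MIN_LEAD_RATIO = 10;      // otherwise the top guess must beat the runner-up by this factor

// Returns the language to apply, or undefined when no guess is convincing enough.
function pickLanguage(guesses: Guess[]): string | undefined {
  if (guesses.length === 0) {
    return undefined;
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...guesses].sort((a, b) => b.confidence - a.confidence);
  const top = sorted[0];
  const runnerUpConfidence = sorted[1]?.confidence ?? 0;

  // Case 1: the model is very sure on its own.
  if (top.confidence >= ABSOLUTE_THRESHOLD) {
    return top.languageId;
  }
  // Case 2: the model is only moderately sure, but nothing else comes close.
  if (
    top.confidence >= MIN_CONFIDENCE &&
    top.confidence >= runnerUpConfidence * MIN_LEAD_RATIO
  ) {
    return top.languageId;
  }
  // Otherwise leave the untitled file's language alone.
  return undefined;
}
```

With these assumed thresholds, a 30% Java guess with a sub-1% runner-up is accepted, while a 45% JavaScript guess with a 40% TypeScript runner-up is not, which is the kind of tie-break the UX items are meant to handle.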

### Additional possible investigations

* [x] Have `code -` open untitled files and detect the language
* Kernel guessing of a Jupyter Notebook
* Provide feedback to the model so that it can learn from users? (this would be totally local)
* Use a different model than guesslang

Automatic language detection plan #129004
