Skip to content

Handle EH restart gracefully to avoid notebook working copy data loss #179224

@rebornix

Description

@rebornix

Does this issue occur when all extensions are disabled?: Yes/No

  • VS Code Version: 1.78 Insiders
  • OS Version: macOS 13.3

Over the past year, we've seen dozens of users report that they can't save notebooks or get the error "Can't read properties of undefined". We also got similar failures/errors from error telemetry or flaky integration tests. This issue in Jupyter microsoft/vscode-jupyter#11435 gave us good hints of what lead to this issue: EH crashed or restarted when there was a dirty notebook document in editors and things fell apart.

What happened under the hood

  • Users have a dirty ipynb document
  • EH crashed (in above issue, it was caused by Copilot both last year and last month)
  • All extHostCustomers were disposed
  • If users chose to save, it would fail as the NotebookSerializer was disposed. The NotebookSerializer instance held a rpc proxy which was also disposed. There is no way to send the requests to anywhere to serialize the notebook document to bytes.
  • If users chose to cancel
    • They can still make changes to the document
    • When users switch to another editor and then come back, the editor will throw an error like "Can't read properties of undefined", this is because

Now users can't save the document, they can't restart the window either as the dirty notebook document can't be back'ed up. Their last option is discard the change. Even if the EH was restarted successfully, since the NotebookFileWorkingCopyModel was still using the legacy disposed NotebookSerializer, they can't use the new NotebookSerializer registered for jupyter-notebook view type.

Proposals

The main challenge here is most notebook serializers (the interactive window one is the only exception now) run in EH, if the EH is down, there isn't anything we can do as we rely on the serializer to do the data conversion. Users can potentially separate extensions to different EH instances to mitigate the issue (separate good and bad extensions) but usually when this happened, it was usually too late.

With that said, we should still explore what we can do to improve this, or minimize the impact. Some ideas from my explorations:

  • When EH crashed/disposed, wait for some threshold before attempting to dispose NotebookEditorInput (which triggers Save) as we know it won't succeed. If the EH comes back within a reasonable threshold, re-establish the serializer to the NotebookFileWorkingCopy so it can use the new serializer to save.
    • With this approach, we still need to figure out what we should do if the EH is completely down, how do we ensure we still have hot-exit/backups
  • Or, for ipynb users (our major notebook users, > 1M MAU), we move the serializer to the core. Once we have the scratchbook working copy support, we will remove the interactive serializer and use ipynb serializer for Interactive Window.
    • It doesn't affect our bundle size and we can replace the Interactive Window notebook serializer with it. However we didn't solve this problem for other notebook users

@bpasero I'd like to hear your opinions on this and maybe brainstorm a bit on how we can improve this situation.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions