Skip to content

Conversation

rchiodo
Copy link
Contributor

@rchiodo rchiodo commented Apr 16, 2022

Fixes #9689

@DonJayamanne wanted to know your opinion on this.

The genesis for this is that the notebook controller manager has explicit knowledge of local kernels. For this to work in web we need to remove this knowledge.

This is my attempt to do that by:

  • Creating an IKernelFinder that doesn't explicitly say where kernels come from
  • Changing NotebookControllerManager to use this new finder
  • Moving caching into this IKernelFinder
  • Moving some of the telemetry in the NotebookControllerManager into the IKernelFinder
  • findKernel could actually be shared between remote and local (uses the same algorithm)
  • ILocalKernelFinder no longer needs to cache. It's handled by the caller
  • IRemoteKernelFinder gets caching for free

This is just an idea, wanted to know what you thought of it.

I think right now I have validation of the cached kernels in the wrong spot. This could probably move to the ILocalKernelFinder and the IRemoteKernelFinder.

@rchiodo rchiodo requested a review from a team as a code owner April 16, 2022 01:19
@rchiodo rchiodo closed this Apr 16, 2022
@rchiodo rchiodo reopened this Apr 16, 2022
@rchiodo
Copy link
Contributor Author

rchiodo commented Apr 16, 2022

Dang it, how do I turn it back into a draft?

// Finding a kernel is the same no matter what the source
@traceDecoratorVerbose('Find kernel spec', TraceOptions.BeforeCall | TraceOptions.Arguments)
@captureTelemetry(Telemetry.KernelFinderPerf)
public async findKernel(
Copy link
Contributor Author

@rchiodo rchiodo Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find is now common to local/remote

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think it probably didn't work correctly before? It did them separately.

).then((l) => this.finishListingKernels(l, useCache, 'remote'));
}

private async listKernelsUsingFinder(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching happens in this function. Should be the same logic you had before.

const key = `${kind}:${useCache}`;
if (this.startTimeForFetching && !this.fetchingTelemetrySent.has(key)) {
this.fetchingTelemetrySent.add(key);
sendTelemetryEvent(Telemetry.FetchControllers, this.startTimeForFetching.elapsedTime, {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This telemetry used to be the in the NotebookControllerManager

) {}
@traceDecoratorVerbose('Find kernel spec', TraceOptions.BeforeCall | TraceOptions.Arguments)
@captureTelemetry(Telemetry.KernelFinderPerf)
public async findKernel(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find and caching were moved up a level.

if (this.isLocalLaunch) {
return;
}
if (this.failedToFetchRemoteKernels) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this altogether for now. I assume you're doing this differently anyway.

@@ -262,8 +263,8 @@ function sendTelemetryEventInternal<P extends IEventNamePropertyMapping, E exten

reporter.sendTelemetryEvent(eventNameSent, customProperties, measures);
}

traceVerbose(
traceWithLevel(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh moved this to an even higher trace level. Telemetry spew was bothering me too.

@codecov-commenter
Copy link

codecov-commenter commented Apr 16, 2022

Codecov Report

Merging #9694 (611e327) into main (261afd9) will decrease coverage by 0%.
The diff coverage is 50%.

❗ Current head 611e327 differs from pull request most recent head 01920b3. Consider uploading reports for the commit 01920b3 to get more accurate results

@@          Coverage Diff          @@
##           main   #9694    +/-   ##
=====================================
- Coverage    63%     63%    -1%     
=====================================
  Files       204     204            
  Lines      9840    9843     +3     
  Branches   1556    1557     +1     
=====================================
+ Hits       6242    6243     +1     
+ Misses     3335    3100   -235     
- Partials    263     500   +237     
Impacted Files Coverage Δ
src/platform/logging/index.ts 71% <33%> (-2%) ⬇️
src/platform/logging/types.ts 100% <100%> (ø)
src/platform/api/kernelApi.node.ts 70% <0%> (-1%) ⬇️
src/platform/api.ts 57% <0%> (ø)
src/platform/errors/types.ts 77% <0%> (ø)
src/platform/api/pythonApi.ts 71% <0%> (ø)
src/platform/errors/errors.ts 62% <0%> (ø)
src/platform/common/utils.node.ts 57% <0%> (ø)
src/platform/common/utils/misc.ts 71% <0%> (ø)
... and 42 more

@@ -486,6 +486,11 @@ export function compareKernels(
return 1;
} else if (b === activeInterpreterConnection) {
return -1;
// Interpreter should trump kernelspec
} else if (a.kind === 'startUsingPythonInterpreter' && b.kind !== 'startUsingPythonInterpreter') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why would this be the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the tests were guaranteeing that so I assume that was the wanted behavior. I think before it was just luck that it was happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test that was failing it would get to this point and the two kernelspecs would be equivalent (start from kernelspec and start from interpreter). There was nothing that would distinguish them.

}

// Send telemetry related to fetching the kernel connections
// Send telemetry related to fetching the kernel connections. Do it here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part feels a lot better with your change.

Copy link
Member

@IanMatthewHuff IanMatthewHuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you are still in Draft. But seems pretty peachy to me. NotebookControllerManager feels a good bit cleaner as the big plus.

@rchiodo
Copy link
Contributor Author

rchiodo commented Apr 18, 2022

Not sure if you are still in Draft. But seems pretty peachy to me. NotebookControllerManager feels a good bit cleaner as the big plus.

Sorry there is one small downside that I forgot to mention.

Previously we used to only validate the preferred kernel when searching through the cache. Now we validate the entire cache before returning the preferred kernel.

So imagine you have 20 kernels in your cache, We will now make sure the file for all 20 intepreters exists before returning the preferred one. We used to only check for file existence on the first one.

I don't think a STAT call on 20 files is going to take very long, but it is a consequence of moving the caching around.

@rchiodo
Copy link
Contributor Author

rchiodo commented Apr 18, 2022

I think I could make it so the cache is not validated during the preferred search though. It's all internal to the kernel finder anyway.

@rchiodo
Copy link
Contributor Author

rchiodo commented Apr 18, 2022

I think the algorithm would have to be tweaked though. Like pass the validation into the findPreferredKernel function and instead of it returning the top of the list, it validates the top until it finds one that's valid.

@rchiodo
Copy link
Contributor Author

rchiodo commented Apr 18, 2022

But it seems like premature optimization to me.

I wrote some code to test that hypothesis:

var glob = require("glob");
var fsExtra = require("fs-extra");

async function run() {
// options is optional
var files = await new Promise((resolve, reject) => {
        glob('./src/**/*.ts', (ex, res) => {
            if (ex) {
                reject(ex);
            } else {
                resolve(res);
            }
        });
    });
    let start = new Date().getTime();
    await Promise.all(files.map(async (f) => {
      await fsExtra.pathExists(f);
    }));
    let end = new Date().getTime();
    console.log(`Total time for ${files.length} files is ${end-start}`);
}

run();

That prints out this:

Total time for 771 files is 15

So 15 ms for 771 files. So even if the user has a ridiculous number of interpreters, it's still super fast to check for existence.

@rchiodo rchiodo merged commit 6e6f09e into main Apr 18, 2022
@rchiodo rchiodo deleted the rchiodo/combine_finders branch April 18, 2022 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstract the concept of 'local' and 'remote' from the NotebookControllerManager
4 participants