
Conversation

joeyballentine
Member

After doing some profiling, I noticed that clearing the CUDA cache was taking a significant amount of time, and it was happening after every upscale, which I'm pretty sure is incorrect and wasteful. We only need to release the memory when we're actually done with it (i.e. when the chain finishes).

So, I added a hook system to packages. I'm not sure if this is the best API for it, but packages can now define on_chain_finish callbacks that run when the chain is finished, for things like resource cleanup.
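To illustrate the idea, here's a minimal sketch of such a package hook system. The names and internals are illustrative only (not chaiNNer's actual implementation); the point is that the expensive call runs once per chain instead of after every upscale:

```python
# Hypothetical sketch of a package-level chain-finish hook.
_on_chain_finish_callbacks = []


def set_on_chain_finish(callback):
    """Let a package register work to run once, when the chain finishes."""
    _on_chain_finish_callbacks.append(callback)


def run_chain(nodes):
    # Execute every node first, then fire the finish hooks exactly once,
    # so an expensive call like torch.cuda.empty_cache() no longer runs
    # after every single upscale.
    results = [node() for node in nodes]
    for callback in _on_chain_finish_callbacks:
        callback()
    return results
```

A PyTorch package would then register something like `set_on_chain_finish(torch.cuda.empty_cache)` instead of calling it inside each upscale node.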

This will hopefully bring a small speed improvement to batch upscales.

@RunDevelopment
Member

This will hopefully bring a little speed improvement to batch upscales

"Hopefully"? Did... did you not measure whether your optimization actually made things faster?

And on that note: if you do an optimization, please also report how much faster this actually makes things. As a reviewer, I also have to weigh whether the speed-up is worth the tradeoff in code simplicity/quality/maintainability. I can of course measure things myself, but having you give a reference case/number is useful.

Not sure if this is the best api for it, but packages can now define on_chain_finish callbacks to be run when the chain is finished, to do things like resource clearing.

I would suggest the following API: add a def add_cleanup(callback: Callable[[], None]) -> None method to NodeContext. This lets nodes register cleanup functions that run when the chain is done. Cleanup callbacks are stored in a set, so a node can choose to run a cleanup action once per chain or once per node execution:

def do_clean_up(): ...

@register(...)
def my_node(context: NodeContext):
    # a new lambda object is created on every call, so the set keeps one
    # entry per node execution
    context.add_cleanup(lambda: do_clean_up())
    # the same function object is deduplicated by the set, so this runs
    # once per chain
    context.add_cleanup(do_clean_up)

This fixes the following issues with the current set_on_chain_finish API:

  • set_on_chain_finish always runs cleanup actions, even if no nodes from that package were executed.
  • Cleanup actions cannot assume that the package's dependencies are available.
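The set-based deduplication above could be sketched like this (the NodeContext internals here are hypothetical; only add_cleanup's signature comes from the suggestion):

```python
from typing import Callable


class NodeContext:
    """Illustrative sketch, not chaiNNer's actual NodeContext."""

    def __init__(self) -> None:
        # A set deduplicates callbacks: adding the same function object
        # twice only registers it once.
        self._cleanup_callbacks: set[Callable[[], None]] = set()

    def add_cleanup(self, callback: Callable[[], None]) -> None:
        self._cleanup_callbacks.add(callback)

    def run_cleanup(self) -> None:
        # Called by the executor once, when the chain finishes.
        for callback in self._cleanup_callbacks:
            callback()
        self._cleanup_callbacks.clear()


calls = []

def do_clean_up():
    calls.append("cleaned")

context = NodeContext()
# Simulate three node executions that each register the same cleanup.
for _ in range(3):
    context.add_cleanup(do_clean_up)  # same object -> deduplicated
context.run_cleanup()
print(len(calls))  # 1: the named function ran once per chain
```

Passing `lambda: do_clean_up()` instead would create a distinct object on each execution, so the set would hold three entries and the cleanup would run three times.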

@joeyballentine
Member Author

"Hopefully"? Did... did you not measure whether your optimization actually made things faster?

I did, after I made this PR, but that was with all of my changes combined. I'll test this one individually.

@joeyballentine
Member Author

Turns out this is doing most of the heavy lifting for the optimizations I tested: a 2m20s processing time went down to 2m01s with just this change alone.

@joeyballentine joeyballentine changed the title Add chain finish hook + empty cuda cache on chain finish Add per-node chain-finish cleanup functions + empty cuda cache only on chain finish May 17, 2024
@RunDevelopment
Member

RunDevelopment left a comment


Just some minor things.

joeyballentine and others added 3 commits May 17, 2024 12:15
Co-authored-by: Michael Schmidt <mitchi5000.ms@googlemail.com>
@RunDevelopment
Member

Probably needs a from __future__ import annotations somewhere.

@joeyballentine joeyballentine merged commit 3ae2726 into main May 17, 2024
@joeyballentine joeyballentine deleted the chain_hook_cuda_empty branch May 17, 2024 17:17