Skip to content

[2/s] contextually save DagsterInstance for use in component loading #31358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: benpankow/state-store
Choose a base branch
from

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented Jul 25, 2025

Summay

Adds a base StateStore class which defines an interface for saving and loading defs state - the DagsterInstance implements this interface. StateStore stores a global, current instance of the state, which is set in various Dagster load paths to the instance, so that it is contextually available to user code.

Test Plan

Unit tests that the state store is populated in various user code contexts.

@benpankow
Copy link
Member Author

benpankow commented Jul 25, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@benpankow benpankow force-pushed the benpankow/persist-load-context branch from f842374 to 7d52c8c Compare July 25, 2025 23:48
@benpankow benpankow force-pushed the benpankow/state-store branch from f24f7f4 to 1e4d52f Compare July 25, 2025 23:48
@benpankow benpankow force-pushed the benpankow/persist-load-context branch from 222633b to 67d5aeb Compare July 28, 2025 18:57
@benpankow benpankow force-pushed the benpankow/state-store branch from 1e4d52f to 2cc1189 Compare July 28, 2025 18:57
@OwenKephart OwenKephart changed the base branch from benpankow/state-store to graphite-base/31358 July 29, 2025 17:44
@OwenKephart OwenKephart force-pushed the graphite-base/31358 branch from 2cc1189 to 8349f1b Compare July 29, 2025 18:29
@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from d0b1ae9 to 63e9527 Compare July 29, 2025 18:29
@OwenKephart OwenKephart changed the base branch from graphite-base/31358 to benpankow/state-store July 29, 2025 18:29
@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from 63e9527 to dfde473 Compare July 29, 2025 18:30
@OwenKephart OwenKephart force-pushed the benpankow/state-store branch from 8349f1b to 90eb27a Compare July 29, 2025 18:30
@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from dfde473 to 9cd14f2 Compare July 30, 2025 00:16
@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from 9cd14f2 to b54a941 Compare July 30, 2025 23:27
@OwenKephart OwenKephart force-pushed the benpankow/state-store branch from 9ada543 to a5da656 Compare July 30, 2025 23:27
@OwenKephart OwenKephart marked this pull request as ready for review July 30, 2025 23:33
@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch 2 times, most recently from af35cdb to 3352ae0 Compare July 31, 2025 01:30
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have another StateStore thats not the instance ? If not could we defer this until we have one?

Can we leverage the DagsterInstance.__enter__ for managing a current active instance ctx var? Seeing all these new callsites for StateStore ctx management spooks me.

@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from 3352ae0 to ac8f7bc Compare July 31, 2025 20:30
@OwenKephart OwenKephart force-pushed the benpankow/state-store branch from a5da656 to b7aa2ed Compare July 31, 2025 20:30
Copy link
Contributor

@alangenfeld I've updated this pr to make StateStorage a property of the instance rather than having the instance be a StateStore. The basic idea is that it's fairly likely that in the future people will want to customize this, and it also makes it feel nicer to implement this in cloud.

Regarding the DagsterInstance.__enter__ bit, the scary bit there for me is the fact that in most places we just get the instance using DagsterInstance.from_ref() (i.e. not inside of a context manager). If we then started using context managers selectively, we'd end up calling dispose() in a bunch of new places

@OwenKephart OwenKephart requested a review from alangenfeld July 31, 2025 21:39
@alangenfeld alangenfeld requested a review from gibsondan July 31, 2025 21:48
@alangenfeld
Copy link
Member

we just get the instance using DagsterInstance.from_ref() (i.e. not inside of a context manager)

which callsites are you targeting that do not enter the instance context manager after instantiating it via ref? I see a few misses but the majority of places seem to enter the context directly or via exit stack

@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from ac8f7bc to e995e00 Compare July 31, 2025 21:55
@OwenKephart OwenKephart force-pushed the benpankow/state-store branch from b7aa2ed to d7fb077 Compare July 31, 2025 21:55
Comment on lines +38 to +48
def load_state_to_path(self, key: str, version: str, path: Path) -> None:
"""Loads the state file for the given defs key and version into the given file path.

Args:
key (str): The key of the state to retrieve.
version (str): The version of the state to retrieve.
path (Path): The path to write the state to.

Returns:
bool: True if the state was loaded, False otherwise.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for load_state_to_path indicates it returns a boolean value (Returns: bool: True if the state was loaded, False otherwise), but the implementation raises an exception on failure and doesn't return anything on success. This creates a mismatch between the documented behavior and actual implementation. Either update the docstring to remove the return type or modify the implementation to return True after successfully writing to the path.

Suggested change
def load_state_to_path(self, key: str, version: str, path: Path) -> None:
"""Loads the state file for the given defs key and version into the given file path.
Args:
key (str): The key of the state to retrieve.
version (str): The version of the state to retrieve.
path (Path): The path to write the state to.
Returns:
bool: True if the state was loaded, False otherwise.
"""
def load_state_to_path(self, key: str, version: str, path: Path) -> None:
"""Loads the state file for the given defs key and version into the given file path.
Args:
key (str): The key of the state to retrieve.
version (str): The version of the state to retrieve.
path (Path): The path to write the state to.
"""

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

Copy link
Member

haven't totally digested this PR yet but +1 to alex's last comment - the only places I see not using from_ref() outside of a contextmanager with are either old celery/dagstermill code or places that are callsites for creating an instance, which should then also be using a

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on contextmanager/lifecycle concerns

StateStorage.set_current(
instance.state_storage
if isinstance(instance, DagsterInstance)
else DagsterInstance.from_ref(instance).state_storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this does not look quite right to me in terms of lifecycle management. Is it a big lift to make this method require an instance instead of an instance_ref? Or use an ExitStack here will creating the execution plan?

@@ -538,10 +538,10 @@ def get_execution_plan(
asset_selection=remote_job.asset_selection,
asset_check_selection=remote_job.asset_check_selection,
),
instance=instance or DagsterInstance.get(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also does not seem quite right

@OwenKephart OwenKephart force-pushed the benpankow/persist-load-context branch from e995e00 to f7e4bb7 Compare August 5, 2025 21:31
@OwenKephart OwenKephart force-pushed the benpankow/state-store branch from d7fb077 to 324ebb6 Compare August 5, 2025 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants