-
Notifications
You must be signed in to change notification settings - Fork 98
Introduce ReadIsolation for Storage transactions #3198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @@ -415,6 +423,11 @@ impl PartitionStore { | |||
| ) | |||
| }); | |||
|
|
|||
| let snapshot = match read_isolation { | |||
| ReadIsolation::None => None, | |||
| ReadIsolation::Snapshot => Some(self.rocksdb.inner().as_raw_db().snapshot()), | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't run any benchmarks on the negative effects of using snapshot reads in the invoker yet.
63ec615 to
f67061a
Compare
crates/storage-api/src/lib.rs
Outdated
| pub enum ReadIsolation { | ||
| /// No read isolation. This allows the storage implementation to return dirty reads. Use this | ||
| /// level, if you know that you are the only writer. | ||
| None, | ||
| /// Reads are served from a consistent snapshot of the underlying storage implementation. This | ||
| /// ensures that multiple reads get a consistent view of the stored information. Use this level, | ||
| /// if you need to do multiple reads and tolerate concurrent write operations. | ||
| /// | ||
| /// Note that using this level is more costly than [`ReadIsolation::None`]. | ||
| Snapshot, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the documentation matches reality here. What do we mean by "dirty reads" here? The default is that the writebatch/tx's changes is not visible to other transactions unless it's committed.
Do we have other uses of the API that split a single logical operation across multiple write batches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true for the concrete implementation of our Storage trait. I just didn't want to overfit to it (even though there is probably no other impl in the foreseeable future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we call this IsolationLevel and maybe rename Snapshot to RepeatableReads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Will update it accordingly.
54df4b9 to
27376da
Compare
|
I've updated this PR and included a commit which shows how a transaction with repeatable read isolation can be used by the invoker to ensure the reading the journal, journal metadata and state is consistent. |
27376da to
0ed7317
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good from my side. Only a small note on the test to reduce the amount of work it needs to do.
| @@ -143,8 +144,7 @@ pub(super) struct InvocationTask<SR, JR, EE, DMR> { | |||
| retry_count_since_last_stored_entry: u32, | |||
|
|
|||
| // Invoker tx/rx | |||
| state_reader: SR, | |||
| journal_reader: JR, | |||
| invocation_reader: IR, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
0ed7317 to
0439b9d
Compare
This commit introduces the ReadIsolation level for Storage transactions. This allows to choose between no or snapshot read isolation. The latter is useful when reading from outside of the partition processor where concurrent writes can happen (e.g. the invoker reading the journal, the metadata and the state of an invocation). The snapshot isolation level creates a Rocksdb snapshot which will affect compaction. It is, therefore, recommended to not keep snapshot transactions open for too long. This fixes restatedev#3192.
…eads In order to tolerate concurrent writes happening in the PP, the invoker is now using the InvokerReaderTransaction when reading the journal, journal metadata and the state. The transaction ensures that all reads see a consistent view of the underlying storage until the transaction is closed.
0439b9d to
d1a411b
Compare
This commit introduces the ReadIsolation level for Storage transactions. This allows to choose between no or snapshot read isolation. The latter is useful when reading from outside of the partition processor where concurrent writes can happen (e.g. the invoker reading the journal, the metadata and the state of an invocation). The snapshot isolation level creates a Rocksdb snapshot which will affect compaction. It is, therefore, recommended to not keep snapshot transactions open for too long.
This fixes #3192.