tillrohrmann
Contributor

Since we cannot prevent incomplete tail writes in the presence of hard crashes, DBRecoveryMode::AbsoluteConsistency might be too restrictive because it prevents the node from restarting. If the tail write did not complete, we can be sure that we have not acted on it yet, so it is safe to ignore it and resume. The remaining risk is that a tail entry could also have been corrupted after we acted on it. This commit weakens the recovery mode to DBRecoveryMode::TolerateCorruptedTailRecords.

This fixes #3154.
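
The trade-off between the two recovery modes can be illustrated with a small, self-contained sketch. It is a toy model and does not use the RocksDB API: the `RecoveryMode` enum, the `replay` function, and the one-byte length framing are all hypothetical and only mirror the idea that a record cut short at the end of the log is either fatal or dropped.

```rust
/// Hypothetical, simplified model of the two recovery modes (not the RocksDB API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RecoveryMode {
    AbsoluteConsistency,
    TolerateCorruptedTailRecords,
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    /// The record starting at `offset` claims more bytes than the log contains.
    TruncatedTail { offset: usize },
}

/// Replays a toy log in which every record is one length byte followed by
/// that many payload bytes (an assumption made purely for illustration).
fn replay(log: &[u8], mode: RecoveryMode) -> Result<Vec<Vec<u8>>, ReplayError> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos < log.len() {
        let len = log[pos] as usize;
        let start = pos + 1;
        let end = start + len;
        if end > log.len() {
            // The last record was cut short, for example by a hard crash mid-write.
            return match mode {
                // Strict mode: refuse to start.
                RecoveryMode::AbsoluteConsistency => {
                    Err(ReplayError::TruncatedTail { offset: pos })
                }
                // Tolerant mode: drop the incomplete tail and keep what was complete.
                RecoveryMode::TolerateCorruptedTailRecords => Ok(records),
            };
        }
        records.push(log[start..end].to_vec());
        pos = end;
    }
    Ok(records)
}
```

With a log such as `[2, b'o', b'k', 1, b'x', 4, b'z']`, the third record claims four bytes but only one was written. In this model the strict mode returns an error, while the tolerant mode returns the two complete records. Dropping the tail is safe only under the assumption stated above, namely that an incomplete write was never acted upon.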

With this commit we now properly set the default values when a deprecated
option is used. This is particularly relevant if the deprecated option is a
struct that contains multiple values and the user only specifies a few of them.
In this case, we want to a) not fail because not all fields have been specified
and b) fill in the default values that are specified in the configuration.
Before, the problem was that default values weren't respected for flattened structs
(see serde-rs/serde#1879). Moreover, if a field of a sub-struct
was specified, then the default value of the sub-struct would be used instead
of the default value specified in the parent. This was problematic with RocksDbOptions,
for example. Since all of its fields are optional, every unspecified field would
deserialize as None. If the parent enabled the WAL, that setting would not have been respected.

This fixes restatedev#3139.
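
The intended fallback behaviour can be sketched without serde. This is a minimal illustration rather than the code from this commit: `StorageOptions`, its two fields, `or_defaults`, and `parent_defaults` (including its values) are hypothetical stand-ins for an all-optional struct such as RocksDbOptions and the defaults chosen by its parent.

```rust
/// Hypothetical stand-in for an all-optional options struct.
#[derive(Clone, Debug, Default, PartialEq)]
struct StorageOptions {
    disable_wal: Option<bool>,
    write_buffer_size: Option<usize>,
}

impl StorageOptions {
    /// Keeps every field the user specified and falls back to `defaults`
    /// for the fields that were left unspecified.
    fn or_defaults(self, defaults: &StorageOptions) -> StorageOptions {
        StorageOptions {
            disable_wal: self.disable_wal.or(defaults.disable_wal),
            write_buffer_size: self.write_buffer_size.or(defaults.write_buffer_size),
        }
    }
}

/// Defaults chosen by the parent configuration (hypothetical values):
/// the WAL stays enabled unless the user explicitly disables it.
fn parent_defaults() -> StorageOptions {
    StorageOptions {
        disable_wal: Some(false),
        write_buffer_size: Some(64 << 20),
    }
}
```

A user who sets only `write_buffer_size` keeps that value after calling `or_defaults(&parent_defaults())`, and `disable_wal` resolves to `Some(false)`. Relying on `StorageOptions::default()` alone would leave `disable_wal` as `None`, which is the failure mode described above.
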
Disabling WAL by default has the risk that the WAL gets disabled if we are
using RocksDbOptions::default(). This has happened when using the deprecated
metadata-store option because of a serde bug where defaults are not respected
for flattened structs.
…rver

@tillrohrmann tillrohrmann merged commit 85cbaf0 into restatedev:main Apr 15, 2025
19 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2025
@tillrohrmann tillrohrmann deleted the issues/3154 branch April 15, 2025 16:10

Successfully merging this pull request may close these issues.

Flaky test: building metadata store failed: failed creating raft storage: failed creating RocksDb: Corruption: truncated record body