[Core] New global metadata synchronization infrastructure #3159
Conversation
Force-pushed 890600a to 3c16a91.
Test Results: 4 files -3, 4 suites -3, 58s ⏱️ (-1m 50s). Results for commit 3c16a91; comparison against base commit fed9f84. This pull request removes 52 tests and adds 15. Note that renamed tests count towards both.

Test Results: 7 files ±0, 7 suites ±0, 4m 42s ⏱️ (-6s). For more details on these errors, see this check. Results for commit 255fbf4; comparison against base commit 71758bc. ♻️ This comment has been updated with latest results.
Force-pushed fdebdb3 to 672483c.
Thank you so much @AhmedSoliman for this huge improvement. The changes look good to me :). I left a few comments/questions.
```diff
@@ -1205,7 +1203,10 @@ impl LogsController {
             WriteError::FailedPrecondition(err) => {
                 info!(%err, "Detected a concurrent modification to the log chain");
                 // Perhaps we already have a newer version, if not, fetch latest.
-                let _ = Metadata::current().sync(MetadataKind::Logs, TargetVersion::Version(previous_version.next())).await;
+                let metadata = Metadata::current();
+                metadata.notify_observed_version(MetadataKind::Logs, previous_version.next(), None);
```
Maybe in the future, FailedPrecondition should contain information about the current version, so that notify_observed_version can be called with the real current version.
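The idea behind this change is that instead of issuing a blocking sync against the metadata store on every conflict, the node only records the highest version it has observed and lets a background update task fetch lazily. A minimal, self-contained sketch of that pattern (`MetadataKind` and the version type here are hypothetical stand-ins, not restate's actual types):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in for restate's MetadataKind enum.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum MetadataKind {
    Logs,
    PartitionTable,
}

// Tracks the highest metadata version observed per kind. A background
// update task can watch this and fetch when it falls behind, instead of
// performing a synchronous metadata-store round trip at the call site.
#[derive(Default)]
struct ObservedVersions {
    highest: Mutex<HashMap<MetadataKind, u64>>,
}

impl ObservedVersions {
    // Record that `version` was observed (e.g. derived from a
    // FailedPrecondition response). Only moves forward; stale
    // observations are ignored.
    fn notify_observed_version(&self, kind: MetadataKind, version: u64) {
        let mut map = self.highest.lock().unwrap();
        let entry = map.entry(kind).or_insert(0);
        if version > *entry {
            *entry = version;
        }
    }

    fn observed(&self, kind: MetadataKind) -> u64 {
        *self.highest.lock().unwrap().get(&kind).unwrap_or(&0)
    }
}

fn main() {
    let tracker = ObservedVersions::default();
    tracker.notify_observed_version(MetadataKind::Logs, 5);
    tracker.notify_observed_version(MetadataKind::Logs, 3); // stale, ignored
    assert_eq!(tracker.observed(MetadataKind::Logs), 5);
    assert_eq!(tracker.observed(MetadataKind::PartitionTable), 0);
}
```

The non-blocking call also means the caller no longer has to `.await` a sync that may stall when the metadata store is slow.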
```diff
@@ -262,9 +253,6 @@ struct AdminOptionsShadow {
     #[serde_as(as = "serde_with::DisplayFromStr")]
     heartbeat_interval: humantime::Duration,

-    #[serde_as(as = "serde_with::DisplayFromStr")]
-    metadata_sync_interval: humantime::Duration,
```
Would it make sense to leave this in AdminOptionsShadow so we can print proper deprecation warnings (that it has been removed, or that it has been replaced by metadata_update_interval)?
```rust
use crate::{Version, Versioned, flexbuffers_storage_encode_decode};

/// A trait all metadata types managed by metadata manager.
pub trait GlobalMetadata: Versioned + StorageEncode + StorageDecode {
```
Nice
```rust
/// A trait all metadata types managed by metadata manager.
pub trait GlobalMetadata: Versioned + StorageEncode + StorageDecode {
    /// The key for this metadata type in metadata store
    const KEY: &'static str;
```
I like how the key is defined near the type; I always forget where the key constants are defined.
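The associated-const pattern the trait uses can be illustrated with a minimal, self-contained sketch (the `Logs` type, the `"logs"` key, and the simplified trait bounds here are illustrative assumptions, not restate's real definitions):

```rust
// Simplified sketch of the GlobalMetadata pattern: each metadata type
// carries its metadata-store key as an associated const, so the key is
// declared next to the type instead of in a distant constants module.
trait GlobalMetadata {
    /// The key under which this metadata type is stored in the metadata store.
    const KEY: &'static str;
    fn version(&self) -> u64;
}

// Hypothetical metadata type implementing the trait.
struct Logs {
    version: u64,
}

impl GlobalMetadata for Logs {
    const KEY: &'static str = "logs";
    fn version(&self) -> u64 {
        self.version
    }
}

// Generic code can address the store without knowing the concrete type's key.
fn store_key<T: GlobalMetadata>() -> &'static str {
    T::KEY
}

fn main() {
    assert_eq!(store_key::<Logs>(), "logs");
    let logs = Logs { version: 1 };
    assert_eq!(logs.version(), 1);
}
```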
```rust
let metadata = Metadata::current();
let (logs_version, partition_table_version) = match join(
    metadata.wait_for_version(MetadataKind::Logs, restate_types::Version::MIN),
    metadata.wait_for_version(MetadataKind::PartitionTable, restate_types::Version::MIN),
```
Why not sync the schema as well?
Because Schema is not required to have a Version=1 at provisioning time. The first version is written after the user registers a deployment. Note that the update_task for schema will continue polling the metadata server until the schema version is set anyway.
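The `wait_for_version` semantics discussed here (block until the locally-known version of a kind reaches a minimum) can be sketched with standard-library primitives; this is a simplified illustration, not restate's async implementation, and `VersionCell` is a hypothetical name:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Sketch of wait_for_version semantics: block until the locally-known
// version reaches `min_version`. Schema is the exception discussed above:
// its first version only appears after a deployment is registered, so
// startup does not wait on it.
struct VersionCell {
    version: Mutex<u64>,
    bumped: Condvar,
}

impl VersionCell {
    fn new() -> Self {
        Self { version: Mutex::new(0), bumped: Condvar::new() }
    }

    // Publish a newly learned version; wakes any waiters.
    fn publish(&self, version: u64) {
        let mut v = self.version.lock().unwrap();
        if version > *v {
            *v = version;
            self.bumped.notify_all();
        }
    }

    // Block until the version is at least `min_version`, then return it.
    fn wait_for_version(&self, min_version: u64) -> u64 {
        let v = self.version.lock().unwrap();
        *self.bumped.wait_while(v, |v| *v < min_version).unwrap()
    }
}

fn main() {
    let cell = Arc::new(VersionCell::new());
    let waiter = {
        let cell = Arc::clone(&cell);
        thread::spawn(move || cell.wait_for_version(1))
    };
    cell.publish(1); // e.g. provisioning writes Version=1 for Logs
    assert!(waiter.join().unwrap() >= 1);
}
```

In the real code the two waits for Logs and PartitionTable are joined concurrently, so startup blocks only until both kinds reach `Version::MIN`.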
Force-pushed 06c2882 to a434074.
Great work @AhmedSoliman 🦸♂️ Avoiding the excessive syncs, making the metadata update tasks responsive to newly observed versions, and allowing fetching from more than one peer are great improvements. The results speak for themselves :-) LGTM. +1 for merging.
```rust
// which peers did we ask already?
peers_attempted_for_this_version: HashSet<GenerationalNodeId>,
in_flight_peer_requests: JoinSet<Option<Arc<T>>>,
_marker: std::marker::PhantomData<T>,
```
The PhantomData does not seem to be needed.
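The reviewer's point: `PhantomData<T>` is only needed when the type parameter does not otherwise appear in the struct, and here `T` is already used by the `in_flight_peer_requests` field. A minimal sketch (field names and types simplified; `Fetcher` is a hypothetical name):

```rust
use std::marker::PhantomData;
use std::sync::Arc;

// Redundant variant: T already appears in `in_flight`, so the marker
// field adds nothing for variance or the unused-parameter check.
struct FetcherWithMarker<T> {
    in_flight: Vec<Arc<T>>,
    _marker: PhantomData<T>,
}

// Equivalent without the marker: the compiler accepts this because
// the type parameter T is used by a real field.
struct Fetcher<T> {
    in_flight: Vec<Arc<T>>,
}

fn main() {
    let f: Fetcher<u64> = Fetcher { in_flight: vec![Arc::new(7)] };
    assert_eq!(*f.in_flight[0], 7);
    // The marker variant compiles too; it just carries a useless field.
    let g: FetcherWithMarker<u64> =
        FetcherWithMarker { in_flight: vec![], _marker: PhantomData };
    assert!(g.in_flight.is_empty());
}
```

PhantomData would only be required if `T` appeared solely in method signatures or associated items, not in any field.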
```rust
let handle = TaskCenter::spawn_unmanaged(
    TaskKind::MetadataBackgroundSync,
    "metadata-update-{}",
```
Is there a `format!("metadata-update-{}", T::KIND)` missing here?
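The bug the reviewer spots: `{}` inside a plain string literal is not interpolated, so every task would be named literally `"metadata-update-{}"`. A tiny sketch of the intended fix (`task_name` and the `"logs"` kind are illustrative; in the real code the argument would be `T::KIND`):

```rust
// Wrapping the literal in format! substitutes the kind, producing a
// distinct, readable name per metadata update task.
fn task_name(kind: &str) -> String {
    format!("metadata-update-{}", kind)
}

fn main() {
    assert_eq!(task_name("logs"), "metadata-update-logs");
    // Without format!, the name stays literal:
    assert_eq!("metadata-update-{}", "metadata-update-{}");
}
```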
crates/types/src/config/common.rs (outdated diff)
```diff
@@ -279,12 +279,24 @@ pub struct CommonOptions {

+    /// # Metadata update interval
+    ///
+    /// The interval at which each node checks for metadata updates it has observed from different
+    /// nodes or other sources.
+    /// The idle time afterwhich the node will check for metadata updates from metadata store.
```
after which
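For reference, a configuration fragment using the new option might look like the following; this is a hedged sketch assuming the snake_case key matches the `metadata_update_interval` field name from the diff, and the `3s` value is a placeholder, not a documented default:

```toml
# Hypothetical restate configuration fragment.
# The idle time after which a node checks the metadata store for updates.
metadata_update_interval = "3s"
```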
Force-pushed 67fb684 to 1351e67.
- Auto-recovery needs to be higher than http2 keep-alive timeout
- Shorter raft election timeout (~1-ish second)
The new global metadata synchronization infrastructure is designed to ensure that all nodes in the cluster have consistent metadata while reducing load on the metadata store. This has significant implications for the overall performance and reliability of the system, especially when the metadata store is overloaded or naturally slow. The new infrastructure removes the need for ad-hoc metadata sync requests and limits staleness to a configurable value. Additionally, it allows nodes to keep pulling metadata changes from the growing number of peers that report the new metadata version to them.

Other minor changes:
- Removed the option to manually `sync` metadata when requesting to create a snapshot
- On node startup, we wait for a full global metadata sync
- Added initially loaded metadata versions to the `My Node ID` log message

Note that there can be certain scenarios in which an isolated node doesn't hear about a metadata change that happened in the cluster (a node that normally doesn't interact with the rest of the cluster). This is still covered by the metadata update interval, which will eventually fetch the metadata from the metadata store; in the future, this will be covered automatically by gossip messaging disseminating metadata version changes much more quickly.
Stack created with Sapling. Best reviewed with ReviewStack.