-
Notifications
You must be signed in to change notification settings - Fork 693
fix: update leader in topology after a purge #36411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2df46c5
to
e27dca8
Compare
99156f2
to
d6da931
Compare
f81e73e
to
d2cdf34
Compare
Previously, it was not possible to overwrite a partition leader with lower term. This is required to handle out-of-order membership events. With the purge operation, a partition is re-bootstrapped with a clean state. As a result the new leader will start with term 1. To handle out-of-order events properly, instead of updating the existing in-memory cluster state, we always built it from the set of received member properties. This way we generate a new topology from the latest member properties of all brokers that are known to this client/gateway. We still use the term for conflict resolution to determine who is the latest leader. But since the old leader do not exist anymore, the newly generate topology will have the correct leader after the partition re-bootstrap. This approach consumes extra memory to keep track of all memberProperties in BrokerTopologyManager. In addition, it has to iterate over all properties to generate the new topology. However, membership updates are not very often to optimize this path. So this is an acceptable overhead.
8743404
to
9779069
Compare
TBD if this needs to be backported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments and questions
dist/src/main/java/io/camunda/application/commons/configuration/GatewayBasedConfiguration.java
Outdated
Show resolved
Hide resolved
...roker-client/src/main/java/io/camunda/zeebe/broker/client/impl/BrokerClientTopologyImpl.java
Show resolved
Hide resolved
...roker-client/src/main/java/io/camunda/zeebe/broker/client/impl/BrokerClientTopologyImpl.java
Outdated
Show resolved
Hide resolved
panagiotisgts
approved these changes
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall 🚀 Just some clarifications
...config/src/main/java/io/camunda/zeebe/dynamic/config/GatewayClusterConfigurationService.java
Outdated
Show resolved
Hide resolved
…pt broker BrokerTopologyManager always needs the ClusterConfigurationService to get the correct cluster topology, because the static information is only updated via ClusterConfigurationService.
Multiple members with same member-id "migration" were running, which causes issues in gossiping latest topology info.
The advertised address is automatically resolved only when the host is null.
5fce7f7
to
6382c29
Compare
6382c29
to
fa4964a
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
component/c8-api
All things unified C8 API, e.g. C8 REST
component/operate
Related to the Operate component/team
component/optimize
Related to Optimize component/team
component/tasklist
Related to the Tasklist component/team
component/zeebe
Related to the Zeebe component/team
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The
BrokerTopologyManager
used to keep the latest topology and update it whenever a new event is received via gossip. Since the ordering of the events from different nodes are not guaranteed, it is unclear which update is the latest. So we used the term to the find out which node is the latest leader, in-case we receive updates out-of-order. This caused issues after a purge operation because the term restarts at 0. Even after receiving latest updates from the all nodes, this state cannot be updated because the term is still lower than the previously known term.To fix this, we now keep track of all member properties in
BrokerTopologyManager
. Instead of updating the topology, we calculate a new topology by aggregating the latest member properties (BrokerInfo
) that it knows. We still use the term to find the latest leader to handle out-of-order updates. But eventually an update from leader term 0 after purge will be reflected in the topology when the updates from other nodes that they are not a leader is received.This PR consists of:
BrokerClusterState
which is always created by aggregating latest member propertiesClusterConfigurationService
. This is different from the previous behavior, where it was also updated from theBrokerInfo
received via gossip.BrokerTopologyManager
starts cluster configuration services and listens the updates.BrokerClusterStateImpl
. All usages in the tests are replaced with a mock implementation.In addition these changes uncovered another bug by DefaultAdvertisedAddressIT. The default host was set by the unified configuration which prevented automatic resolution of advertised address. The test was passing before because it was only checking for topology which will get updated anyway because the member properties in swim is recieved via the confiugred initial-contact point. But with the changes in this PR, the topology is not complete until the Cluster configuration service can gossip the correct cluster configuration. Without correct advertised address, the communication was broken between broker and gateway. To catch such error, the test is also extended to send commands to command-api which should fail if the advertised address is not set correctly.
Checklist
Related issues
closes #35650