Skip to content

Conversation

neilalexander
Copy link
Member

@neilalexander neilalexander commented Jul 16, 2025

Right now, in a stream scale up/scale down scenario where there are consumers that also need to be scaled, the metaleader makes a proposal to update the stream and then immediately makes proposals to update the consumers. However, this does not account for the fact that the stream proposal could fail to be applied, which is where we have sometimes seen peer set drifts.

This moves this logic to take place on the stream leader once the stream update proposal is applied instead.

This will also be necessary in a world where the consumer assignments are managed by the stream leader, as the metaleader will not know anything about the stream NRG.

Signed-off-by: Neil Twigg neil@nats.io

@neilalexander neilalexander force-pushed the neil/consumerproposals branch 3 times, most recently from d13b298 to 15f4c05 Compare July 16, 2025 14:48
@neilalexander neilalexander force-pushed the neil/consumerproposals branch 2 times, most recently from 5399212 to ee5277d Compare July 16, 2025 15:06
@neilalexander neilalexander marked this pull request as ready for review July 16, 2025 15:34
@neilalexander neilalexander requested a review from a team as a code owner July 16, 2025 15:34
Copy link
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander force-pushed the neil/consumerproposals branch 3 times, most recently from 735ec1d to 9034b73 Compare July 18, 2025 10:25
@neilalexander neilalexander changed the title Stream leader should propose consumer remaps once assignment processed (2.12) Stream leader should propose consumer remaps once assignment processed Jul 18, 2025
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - one minor nit.

@@ -29,6 +29,8 @@ import (
"testing"
"time"

"slices"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLace with all other single name imports

@neilalexander neilalexander force-pushed the neil/consumerproposals branch from 9034b73 to 7928ef6 Compare July 21, 2025 15:44
@neilalexander neilalexander merged commit 74a7d88 into main Jul 21, 2025
69 of 70 checks passed
@neilalexander neilalexander deleted the neil/consumerproposals branch July 21, 2025 16:05
neilalexander added a commit that referenced this pull request Jul 29, 2025
The way that consumer assignments were proposed was changed in #7071 so
that the stream leader is responsible for those updates instead of the
metaleader. This subtly changes the timing involved in stream moves and
cancels, as it is now not a given that the consumer assignments are
always serialised with stream assignments in the same order. If a
consumer assignment fails due to a cancelled stream move for example, we
should not propose deleting the consumer, as the new stream leader will
shortly come along and update the assignment correctly for the new
placement.

Signed-off-by: Neil Twigg <neil@nats.io>
MauriceVanVeen added a commit that referenced this pull request Sep 3, 2025
…gnment processed (#7071)"

This reverts commit 74a7d88, reversing
changes made to 425a7e5.
MauriceVanVeen added a commit that referenced this pull request Sep 3, 2025
…gnment processed (#7071)"

This reverts commit 74a7d88, reversing
changes made to 425a7e5.
MauriceVanVeen added a commit that referenced this pull request Sep 3, 2025
…gnment processed (#7071)"

This reverts commit 74a7d88, reversing
changes made to 425a7e5.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
derekcollison added a commit that referenced this pull request Sep 3, 2025
Revert #7071, since it would
result in losing consumer state when scaling down a replicated stream
and the consumer state would need to be moved, but the current stream
leader doesn't host that consumer.

`TestJetStreamClusterConsumerReplicasAfterScale` started flaking due to
this bug. `TestJetStreamClusterConsumerReplicasAfterScaleMoveConsumer`
has been added in this PR which specifically tests for this situation
(the former test would only hit that randomly, hence it flaking).

We'll need to revisit this later when implementing the stream-managed
consumers, which isn't going to be for 2.12.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants