Skip to content

Conversation

MauriceVanVeen
Copy link
Member

A Limits-based R3 stream can have ephemeral consumers that are R1. Ephemeral consumers don't specify a replica count, instead they specify Replicas: 0, to be inferred automatically based on the stream retention and the amount of stream replicas.

Before this PR, createGroupForConsumer would check there's at least one online server hosting the stream, but it would still select a peer set (R3 in this example) equal to the stream. After creating the group, it would be "fixed" by doing rg.Peers = []string{rg.Preferred} if it's an ephemeral consumer.

In 2.12, due to this fix: #7083, rg.setPreferred() will always only select online servers. But on 2.11 and prior, it could randomly select any node, even ones that are offline. Even though this will be fixed as well in 2.12, the real fix is to use the real replica count in createGroupForConsumer, immediately selecting the right peer set and not try fixing it outside of it.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner August 12, 2025 19:21
@@ -7640,12 +7641,6 @@ func (s *Server) jsClusteredConsumerRequest(ci *ClientInfo, acc *Account, subjec

// We need to set the ephemeral here before replicating.
if !isDurableConsumer(cfg) {
// We chose to have ephemerals be R=1 unless stream is interest or workqueue.
// Consumer can override.
if sa.Config.Retention == LimitsPolicy && cfg.Replicas <= 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this logic now handled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's handled as part of: replicas := cfg.replicas(sa.Config)

// Calculate accurate replicas for the consumer config with the parent stream config.
func (consCfg ConsumerConfig) replicas(strCfg *StreamConfig) int {
	if consCfg.Replicas == 0 || consCfg.Replicas > strCfg.Replicas {
		if !isDurableConsumer(&consCfg) && strCfg.Retention == LimitsPolicy && consCfg.Replicas == 0 {
			// Matches old-school ephemerals only, where the replica count is 0.
			return 1
		}
		return strCfg.Replicas
	}
	return consCfg.Replicas
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more meant the selection of the rg.Peers and the name selection for the consumer..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already handled by cc.createGroupForConsumer

	// First shuffle the active peers and then select to account for replica = 1.
	rand.Shuffle(len(active), func(i, j int) { active[i], active[j] = active[j], active[i] })
	peers = active[:replicas]
}
...
return &raftGroup{Name: groupNameForConsumer(peers, storage), Storage: storage, Peers: peers}

@MauriceVanVeen MauriceVanVeen marked this pull request as draft August 12, 2025 20:33
@MauriceVanVeen
Copy link
Member Author

Will have a longer look at this tomorrow morning.

@MauriceVanVeen
Copy link
Member Author

Confirmed to be working. We now always select the right peer set with the right amount of peers in cc.createGroupForConsumer.

@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review August 13, 2025 05:46
Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander merged commit 9406307 into main Aug 13, 2025
89 of 92 checks passed
@neilalexander neilalexander deleted the maurice/consumer-r0-select-online branch August 13, 2025 10:01
neilalexander added a commit that referenced this pull request Aug 13, 2025
Includes the following:

- #7140
- #7142
- #7145
- #7150
- #7151
- #7154
- #7156
- #7122
- #7166 (excluding the Go
version bump)
- #7162
- #7165

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants