Description
In what version(s) of Spring for Apache Kafka are you seeing this issue?
Between 3.1.7 and 3.2.5
Describe the bug
The topic has 12 partitions. Consumers are scaled automatically as soon as event lag is detected (typically from 1 to 12 replicas). The group rebalance currently takes a while (>10 seconds). Events fetched before the rebalance are processed successfully, but the offset commit fails because the partition was revoked during the rebalance (a non-fatal failure).
The problem appears after the partition assignment completes: the consumer's position on the partition has advanced past the last committed offset, to an offset that was never committed.
Example:
Consumer A assigned to partition P
Consumer A fetches offset 2148209
Consumer A reads 3 events
11 consumers join the group
Consumer A tries to commit 2148212 (2148209 + 3)
Commit 2148212 fails because partition P was revoked
Consumer A is assigned to partition P again
Consumer A fetches offset 2148260
48 events were skipped (2148260 - 2148212 = 48)
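The arithmetic in the example above can be checked with a small sketch. The class and method names (`OffsetGap`, `expectedCommit`, `skipped`) are hypothetical helpers written for illustration and are not part of Spring for Apache Kafka. The sketch assumes the committed offset is the offset of the next record to read.

```java
public final class OffsetGap {
    private OffsetGap() {}

    /** Offset to commit after a batch: first fetched offset plus records processed. */
    static long expectedCommit(long firstFetchedOffset, int recordsProcessed) {
        return firstFetchedOffset + recordsProcessed;
    }

    /** Records never delivered if the consumer resumes at resumedOffset instead of expectedCommit. */
    static long skipped(long expectedCommit, long resumedOffset) {
        return Math.max(0L, resumedOffset - expectedCommit);
    }

    public static void main(String[] args) {
        long expected = expectedCommit(2_148_209L, 3); // 2148209 + 3
        long gap = skipped(expected, 2_148_260L);      // 2148260 - 2148212
        // prints "expected commit = 2148212, skipped = 48"
        System.out.println("expected commit = " + expected + ", skipped = " + gap);
    }
}
```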
To Reproduce
AckMode = Batch
Partition Assignment Strategy = CooperativeSticky
A consumer continually reads events from the partition.
11 new consumers suddenly join the group.
The consumer is assigned to the previous partition.
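For anyone trying to reproduce this, a configuration along the following lines should match the two settings listed above. It is only a sketch: the class name `ReproConsumerConfig`, the bootstrap server, the group ID, and the `String` key/value types are placeholders assumed for illustration and are not taken from the affected application.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class ReproConsumerConfig { // hypothetical name

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        // Placeholder connection settings (assumptions, not from the report)
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "repro-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Partition Assignment Strategy = CooperativeSticky
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // AckMode = Batch: offsets are committed after each polled batch is processed
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.BATCH);
        return factory;
    }
}
```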
Expected behavior
Once the group rebalance completes, the consumer is expected to always resume from the committed offset of each assigned partition, so that no events are skipped.
Sample
I wasn't able to create a sample project yet, but I gathered debug logs that explain this issue in detail.
Thanks in advance!