Events skipped in group rebalancing #3703

@ejba

In what version(s) of Spring for Apache Kafka are you seeing this issue?

Between 3.1.7 and 3.2.5

Describe the bug

The topic has 12 partitions. Consumers are scaled automatically as soon as event lag is detected (typically between 1 and 12 replicas). Group rebalancing currently takes more than 10 seconds. Events fetched before the rebalance are processed successfully, but the offset commit fails because the partition was revoked during the rebalance (a non-fatal failure).

The problem appears once the partition assignment is complete: the consumer's position on the partition has advanced to an offset that was never committed.

Example:
Consumer A assigned to partition P
Consumer A fetches offset 2148209
Consumer A reads 3 events
11 consumers join the group
Consumer A tries to commit 2148212 (2148209 + 3)
Commit 2148212 fails because partition P was revoked
Consumer A is assigned to partition P again
Consumer A fetches offset 2148260
48 events were skipped (2148260 - 2148212 = 48)
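
The arithmetic in the example can be checked with a small standalone sketch. The class and method names (`OffsetGap`, `expectedCommit`, `skipped`) are hypothetical helpers made up for illustration; they are not part of Spring for Apache Kafka or the Kafka client.

```java
public class OffsetGap {

    /** Offset the consumer would commit after a batch: first offset read plus number of records processed. */
    public static long expectedCommit(long firstOffset, int recordsProcessed) {
        return firstOffset + recordsProcessed;
    }

    /** Number of records skipped when the consumer resumes at resumeOffset instead of the expected commit offset. */
    public static long skipped(long expectedCommit, long resumeOffset) {
        return Math.max(0L, resumeOffset - expectedCommit);
    }

    public static void main(String[] args) {
        // Values taken from the example above.
        long commit = expectedCommit(2148209L, 3);
        long gap = skipped(commit, 2148260L);
        System.out.println("expected commit = " + commit + ", skipped = " + gap);
    }
}
```

With the offsets from the example, this reports an expected commit of 2148212 and a gap of 48 events.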

To Reproduce
AckMode = Batch
Partition Assignment Strategy = CooperativeSticky

A consumer continually reads events from the partition.
11 new consumers suddenly join the group.
The consumer is assigned its previous partition again.
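
For reference, the two settings listed above could be configured roughly as follows. This is only a sketch of a typical setup, not the actual application configuration: the bootstrap server address, the group id, the String deserializers, and the class name `ReproConsumerConfig` are placeholder assumptions.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class ReproConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        // Placeholder values (assumptions), not taken from the report.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "repro-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Partition Assignment Strategy = CooperativeSticky
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // AckMode = Batch: offsets are committed after each polled batch has been processed.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.BATCH);
        return factory;
    }
}
```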

Expected behavior

Once the group rebalance completes, the consumer assigned to a partition should always resume from the committed offset, so that no events are skipped.

Sample
I wasn't able to create a sample project yet, but I gathered debug logs that explain the issue in detail.

logs.txt

Thanks in advance!
