Skip to content

Conversation

leviramsey
Copy link
Contributor

This is arguably the canonical use-case for mapAsyncPartitioned, so it makes sense to mention.

This is arguably the canonical use-case for mapAsyncPartitioned.
@@ -55,6 +55,8 @@ However it can only lead to reordering between messages sent to different substr

If a particular substream expects to see all messages regarding some entity, it then requires that writers to the source topic become responsible for placing messages about various entities in the appropriate partitions. If your application already has a requirement to preserve the order of messages about a particular entity within a Kafka topic, you will already need to ensure those messages go to the same partition since Kafka only preserves order information within a partition.

Consider instead using `mapAsyncPartitioned` in place of a `groupBy` followed by `mergeSubstreams`. Both allow for demultiplexing an input stream, but `mapAsyncPartitioned` will not reorder output messages and also allows its partitions to be finer-grained than a Kafka partition. Note that partitioning in `mapAsyncPartitioned` only happens within that stage: complex processing of a partition may require techniques outside of streams, such as using the ask pattern to an actor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we skip groupBy completely and recommend mapAsyncPartitioned as the way. Is there any case where groupBy is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main benefit of groupBy is that you stay in the stream API, so if grouping by Kafka partitions, you can still use operators implementable in terms of statefulMap directly and the materializer will take care of things for you.

Flow[(MessageFromKafka, CommittableOffset)]
  .groupBy(numPartitionsConsumed, _._2.partitionOffset.key.topicPartition, false)
  .statefulMap(...)
  .mergeSubstreams

versus the complexity of either spawning an actor per partition to duplicate the statefulMap's logic (and deciding how that interacts with materialization) or still using the stream API yourself and managing what's basically substream materialization yourself.

I think there's a level of complexity of logic in a stream where you're better off moving it into actors and mapAsync*, but there's disagreement on where that level is...

Copy link
Contributor

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ennru ennru merged commit 8670cf9 into akka:main Jun 9, 2023
@leviramsey leviramsey deleted the patch-1 branch June 14, 2023 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants