doc: Mention mapAsyncPartitioned in context of at-least-once #1630

leviramsey · 2023-05-15T18:20:12Z

This is arguably the canonical use-case for mapAsyncPartitioned, so it makes sense to mention.

patriknw · 2023-05-16T07:27:02Z

docs/src/main/paradox/atleastonce.md

@@ -55,6 +55,8 @@ However it can only lead to reordering between messages sent to different substr

 If a particular substream expects to see all messages regarding some entity, it then requires that writers to the source topic become responsible for placing messages about various entities in the appropriate partitions. If your application already has a requirement to preserve the order of messages about a particular entity within a Kafka topic, you will already need to ensure those messages go to the same partition since Kafka only preserves order information within a partition.

+Consider instead using `mapAsyncPartitioned` in place of a `groupBy` followed by `mergeSubstreams`.  Both allow for demultiplexing an input stream, but `mapAsyncPartitioned` will not reorder output messages and also allows its partitions to be finer-grained than a Kafka partition.  Note that partitioning in `mapAsyncPartitioned` only happens within that stage: complex processing of a partition may require techniques outside of streams, such as using the ask pattern to an actor.


Should we skip groupBy completely and recommend mapAsyncPartitioned as the way to go? Is there any case where groupBy is better?

leviramsey · 2023-05-16

The main benefit of groupBy is that you stay in the stream API, so if grouping by Kafka partitions, you can still use operators implementable in terms of statefulMap directly and the materializer will take care of things for you.

    Flow[(MessageFromKafka, CommittableOffset)]
      .groupBy(numPartitionsConsumed, _._2.partitionOffset.key.topicPartition, false)
      .statefulMap(...)
      .mergeSubstreams

versus the complexity of either spawning an actor per partition to duplicate the statefulMap logic (and deciding how that interacts with materialization), or still using the stream API and managing what is basically substream materialization yourself.

I think there's a level of complexity of logic in a stream where you're better off moving it into actors and mapAsync*, but there's disagreement on where that level is...
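
For comparison, here is a minimal sketch of the `mapAsyncPartitioned` side. It assumes an Akka version where the operator has the shape `mapAsyncPartitioned(parallelism, perPartition)(partitioner)(f)`, so check it against the version you build with. `Msg`, `entityId` and the `Future(msg.offset)` body are hypothetical stand-ins for a Kafka record, its entity key and the real processing (for example an ask to an actor).

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{ Sink, Source }

import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._

object MapAsyncPartitionedSketch {
  // Hypothetical message type standing in for a Kafka record plus its offset.
  final case class Msg(entityId: String, offset: Long)

  def run(): Seq[Long] = {
    implicit val system: ActorSystem = ActorSystem("sketch")
    import system.dispatcher

    val input = List(Msg("a", 0), Msg("b", 1), Msg("a", 2), Msg("c", 3), Msg("b", 4))

    val done = Source(input)
      // Assumed signature: at most 4 futures in flight overall, at most 1 per entityId.
      .mapAsyncPartitioned(4, 1)(_.entityId) { (msg, _) =>
        Future(msg.offset) // placeholder for the real per-entity processing
      }
      .runWith(Sink.seq)

    try Await.result(done, 10.seconds)
    finally system.terminate()
  }
}
```

Because results are emitted in upstream order, the offsets leave the stage in the order they arrived, which is what makes committing them downstream safe. The partition key here is per entity, so it can be finer-grained than a Kafka partition.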

ennru

LGTM.

doc: Mention mapAsyncPartitioned in context of at-least-once

5db502d

This is arguably the canonical use-case for mapAsyncPartitioned.

probot-autolabeler bot added the documentation label May 15, 2023

clarify partitioning in mapAsyncPartitioned

d67a99f

patriknw reviewed May 16, 2023

ennru approved these changes Jun 9, 2023

ennru merged commit 8670cf9 into akka:main Jun 9, 2023

leviramsey deleted the patch-1 branch June 14, 2023 15:11
