
feat(ingestion): conversion events buffer consumer #9432


Merged
merged 27 commits into master from events-buffer-consumer-2 on Apr 26, 2022

Conversation


@yakkomajuri commented Apr 15, 2022

Problem

#9182

Changes

Follow up to #9427.

Implements the consumer for the buffer.

This is slightly tricky. Still needs tests.

How did you test this code?

Added tests + ran manually.

To run manually:

  1. I set CONVERSION_BUFFER_ENABLED=1 and BUFFER_CONVERSION_SECONDS=30
  2. I added logging at the producer and consumer
  3. I sent events that would go to the buffer
  4. The logs confirmed the flow (sketched in code after this list):
    • Event is sent to buffer
    • Event is consumed from buffer
    • DelayProcessing is thrown
    • Consumer sleeps
    • Consumer comes back up
    • Event is ingested correctly
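
A minimal sketch of that check-and-delay step, assuming a KafkaJS-style eachBatch handler; DelayProcessing is the error this PR introduces, while the handler name, the constant, and the ingestion step are illustrative rather than the actual implementation.

import { EachBatchPayload } from 'kafkajs'

class DelayProcessing extends Error {} // the error this PR throws to trigger the pause

const BUFFER_CONVERSION_SECONDS = 30 // matches the manual test setup above

async function eachBatchBuffer({ batch }: EachBatchPayload): Promise<void> {
    for (const message of batch.messages) {
        // Kafka timestamps are unix timestamps (ms) serialized as strings
        const processAt = Number(message.timestamp) + BUFFER_CONVERSION_SECONDS * 1000
        if (Date.now() < processAt) {
            // The event reached the buffer too recently: bail out so the consumer can
            // pause this partition, sleep, and pick the event up again later.
            throw new DelayProcessing()
        }
        // ... otherwise ingest the event as usual ...
    }
}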

@yakkomajuri requested a review from tiina303 on April 18, 2022 16:52
}
}

await Promise.all(promises)
Contributor

We should commit the offsets right after this. Currently we do the work, then try to set up the pausing, then commit offsets. If we die while sleeping we'd try to process them again, and we should minimize the amount of time between processing something and committing offsets.

Contributor Author

Tagged you in another comment.

Also, we don't sleep synchronously; setTimeout runs in the background.
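
For context, a rough sketch of a non-blocking per-partition sleep, assuming KafkaJS's consumer.pause/consumer.resume; the actual PR may wire this differently.

import { Consumer } from 'kafkajs'

// Sketch only: pause a single partition and resume it later without blocking
// the event loop, so other partitions keep being consumed in the meantime.
function sleepPartition(consumer: Consumer, topic: string, partition: number, sleepMs: number): void {
    consumer.pause([{ topic, partitions: [partition] }])
    // setTimeout runs in the background; nothing here blocks the consumer loop.
    setTimeout(() => consumer.resume([{ topic, partitions: [partition] }]), sleepMs)
}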

Contributor

Sure, based on all you say it should work, though I'd find it easier to read if we had await commitOffsetsIfNecessary() right after this line, unless there's a reason not to?

Contributor Author

There is a good reason: if I run commitOffsetsIfNecessary it will move the offset past this entire batch. We rely on throw new DelayProcessing() to commit only the messages we processed via resolveOffset, not the entire batch.

I should however do some thinking about the problems of running Promise.all with this. This might be more robust if done in order.
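
A sketch of the offset handling being described, assuming KafkaJS with eachBatchAutoResolve turned off; the function name and the delay constant are placeholders, not the real code.

import { Consumer } from 'kafkajs'

class DelayProcessing extends Error {}

const BUFFER_CONVERSION_SECONDS = 30 // placeholder, as in the test setup above

async function runBufferConsumer(consumer: Consumer): Promise<void> {
    await consumer.run({
        // Without auto-resolve, throwing out of eachBatch leaves the consumer at the
        // last offset passed to resolveOffset(), so only the processed prefix is skipped.
        eachBatchAutoResolve: false,
        eachBatch: async ({ batch, resolveOffset }) => {
            for (const message of batch.messages) {
                const processAt = Number(message.timestamp) + BUFFER_CONVERSION_SECONDS * 1000
                if (Date.now() < processAt) {
                    // Deliberately not commitOffsetsIfNecessary(): that would advance the
                    // offset past this entire batch rather than just the processed part.
                    throw new DelayProcessing()
                }
                // ... ingest the event ...
                resolveOffset(message.offset)
            }
        },
    })
}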

Contributor Author

Yeah, I think the way to go is to do these in order and set partitionsConsumedConcurrently.
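
A sketch of what that could look like, assuming KafkaJS's partitionsConsumedConcurrently option; the handler and the concurrency value are placeholders.

import { Consumer, KafkaMessage } from 'kafkajs'

// Hypothetical per-message handler standing in for the real ingestion logic
async function processBufferMessage(message: KafkaMessage): Promise<void> {
    console.log('processing offset', message.offset)
}

async function runInOrder(consumer: Consumer): Promise<void> {
    await consumer.run({
        // Let KafkaJS make progress on several partitions at once...
        partitionsConsumedConcurrently: 3, // illustrative value
        eachBatch: async ({ batch, resolveOffset }) => {
            // ...while each partition's batch is processed strictly in order,
            // instead of firing everything off with Promise.all.
            for (const message of batch.messages) {
                await processBufferMessage(message)
                resolveOffset(message.offset)
            }
        },
    })
}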

for (const message of batch.messages) {
// kafka timestamps are unix timestamps in string format
const processAt = Number(message.timestamp) + this.pluginsServer.BUFFER_CONVERSION_SECONDS * 1000
const delayUntilTimeToProcess = processAt - new Date().getTime()
Contributor

Instead of keeping track of the max delay seconds, let's track the latest processAt (so we can sleep less later, as this processing and the Promise.all will take some time too). Furthermore, once we start not processing, we probably shouldn't process any other messages later either?

How efficient is using new Date().getTime() vs doing that once up front and using a variable?

Contributor Author

How efficient is using new Date().getTime() vs doing that once up front and using a variable?

This is negligible. I've nevertheless moved it to Date.now(), which is faster and cleaner.

Instead of keeping track of the max delay seconds, let's track the latest processAt (so we can sleep less later, as this processing and the Promise.all will take some time too). Furthermore, once we start not processing, we probably shouldn't process any other messages later either?

I'd rather sleep more, actually. First, note that we're sleeping per partition. Even if a given partition is sleeping, we can still consume from others.

The goal with sleeping longer is to maximize our chances of pulling a full batch we can process synchronously in the future.

Consider we get a batch:

event , ts
--------------
event0, 0
event1, 2
event3, 40
event4, 55

*now == 0

Kafka will feed us batches, so imagine I pull a batch with all the events above. I can process event0, then sleep 2s and pull the next 3 events from Kafka again. Then I sleep some more and pull event3 and event4. I'm incurring the overhead of pulling and filtering the batch multiple times, when I could have just moved on to other partitions and pulled a whole batch from this one once I'm more certain I'll be able to process a larger batch.

By sleeping for the maximum value in the batch, I also increase the chance that after the consumer is back up and gets the assignment, I'll be able to process event5, event6, and so on as well.
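
Putting numbers on that argument (illustrative only, using the example batch above):

// Seconds, as in the example: sleep for the largest remaining delay in the
// batch rather than the smallest one.
const now = 0 // "*now == 0"
const processAtSeconds = [0, 2, 40, 55] // event0, event1, event3, event4

const sleepSeconds = Math.max(...processAtSeconds.map((t) => t - now), 0)
// sleepSeconds === 55: sleeping just 2s would unlock only event1 and force another
// pull of the same partition, while sleeping 55s lets the whole batch through in one pass.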

Contributor

No, sorry, I didn't explain myself well in the proposal for "instead of keeping track of the max delay seconds, let's track the latest processAt".

I'll modify your example a bit:

event , ts
--------------
event0, 0
event1, 5
event3, 40
event4, 55

*now == 2

the idea was to:

  1. get max ts, which in your example would be 55
  2. At the point of sleep calculate the sleep time, e.g. maybe a second has passed, so now instead of sleeping 53s (55-2) we'd only sleep 52s (55-3).

Once we wake from sleep we'll be able to process all the events, and we're less likely to oversleep.

This probably isn't super important ... it depends on how resource-constrained the plugin-server is, etc.
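
A sketch of that calculation, assuming the handler records the latest processAt while scanning the batch; the function name is illustrative.

import { KafkaMessage } from 'kafkajs'

// Track the latest processAt while scanning the batch, then compute the sleep
// against the clock at the moment we actually pause, so time already spent
// processing is not added on top of the delay.
function remainingSleepMs(messages: KafkaMessage[], bufferConversionSeconds: number): number {
    let latestProcessAt = 0
    for (const message of messages) {
        const processAt = Number(message.timestamp) + bufferConversionSeconds * 1000
        latestProcessAt = Math.max(latestProcessAt, processAt)
    }
    // e.g. the 53s -> 52s case above: a second of work elapsed means a second less sleep
    return Math.max(latestProcessAt - Date.now(), 0)
}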

yakkomajuri and others added 2 commits April 25, 2022 12:48
Co-authored-by: Tiina Turban <tiina303@gmail.com>
Base automatically changed from events-buffer to master April 25, 2022 14:10
@yakkomajuri
Copy link
Contributor Author

Updated description to add testing steps

@yakkomajuri merged commit 0a744d8 into master on Apr 26, 2022
@yakkomajuri deleted the events-buffer-consumer-2 branch on April 26, 2022 12:44
fuziontech added a commit that referenced this pull request Apr 28, 2022
* master: (137 commits)
  feat(cohorts): add cohort filter grammars (#9540)
  feat(cohorts): Backwards compatibility of groups and properties (#9462)
  perf(ingestion): unsubscribe from buffer topic while no events are produced to it (#9556)
  fix: Fix `Loading` positioning and `LemonButton` disabled state (#9554)
  test: Speed up backend tests (#9289)
  fix: LemonSpacer -> LemonDivider (#9549)
  feat(funnels): Highlight significant deviations in new funnel viz (#9536)
  docs(storybook): Lemon UI (#9426)
  feat: add support for list of teams to enable the conversion buffer for (#9542)
  chore(onboarding): cleanup framework grid experiment (#9527)
  fix(signup): domain provisioning on cloud (#9515)
  chore: split out async migrations ci (#9539)
  feat(ingestion): enable json ingestion for self-hosted by default (#9448)
  feat(cohort): add all cohort filter selectors to Storybook (#9492)
  feat(ingestion): conversion events buffer consumer (#9432)
  ci(run-backend-tests): remove CH version default (#9532)
  feat: Add person info to events (#9404)
  feat(ingestion): produce to buffer partitioned by team_id:distinct_id (#9518)
  fix: bring latest_migrations.manifest up to date (#9525)
  chore: removes unused feature flag (#9529)
  ...