Description
We'll be implementing a buffer using Kafka and the plugin server to ensure we associate events with the right distinct ID around the "identify edge", which we can also refer to as "conversion" events.
The solution will work as follows:
- Add a new Kafka topic for the buffer, called e.g. `events_buffer`, where messages contain event payloads as well as an extra field `process_at`.
- When processing an event, run `processEvent` and `onEvent` normally. However, at the very "end" of the `ingestEvent` task, decide whether to send the event to ClickHouse directly or to the buffer topic. The heuristic for this is as follows:
  - anonymous events -> ClickHouse
  - `$identify` -> ClickHouse
  - non-anonymous events without a person -> buffer
  - non-anonymous events within conversion window -> buffer
- Set up a new consumer on the main thread to consume from the buffer topic and send to a worker task doing the following:
  - Look up the person ID
  - Add the ID to the event
  - Produce the event to the Kafka topic consumed by ClickHouse
- The consumer should work as follows:
  - Pull a message from Kafka.
  - Check `process_at`. For `t = process_at - now`, if `t > 0`, don't commit the offset, finish the execution, stop the consumer, and sleep for `t`. If `t <= 0`, ingest the event now.
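
To make the routing heuristic and the consumer's `process_at` check concrete, here is a rough sketch in TypeScript. It is not the actual implementation: every type, function name, and constant (`chooseDestination`, `decideConsumerAction`, `CONVERSION_WINDOW_MS`, the `isAnonymous` flag) is hypothetical. It also assumes that the conversion window is measured from person creation, and that any event not matched by the four rules goes straight to ClickHouse.

```typescript
// Illustrative sketch only: every type, name, and constant below is hypothetical
// and not taken from the plugin server codebase.

type Destination = 'clickhouse' | 'buffer'

interface IncomingEvent {
    event: string
    distinctId: string
    // Assumption: the caller already knows whether the distinct ID is anonymous.
    isAnonymous: boolean
}

interface Person {
    // Assumption: the conversion window is measured from person creation (epoch ms).
    createdAtMs: number
}

// Assumed window length; the issue does not specify one.
const CONVERSION_WINDOW_MS = 60000

// Producer side: pick where the event goes at the end of the ingestEvent task.
function chooseDestination(event: IncomingEvent, person: Person | undefined, nowMs: number): Destination {
    if (event.isAnonymous) {
        return 'clickhouse' // anonymous events -> clickhouse
    }
    if (event.event === '$identify') {
        return 'clickhouse' // $identify -> clickhouse
    }
    if (person === undefined) {
        return 'buffer' // non-anonymous events without a person -> buffer
    }
    if (nowMs - person.createdAtMs < CONVERSION_WINDOW_MS) {
        return 'buffer' // non-anonymous events within conversion window -> buffer
    }
    // Assumption: anything not matched above goes straight to ClickHouse.
    return 'clickhouse'
}

interface BufferedMessage {
    payload: IncomingEvent
    processAtMs: number // the extra process_at field, here as epoch ms
}

type ConsumerAction = { kind: 'ingest' } | { kind: 'sleep'; ms: number }

// Consumer side: compute t = process_at - now and decide what to do with the message.
function decideConsumerAction(message: BufferedMessage, nowMs: number): ConsumerAction {
    const t = message.processAtMs - nowMs
    if (t > 0) {
        // Not due yet: the caller should leave the offset uncommitted,
        // stop the consumer, and sleep for t before resuming.
        return { kind: 'sleep', ms: t }
    }
    // Due (t <= 0): ingest the event now.
    return { kind: 'ingest' }
}
```

Returning an action instead of sleeping inside the function keeps the timing logic free of Kafka client details, so whichever consumer wrapper is used can handle stopping, sleeping, and resuming itself.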
Won't waste a bunch of time making a graph that's a perfect representation of the world, but this should give a good overview of how this system will work:
This issue previously outlined a ClickHouse buffer solution that we've decided against. Click below to see its content.
Old issue content
The `staging_events` table will have the same schema as the events table.
Creating it could be done via a "normal" CH migration as it is a new table.
However, we only want to create the materialized view and Kafka table on one server. This is to ensure consistency when querying from this table to write to `writable_events`.
For this we will need some assistance from Team Infra (@guidoiaquinti @hazzadous), as we need a way for self-hosted users to also leverage `CLICKHOUSE_STABLE_HOST`. Effectively, we need a way to connect to one individual ClickHouse server for this.