
feat(ingestion): conversion buffer producer #9427

Merged (11 commits merged Apr 25, 2022)

yakkomajuri (Contributor)

Problem

#9182

Changes

This should be a no-op change for now, since the buffer is disabled.

It splits event creation out of the processEvent path in the worker so that we can choose whether or not to create an event immediately. If the event shouldn't be created immediately, it is produced to the buffer, which will in turn trigger its ingestion at a later point.
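A minimal sketch of that split in TypeScript. Everything here is hypothetical: `prepareEvent`, `produceToBuffer`, the `bufferEnabled` flag, and the in-memory arrays standing in for the events table and the buffer are illustrative names, not the worker's actual API.

```typescript
// Hypothetical raw event as it arrives at the worker.
type RawEvent = {
  event: string
  distinctId: string
  properties?: Record<string, unknown>
}

// Hypothetical shape of an event after processing but before it is written.
interface PreparedEvent {
  event: string
  distinctId: string
  properties: Record<string, unknown>
  timestamp: string
}

// In-memory stand-ins for the events table and the conversion buffer.
const createdEvents: PreparedEvent[] = []
const bufferQueue: PreparedEvent[] = []

// Processing no longer writes the event; it only returns its fields.
function prepareEvent(raw: RawEvent): PreparedEvent {
  return {
    event: raw.event,
    distinctId: raw.distinctId,
    properties: raw.properties ?? {},
    timestamp: new Date().toISOString(),
  }
}

function createEvent(prepared: PreparedEvent): void {
  createdEvents.push(prepared)
}

// Producing to the buffer defers creation; a consumer would ingest it later.
function produceToBuffer(prepared: PreparedEvent): void {
  bufferQueue.push(prepared)
}

function ingest(raw: RawEvent, shouldBuffer: (e: PreparedEvent) => boolean, bufferEnabled: boolean): void {
  const prepared = prepareEvent(raw)
  // With the buffer disabled, every event is created immediately.
  if (bufferEnabled && shouldBuffer(prepared)) {
    produceToBuffer(prepared)
  } else {
    createEvent(prepared)
  }
}

// Buffer disabled: behaves exactly like the old path.
ingest({ event: '$pageview', distinctId: 'user-1' }, () => true, false)
console.log(createdEvents.length, bufferQueue.length) // 1 0
```

Checking the flag before the predicate is what keeps the change a no-op while the buffer is disabled: every event still reaches `createEvent`.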

How did you test this code?

Added tests

yakkomajuri changed the title from "feature(ingestion): conversion buffer producer" to "feat(ingestion): conversion buffer producer" on Apr 15, 2022
yakkomajuri requested a review from tiina303 on April 18, 2022 16:52
distinctId: string
properties: Properties
timestamp: DateTime | string
elementsList: Element[]
Contributor

what is this and why do we need it?

Contributor Author

This is used in a lot of places.

Before, we passed all of these as arguments to createEvent. Now, however, we need to return them, send them to the buffer, etc. I could check the other types, though, and see if there's enough overlap to reuse an existing one.
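A sketch of the idea, assuming plain TypeScript stand-ins for `Properties`, the element type, and luxon's `DateTime` (the `PreIngestionEvent` and `CapturedElement` names and the JSON message format are hypothetical). Bundling the former `createEvent` arguments into one object lets the processing step return it and serialize it as a single buffer message:

```typescript
// Hypothetical stand-ins: the PR uses its own Properties and Element types
// and luxon's DateTime; plain TypeScript types are used here instead.
type Properties = Record<string, unknown>

interface CapturedElement {
  tag_name: string
  text?: string
}

// Hypothetical name for the bundle of fields that used to be separate
// createEvent arguments.
interface PreIngestionEvent {
  event: string
  distinctId: string
  properties: Properties
  timestamp: Date | string
  elementsList: CapturedElement[]
}

// One object can be returned from processing and sent to the buffer as a
// single message, instead of threading several arguments around.
function toBufferMessage(e: PreIngestionEvent): string {
  const timestamp = e.timestamp instanceof Date ? e.timestamp.toISOString() : e.timestamp
  return JSON.stringify({ ...e, timestamp })
}

function fromBufferMessage(message: string): PreIngestionEvent {
  // The timestamp comes back as an ISO string, which the type allows.
  return JSON.parse(message) as PreIngestionEvent
}
```

Normalizing the timestamp to a string before serializing keeps the message format independent of whichever date type the worker uses internally.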

await hub.hookCannon.findAndFireHooks(event, person, siteUrl, actionMatches)

// eventId is undefined for CH deployments
// CH deployments calculate actions on the fly
Contributor

we don't have non-CH deployments anymore.

Contributor Author

This comment already existed, but sure, I can remove it.

return actionMatches
}

// TODO: Handle new persons?
Contributor

?

Contributor Author

So the rules for the buffer are specified here:

#9182

One of the rules is that events for persons that didn't exist before should go to the buffer. However, while building this, I realized:

  1. It'd be a major refactor to cover that
  2. The conversion-seconds window handles it anyway (a person created within the last 60 seconds is treated as recent)

The one edge case is if we lower that value a lot (say, to 1 second), but I've decided to punt on this problem.
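A minimal sketch of the recency check that makes the edge case concrete, assuming `created_at` is a JavaScript `Date` and the window is passed in seconds as a stand-in for `hub.BUFFER_CONVERSION_SECONDS` (the function signature is hypothetical). A person created by an earlier event five seconds ago still counts as recent with a 60-second window, but not with a 1-second one:

```typescript
// Returns true when the person is missing or was created within the window.
// `windowSeconds` stands in for hub.BUFFER_CONVERSION_SECONDS.
function isRecentPerson(createdAt: Date | undefined, now: Date, windowSeconds: number): boolean {
  if (createdAt === undefined) {
    // No person yet: the event belongs to someone brand new.
    return true
  }
  const ageSeconds = (now.getTime() - createdAt.getTime()) / 1000
  return ageSeconds < windowSeconds
}

const now = new Date('2022-04-19T12:00:05Z')
const createdByEarlierEvent = new Date('2022-04-19T12:00:00Z') // 5 seconds earlier

console.log(isRecentPerson(createdByEarlierEvent, now, 60)) // true: events are buffered
console.log(isRecentPerson(createdByEarlierEvent, now, 1)) // false: events are ingested directly
```

With a small window, a person who is new in practice (created moments earlier by another event) slips past the check, which is exactly the case the "new persons" rule was meant to catch.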

Contributor

Commented on that issue; sorry, I somehow missed it before.

const isAnonymousEvent =
event.properties && event.properties['$device_id'] && event.distinctId === event.properties['$device_id']
const isRecentPerson = !person || DateTime.now().diff(person.created_at).as('seconds') < hub.BUFFER_CONVERSION_SECONDS
const ingestEventDirectly = isAnonymousEvent || event.event === '$identify' || !isRecentPerson
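The routing rule above, restated as a pure function. This is a sketch with hypothetical types (`IncomingEvent`, and `Person` with a `createdAtMs` field) and a hypothetical name, `shouldIngestDirectly`. It uses epoch milliseconds instead of luxon. One detail it sidesteps on purpose: luxon's `DateTime#diff` returns a milliseconds-only `Duration` by default, so reading `.seconds` from it gives 0, while `.as('seconds')` converts the whole duration.

```typescript
interface IncomingEvent {
  event: string
  distinctId: string
  properties?: Record<string, unknown>
}

interface Person {
  createdAtMs: number // creation time in epoch milliseconds
}

// Mirrors the three conditions: anonymous events, $identify events, and
// events for persons outside the conversion window skip the buffer.
function shouldIngestDirectly(
  event: IncomingEvent,
  person: Person | undefined,
  nowMs: number,
  bufferConversionSeconds: number,
): boolean {
  const deviceId = event.properties?.['$device_id']
  const isAnonymousEvent = !!deviceId && event.distinctId === deviceId
  // Full age in seconds, not a single component of a duration.
  const isRecentPerson = !person || (nowMs - person.createdAtMs) / 1000 < bufferConversionSeconds
  return isAnonymousEvent || event.event === '$identify' || !isRecentPerson
}

const nowMs = Date.parse('2022-04-19T12:00:00Z')
console.log(shouldIngestDirectly({ event: '$pageview', distinctId: 'dev-1', properties: { $device_id: 'dev-1' } }, undefined, nowMs, 60)) // true: anonymous
console.log(shouldIngestDirectly({ event: '$pageview', distinctId: 'user-1' }, undefined, nowMs, 60)) // false: new person, buffered
console.log(shouldIngestDirectly({ event: '$pageview', distinctId: 'user-1' }, { createdAtMs: nowMs - 120 * 1000 }, nowMs, 60)) // true: outside window
```

Only an event that is neither anonymous nor `$identify`, for a person who is missing or was created inside the window, ends up in the buffer.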
Contributor

Wouldn't it be better to send anonymous events to the buffer and ingest non-anonymous events directly?
Wouldn't that work for the first sign-up and also for later pre-login events?

We probably want to treat alias the same way as identify for when it's ingested.

Contributor Author

yakkomajuri, Apr 19, 2022

We thought about the buffer a lot before making decisions on the rules.

But anyway, no, that doesn't make sense. We will merge on the anonymous person ID, so the ID for those events will never change. The buffer is used for events where the person ID might change.

The only reason to send anonymous events there would be if we merged into the identified person ID, but under that approach the whole system crumbles. How long does it take for an anonymous user to become identified? Is there any guarantee that they ever will?
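A toy model of why anonymous events never need the buffer, assuming persons are tracked as a map from distinct ID to person ID and that an identify merge keeps the anonymous person's ID. The map and both helper functions are hypothetical, not the plugin server's storage:

```typescript
// distinct ID -> person ID
const personIdByDistinctId = new Map<string, string>()

function getOrCreatePersonId(distinctId: string, newPersonId: string): string {
  const existing = personIdByDistinctId.get(distinctId)
  if (existing !== undefined) return existing
  personIdByDistinctId.set(distinctId, newPersonId)
  return newPersonId
}

// Merge on the anonymous person: the identified distinct ID is repointed
// to the anonymous person's ID, which itself never changes.
function mergeIntoAnonymous(anonymousDistinctId: string, identifiedDistinctId: string): void {
  const anonymousPersonId = personIdByDistinctId.get(anonymousDistinctId)
  if (anonymousPersonId === undefined) throw new Error('unknown anonymous distinct ID')
  personIdByDistinctId.set(identifiedDistinctId, anonymousPersonId)
}

// Anonymous event: attributed to person "p-anon".
const anonPersonBefore = getOrCreatePersonId('device-123', 'p-anon')
// An identified event arrives before the merge and gets its own person "p-user".
const userPersonBefore = getOrCreatePersonId('user@example.com', 'p-user')
mergeIntoAnonymous('device-123', 'user@example.com')

console.log(anonPersonBefore === personIdByDistinctId.get('device-123')) // true: unchanged
console.log(userPersonBefore === personIdByDistinctId.get('user@example.com')) // false: person ID moved
```

The identified distinct ID is the one whose person ID moves, which is why its events are the ones worth delaying.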

2 participants