Skip to content

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Mar 2, 2023

Problem

closes #13776
So, this PR attempts to incorporate the changes from #14368 and #14397, but on top of #14477.

Yikes.

For some commits, I was able to cherry-pick them as they are. Some others, had too many conflicts with recent changes so they had to be manually re-done.

The solution in those two PRs attempted to use a CHECK constraint to enforce consistency while merging overrides. This was problematic as it required a FK on very hot tables (Person and Team), which made it very hard to the deploy.

This solution uses an exclusion constraint instead, but since exclusion constraints on uuid[] are not supported, we created a helper model in #14477 that maps a serial int id to person UUIDs. Exclusion constraints on int[] are possible with two extensions.

Changes

On top of the new helper model, this PR includes the merging logic. This logic has been modified from #14368 to utilize the PersonOverridesMapping model. This requires two more queries to insert into the helper model as we merge.

Moreover, this also includes a fix for the latest migration as when dropping a field a check constraint was dropped. So it has to be re-added.

How did you test this code?

Unit tests were mostly kept the same.

@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch 2 times, most recently from 3522e10 to b5ca63f Compare March 2, 2023 17:45
@tomasfarias tomasfarias marked this pull request as ready for review March 2, 2023 18:40
@tomasfarias tomasfarias force-pushed the feat/person-overrides-constraints-with-helper branch from 68ca1bb to f0c1c9a Compare March 3, 2023 09:25
@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch 3 times, most recently from 5dfee25 to 225dd7a Compare March 3, 2023 10:16
Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which made it very hard to the deploy. => I'd add that going back to the drawing board we realized there's a more explicit way we can validate correctness.

Comment on lines +2102 to +2174
// For some reason these tests failed if I ran them with a hub shared
// with other tests, so I'm creating a new hub for each test.
let hub: Hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you retry to see if that's still the case?

uuid: person.uuid,
properties: person.properties,
created_at: person.created_at,
// TODO: fix this. It seems that we update some person
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind expanding on "^"?

This commit was also not added by me, but cherry picked from other PR (either yours or @hazzadous's).

So, if your "^" means "fix this what this comment says" then I'm going to need more context on that as I didn't write the comment myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the context, but can look into it if needed, I'm not even sure it's true 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you uncomment the lines is_identified and version, you'll end up with:

    +   Object {
    -     "is_identified": false,
    +     "is_identified": true,
...
    -     "version": "0",
    +     "version": "1",
        },

So if we for some reason fail to completely process the event, on retry we will see is_identified === true and thus we will not properly handle the retry as there is code that depends on the comparison. It's like that already, nothing to do with this PR so fine to leave.

The main thing to test is that running it again gets us to the right state which I think we do in this case. However, I think there are cases where it will not be correct, e.g. if one of the persons was already identified, then the second becomes identified, then we can merge them afterwards on second run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah I get it now, this is deterministic, so we can test this properly. We always mark the distinct_id person as is_identified even if merge failed. This helps make sure that anything that's a user_id won't be merged into anything else via $identify nor $create_alias.

So the test can be:

distinct_id == 'first' person: is_identified: true, version: 1
distinct_id == 'second' person: is_identified: false, version: 0

Potentially easiest to change the code by adding a person property with the distinct_id values 'first' and 'second'

@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch from 225dd7a to 09347cb Compare March 6, 2023 14:01
@tomasfarias tomasfarias changed the title feat: person-overrides writes but with helper feat: person-overrides writes but with Mapping to Persons Mar 6, 2023
@tomasfarias tomasfarias changed the title feat: person-overrides writes but with Mapping to Persons feat: person-overrides writes with Mapping to Persons Mar 6, 2023
Base automatically changed from feat/person-overrides-constraints-with-helper to master March 6, 2023 14:52
@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch 10 times, most recently from f012ae7 to ce60990 Compare March 8, 2023 16:02
@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch 2 times, most recently from 740ac16 to d743b19 Compare March 9, 2023 13:28
tomasfarias and others added 23 commits March 10, 2023 10:25
Test was originally skipped due to missing a $merge command, but now
$merge_dangerously is available, so the test should pass.
Co-authored-by: Harry Waye <harry@posthog.com>
Co-authored-by: Tiina Turban <tiina303@gmail.com>
@tomasfarias tomasfarias force-pushed the feat/write-person-overrides-with-exclusion branch from 0e47904 to 5243222 Compare March 10, 2023 09:26
Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just make the 1 change to run
plugin-server/tests/worker/ingestion/event-pipeline/processPersonsStep.test.ts
in both modes, other stuff we can do in follow-ups

uuid: person.uuid,
properties: person.properties,
created_at: person.created_at,
// TODO: fix this. It seems that we update some person
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah I get it now, this is deterministic, so we can test this properly. We always mark the distinct_id person as is_identified even if merge failed. This helps make sure that anything that's a user_id won't be merged into anything else via $identify nor $create_alias.

So the test can be:

distinct_id == 'first' person: is_identified: true, version: 1
distinct_id == 'second' person: is_identified: false, version: 0

Potentially easiest to change the code by adding a person property with the distinct_id values 'first' and 'second'

Comment on lines +2538 to +2540
// Due to usage of identify, the same distinct_id must be used as distinct_id, so
// which ever order merges happen we will always be able to merge and not be blocked
// due to the mergefrom user being already identified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use $merge_dangerously here instead, but no need to change it.

uuid: third.uuid,
properties: {},
created_at: third.created_at,
// TODO: fix this. It seems that we update some person
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the other comment - we can do a follow-up PR for both

@tomasfarias tomasfarias enabled auto-merge (squash) March 13, 2023 10:42
@tomasfarias tomasfarias disabled auto-merge March 13, 2023 10:45
@tomasfarias tomasfarias merged commit 315f47f into master Mar 13, 2023
@tomasfarias tomasfarias deleted the feat/write-person-overrides-with-exclusion branch March 13, 2023 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ingestion code changes to emit to kafka on merges for short and long term person overrides
3 participants