feat(demo): Rework demo data generation system #7889
Conversation
If you're doing this, it would also be valuable to set up some groups data :)
This comment was marked as off-topic.
I suppose this is now reviewable. Guide below.
How to test
First run
Approach
What happens when a user enters the demo environment?
Whenever a user signs up/logs in (they are the same here), they are:
How does the simulation work?
Each individual simulation is a matrix (abstractly, a A The results of a
TODOs
This system has MUCH more potential than the previous one, but it's not as robust as it could be yet. Here are some enhancement opportunities:
# TODO: Support persons on events
}
p = ClickhouseProducer()
p.produce(topic=KAFKA_EVENTS_JSON, sql=INSERT_EVENT_SQL(), data=data)
👍
The only concern I'd have with this is that it could be slightly more annoying to debug (e.g. if we wrote a bad event), but I'll take the speed improvements, and it looks like we already use it for persons and other stuff too.
Noticed that in `bulk_create_events` we still have `sync_execute` below; that's probably fine and potentially faster that way.
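To illustrate the debugging trade-off raised here, below is a minimal sketch of a fire-and-forget producer. It does not use Kafka or ClickHouse: `InMemoryStore` and `QueuedProducer` are hypothetical stand-ins built on the standard library, and the validation rule is an assumption made only for the example. The point is that a bad event raises nothing at the call site and only surfaces later in the worker.

```python
import queue
import threading
from typing import Any, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for the database; rejects events without a name."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def insert(self, event: Dict[str, Any]) -> None:
        if "event" not in event:
            raise ValueError("event name missing")
        self.rows.append(event)


class QueuedProducer:
    """Hypothetical fire-and-forget producer backed by a worker thread."""

    def __init__(self, store: InMemoryStore) -> None:
        self._store = store
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self.errors: List[str] = []
        threading.Thread(target=self._drain, daemon=True).start()

    def produce(self, event: Dict[str, Any]) -> None:
        # Returns immediately; the insert happens later on the worker thread.
        self._queue.put(event)

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._store.insert(event)
            except ValueError as exc:
                # The caller never sees this exception, only the recorded error.
                self.errors.append(str(exc))
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        # Block until every queued event has been processed.
        self._queue.join()


store = InMemoryStore()
producer = QueuedProducer(store)
producer.produce({"event": "$pageview"})
producer.produce({"distinct_id": "x"})  # bad event, but no exception here
producer.flush()
```

A synchronous `store.insert(...)` call would have raised at the offending line instead, which is easier to trace but makes the caller wait for every write.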
@@ -535,6 +535,7 @@ export const keyMapping: KeyMappingInterface = {
},
},
}
keyMapping['$distinct_id'] = keyMapping['distinct_id']
This looks 👁️ 👁️ dodgy. // what's happening?
Oh, this is actually wrong. I misread `_get_distinct_id` and saw both `distinct_id` and `$distinct_id` being supported, but that's only at the top level of the object. In props, indeed only the former is recognized. (Not the most straightforward situation, but I guess that's backwards compat for you.)
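The behavior described in this comment can be sketched as follows. This is not the actual `_get_distinct_id` implementation: `resolve_distinct_id` is a hypothetical helper, and the lookup order (top level first, then properties) is an assumption. It only illustrates that both spellings are accepted at the top level of the payload, while inside `properties` only `distinct_id` is recognized.

```python
from typing import Any, Dict, Optional


def resolve_distinct_id(payload: Dict[str, Any]) -> Optional[str]:
    """Hypothetical lookup; NOT the real `_get_distinct_id`."""
    # Top level of the object: both spellings are accepted.
    for key in ("distinct_id", "$distinct_id"):
        value = payload.get(key)
        if value is not None:
            return str(value)
    # Inside properties: only the un-prefixed key is recognized.
    properties = payload.get("properties") or {}
    value = properties.get("distinct_id")
    return str(value) if value is not None else None
```

Under these assumptions, a payload carrying `$distinct_id` only inside `properties` resolves to `None`, which is why aliasing the key in `keyMapping` would not match anything the backend recognizes there.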
Oh right, I didn't explicitly point that out, but the
So, for the actual demo experience, you need to log out and sign in with a different email. That should set you up with fresh data.
Right. 🙃
Well, this looks good and works. However, it takes over a minute to generate the demo data for me:
The console shows output during the first few seconds, but then pretty much pauses for a while and the app seems to be stuck by all visible indicators. 80 seconds later the next log line appears:
[DEMO] Simulated 1058 people in 6.18 s
[DEMO] Saved (individual part) 1058 people in 80.83 s
Is there a plan to pregenerate this for users? As it is now, we can't really run this for every new user, unless we show a clear "please wait" screen with a game they can play while waiting.
Hmm, this should take a few seconds since the events are now ingested async via Kafka, but I'll look into it. 👀 There are a couple of approaches where this could be pre-generated, though I think it's more fun if everyone gets a random environment of their own, provided of course this takes less than 10 seconds, where the wait would be OK (we could just show an approximate "Preparing the world" progress bar).
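For a sense of scale, the two `[DEMO]` log lines quoted above can be turned into throughput figures and compared with the 10-second target. This is plain arithmetic on the reported numbers; the variable names are illustrative only.

```python
# Figures taken from the [DEMO] log lines quoted in the review comment.
people = 1058
simulate_seconds = 6.18
save_seconds = 80.83

# Observed throughput of each phase, in people per second.
simulate_rate = people / simulate_seconds
save_rate = people / save_seconds

# How much slower saving was than simulating in this run.
slowdown = save_seconds / simulate_seconds

# Throughput the save phase alone would need to finish within 10 seconds.
target_rate = people / 10

print(f"simulate: {simulate_rate:.1f} people/s")
print(f"save:     {save_rate:.1f} people/s")
print(f"saving was {slowdown:.1f}x slower than simulating")
print(f"needed for a 10 s save: {target_rate:.1f} people/s")
```

In this run, saving was roughly 13 times slower than simulating, so the save phase is the part that would need to speed up to meet the target.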
* Rework demo data generation system
* Fix `setup_dev` and `posthog-foss`
* Keep old demo data generators to reduce hassle
* Move to Hoglify concept
* Separate new generator from old version
* Fix issues
* Rework simulation structure
* Restore package.json
* Reformat `requirements`
* Fix signup button margin
* Refactor things
* Remove snapshots
* Strip old stuff
* Rearrange more
* Fix bad imports
* Add simulation scaffolding
* Add `dry_run_matrix` command
* Fix determinism
* Update naming
* Update dry_run_matrix.py
* Model web client, add sessions, enable full-cluster simulation
* Update flake8 config
* Ignore T001 violation
* Fix saving data
* Instrument `set_project_up` more
* Add demo cohorts, feature flag, experiment
* Parametrize `start` and `end` in `simulate_matrix`
* Add neighbor effects
* Add more events
* Allow silencing events in `simulate_matrix`
* Improve effect scheduling and add more activities
* Fix time measurement
* Disallow creating extra orgs for demo users
* Add more useful info to `simulate_matrix` output
* Add super properties, refine world
* Fix first-seen moment
* `create_event` to Kafka if possible for speed
* Alias `$distinct_id` to `distinct_id` in `keyMapping`
* Extend simulation to 120 days
* Fix experiment instrumentation
* Fix some error message
* Fix experiment flag
* Increase number of demo sim clusters
* Fix typing
* Remove unused agent actions
* Support Python 3.8
* Avoid `Union[Team, int]`
* Fix an arg
* Remove dodgy alias
Changes
Follow-up to #7824. Aiming to resolve demo data concerns from PostHog/posthog.com#2661 (comment).
How did you test this code?
Alexa remind me