
feat(session-recording): store session recording data outside of ClickHouse #9294


Closed
wants to merge 95 commits into from

Conversation

pauldambra
Member

@pauldambra pauldambra commented Mar 30, 2022

Problem

A high proportion of ClickHouse disk space is dedicated to session recordings. These disks are expensive, and ClickHouse isn't needed to query the payload data that takes up the space.

see #2142

Changes

recordings-no-clickhouse

  • adds MinIO to expose S3-like storage
  • changes the plugin server to write session recording data to disk, not to ClickHouse
  • reads the session recording data from disk when it isn't present in ClickHouse
  • checks for access to storage when starting the plugin server
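
A minimal sketch of the write path and the startup check described in the list above, in Python for brevity. Every name here (`InMemoryStorage`, `check_storage_access`, `store_chunk`, the `object_storage_key` field, the key layout) is hypothetical and not taken from this PR, and an in-memory dict stands in for the S3-like bucket and a plain list for the ClickHouse table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InMemoryStorage:
    """Stand-in for an S3-like bucket (assumption: not the PR's client)."""

    objects: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, data: str) -> None:
        self.objects[key] = data

    def read(self, key: str) -> Optional[str]:
        return self.objects.get(key)


def check_storage_access(storage) -> bool:
    """Round-trip a probe object; True only if the write can be read back."""
    probe_key = "healthcheck/probe"  # hypothetical key
    try:
        storage.write(probe_key, "ok")
        return storage.read(probe_key) == "ok"
    except Exception:
        return False


def store_chunk(event: dict, storage, clickhouse_rows: List[dict]) -> None:
    """Send the payload to object storage and the row without data to ClickHouse."""
    # Hypothetical key layout: one object per chunk of a session.
    key = f"session_recordings/{event['session_id']}/{event['chunk_id']}/{event['chunk_index']}"
    storage.write(key, event["data"])
    # Keep everything except the payload, plus a pointer to where it went.
    row = {k: v for k, v in event.items() if k != "data"}
    row["object_storage_key"] = key
    clickhouse_rows.append(row)
```

At startup the server would call `check_storage_access` once and refuse to start (or log loudly) when it returns `False`; how the PR actually reacts to a failed check is not shown here.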

How does it work?

sequenceDiagram
    actor web
    actor SDK
    actor app.posthog.com
    actor kafka
    actor plugin.server
    actor clickhouse
    actor object_storage
    SDK->>app.posthog.com: session_recording_event
    app.posthog.com->>kafka: chunked_session_recording_events
    kafka->>+plugin.server: 
    plugin.server->>clickhouse: stores chunks without data
    plugin.server->>-object_storage: stores chunk data
    web->>+app.posthog.com: load session data
    app.posthog.com->>clickhouse: load session json
    loop for every loaded chunk
        app.posthog.com->>object_storage: load chunk data for session json
    end
    app.posthog.com->>-web: return session data
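
The read half of the diagram can be sketched roughly as follows, again in Python and under assumptions: rows are plain dicts, `object_storage_key` and `chunk_index` are hypothetical field names, and `read_object` is any callable that fetches a payload by key. A row that still carries its data in ClickHouse is used as-is, which matches the "when it isn't present in ClickHouse" fallback.

```python
from typing import Callable, List, Optional


def load_session_chunks(
    rows: List[dict],
    read_object: Callable[[str], Optional[str]],
) -> List[dict]:
    """Return chunk rows in order, filling in missing payloads from object storage."""
    hydrated = []
    for row in sorted(rows, key=lambda r: r["chunk_index"]):
        data = row.get("data")
        if not data:
            # Payload is not in ClickHouse, so fetch it by its storage key.
            data = read_object(row["object_storage_key"])
        hydrated.append({**row, "data": data})
    return hydrated
```

One request per chunk mirrors the loop in the diagram; batching or parallel reads would be a natural follow-up if session load latency becomes a concern.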

How did you test this code?

Ran it locally and added some tests.

Consequences

Object storage becomes mandatory for self-hosted deployments.

Do we support both, or make it super easy to use object storage?

We'll need storage for:

  • CSV export
  • image generation
  • potentially ClickHouse backups

Related PRs

@pauldambra pauldambra marked this pull request as draft March 30, 2022 15:13
@pauldambra pauldambra changed the title feature(session-recording): store session recording data outside of ClickHouse feat(session-recording): store session recording data outside of ClickHouse Mar 30, 2022
@pauldambra
Member Author

@guidoiaquinti

I'm thinking the order of changes is:

  • merge this so it has no effect (helm unchanged, docker doesn't start object storage, code defaults to disabled)
  • turn it on on cloud (figuring out what documentation updates will be needed)
    • which is when we can start to save on disk space
  • make it opt-in for self-hosted
  • get friendly customers to test it
  • use that to update documentation
  • general release (still opt-in)

(what am I missing/getting wrong :))
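
For the "code defaults to disabled" step, a sketch of an opt-in switch might look like the following. The environment variable name `OBJECT_STORAGE_ENABLED` is an assumption for illustration, not a setting confirmed by this PR.

```python
import os
from typing import Mapping, Optional


def object_storage_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Opt-in flag: anything other than an explicit truthy value means disabled."""
    env = os.environ if env is None else env
    # OBJECT_STORAGE_ENABLED is a hypothetical variable name.
    value = env.get("OBJECT_STORAGE_ENABLED", "false")
    return value.strip().lower() in ("1", "true", "yes")
```

With a default like this, merging has no effect until cloud or a self-hosted deployment sets the variable explicitly.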

@posthog-bot
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Contributor

This PR was closed due to 2 weeks of inactivity. Feel free to reopen it if still relevant.
