State of "Protobuf" format schema for clickhouse #8334

@macobo

Is your feature request related to a problem?

In #2254 @fuziontech introduced protobuf for the clickhouse_events topic. The exact motivation is unknown to me, but I assume it has to do with cost savings around Kafka traffic.

We never rolled out the feature to other topics, and since then we:

  • Introduced plugin-server
  • Introduced new topics that facilitate events service <-> plugin-server communication (including for communicating events data)
  • Introduced a dead letter queue and other topics
  • Started considering significant changes to the events table structure, which would require updating the existing protobuf schema and test-deploying these changes. This in turn might create barriers to upgrading PostHog.
  • Started considering support for external ClickHouse cloud providers (like Altinity), which might not support custom format schemas

All of this makes the protobuf solution we have right now less than ideal. It also doesn't really serve the potential cost-saving purpose, as events are already JSON-encoded in at least one topic.
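
To make the format-schema concern concrete, here is a minimal sketch of how the two formats differ when declaring a ClickHouse Kafka engine table. With `Protobuf`, the table needs a `kafka_schema` setting that points at a `.proto` file available on the ClickHouse server; with `JSONEachRow` it does not. The helper name `kafka_table_ddl`, the broker address, group name, table name, columns, and the `events:Event` schema reference are all hypothetical placeholders, not PostHog's actual definitions.

```python
# Hypothetical helper: builds ClickHouse Kafka engine DDL for one topic.
# All names below (broker, group, schema reference) are illustrative assumptions.

def kafka_table_ddl(table, topic, columns, fmt="JSONEachRow", schema=None):
    """Return a CREATE TABLE statement for a Kafka engine table.

    fmt    -- ClickHouse input format, e.g. "JSONEachRow" or "Protobuf"
    schema -- "<file>:<Message>" reference, only needed for Protobuf
    """
    if fmt == "Protobuf" and not schema:
        # Protobuf cannot be decoded without a format schema on the server.
        raise ValueError("Protobuf needs a format schema (file:Message)")

    settings = [
        "kafka_broker_list = 'kafka:9092'",
        f"kafka_topic_list = '{topic}'",
        "kafka_group_name = 'group1'",
        f"kafka_format = '{fmt}'",
    ]
    if fmt == "Protobuf":
        # This is the extra server-side dependency that JSON does not have.
        settings.append(f"kafka_schema = '{schema}'")

    cols = ",\n    ".join(f"{name} {type_}" for name, type_ in columns)
    return (
        f"CREATE TABLE {table}\n(\n    {cols}\n)\n"
        "ENGINE = Kafka()\nSETTINGS " + ",\n         ".join(settings)
    )


if __name__ == "__main__":
    columns = [("uuid", "String"), ("event", "String")]
    print(kafka_table_ddl("kafka_events", "clickhouse_events", columns))
    print(kafka_table_ddl("kafka_events", "clickhouse_events", columns,
                          fmt="Protobuf", schema="events:Event"))
```

The only difference between the two statements is the format name and the extra `kafka_schema` setting, which is the part that depends on the provider allowing custom schema files.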

The discussion point here is - what shall we do about it?

Do we want to invest more and get protobuf on other topics? Do we want to make it entirely optional/configurable? Do we want to drop it completely?

cc @fuziontech @hazzadous @guidoiaquinti @tiina303 @yakkomajuri @marcushyett-ph for team-platform context

cc @EDsCODE and @timgl who might have additional context

Labels: enhancement, performance, team/infra