Skip to content

Conversation

yakkomajuri
Copy link
Contributor

Problem

#9752

Changes

Set kafka_skip_broken_messages on the DLQ table.

This should have went in a long time ago via a PR I opened a long time ago.

How did you test this code?

Ran the migration locally, checked the schema in ClickHouse

Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🙏

@fuziontech
Copy link
Member

Can we make the DLQ table not JSONEachRow? This would make things a lot easier and more robust. Basically just capture the string or bytes as a single column. We really shouldn't be trying to deserialize something that is in the DLQ because 9/10 times the reason why it will be in the DLQ will be a serialization issue.

This would be better than dropping the payload on the floor since one of the reasons to have a DLQ is for investigative purposes. If we just skip the message we lose any intel we can use to fix the problem in the future.

@yakkomajuri
Copy link
Contributor Author

@fuziontech fair point. what I do like about the current structure is that it makes it easy to re-ingest events if it's "our fault" (plugin server issue, Postgres down, etc). But yeah I see how it can be problematic.

I think we can keep columns we have some control over and can do basic validation on (e.g. team_id, uuid), and then keep the rest as an arbitrary payload. Completely getting rid of JSONEachRow would mean not having things like error message, location, and timestamp, which we have full control over and are valuable

@yakkomajuri
Copy link
Contributor Author

But yeah worth doing on a follow up

@yakkomajuri yakkomajuri merged commit b97b369 into master May 13, 2022
@yakkomajuri yakkomajuri deleted the dlq-settings branch May 13, 2022 10:46
fuziontech added a commit that referenced this pull request May 13, 2022
* master:
  fix: exclusion steps cannot be selected (#9762)
  feat(lemon-button): Support `status` for `primary` buttons (#9782)
  fix: healthcheck for kafka on plugin server (#9771)
  fix(billing): Update billing success message (#9739)
  fix(plugin-server): Set transpileOnly when importing piscina code in tests (#9777)
  fix(plugin-server): Remove unused kafka reset test code (#9779)
  fix: set kafka_skip_broken_messages on dead letter queue table (#9754)
  fix(plugin-server): remove dead code from worker.test.ts (#9776)
  refactor(plugin-server): Run ingestion only on worker threads (#9738)
  fix: Plugin-server tests with kafka need to have consumer stopped (#9774)
  chore(deps): Update posthog-js to 1.21.1 (#9773)
  chore(dep): upgrading rr-web (#9772)
  fix: ouroboros inputs (#9769)
alexkim205 pushed a commit that referenced this pull request May 23, 2022
* fix: set kafka_skip_broken_messages on dead letter queue table

* update more snapshots
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants