Skip to content

Conversation

yakkomajuri
Copy link
Contributor

@yakkomajuri yakkomajuri commented Apr 8, 2022

Summary of the problem we have as posted on Slack:

ClickHouse crashes when parsing JSONEachRow records (or JSON formats in general) if it encounters a surrogate character without a pair.
In our case, some user’s events contained \ud83d\.

The approach I took here was to follow what was done here: fastify/fast-json-stringify#151

That means escaping all surrogates, rather than trying to look for surrogates without a pair. It's a valid tradeoff.

Once impletmented in JS I then made sure Python had the same functionality.

Has tests for both sides

@yakkomajuri yakkomajuri mentioned this pull request Apr 8, 2022
yakkomajuri and others added 2 commits April 11, 2022 08:26
Co-authored-by: Tiina Turban <tiina303@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants