Skip to content

fix(clickhouse): Add auto-retry logic on SSL reset error #3733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 27, 2025

Conversation

vincent-pochet
Copy link
Collaborator

@vincent-pochet vincent-pochet commented May 27, 2025

Context

Some time, with Clickhouse queries, a Errno::ECONNRESET Connection reset by peer - SSL_connect is raised.
This happen when the Clickhouse cluster is under eavy load and is scaled automatically.

Description

Since these error are intermediary, this PR adds a auto-retry system (up to 3 retries) to reduce the number of errors in the following processes (billing, current usage, daily usage...)

NOTE: we should check how to address this at the Clickhouse connector level, but it is not currently possible to pass an option to enable max_retries on the underlying Net::HTTP client

@vincent-pochet vincent-pochet merged commit 4c6bf88 into main May 27, 2025
14 checks passed
@vincent-pochet vincent-pochet deleted the misc-clickhouse-retries branch May 27, 2025 08:51
vincent-pochet added a commit that referenced this pull request May 27, 2025
## Description
Similarly to #3733, this PR adds
auto-retry for `ActiveRecord::ActiveRecordError` on Clickhouse queries
vincent-pochet added a commit that referenced this pull request May 27, 2025
## Description

Following #3733 and
#3736, this PR adds a missing
auto-retry to the `ClickhouseStore#events` method
vincent-pochet added a commit that referenced this pull request Jun 20, 2025
## Context

This PR follows #3733 but we are
still seeing some
`Errno::ECONNRESET Connection reset by peer - SSL_connect`

## Description

It refactors a bit the way we handle retries as it appears that some
`events.to_sql` are also hiting the database.

NOTE: this is a quick fix, a future PR will had some logic to rewrite
theses queries in pure SQL to reduce this unintended queries that
generates some load on the clickhouse instance
diegocharles pushed a commit that referenced this pull request Jul 11, 2025
## Context

This PR follows #3733 but we are
still seeing some
`Errno::ECONNRESET Connection reset by peer - SSL_connect`

## Description

It refactors a bit the way we handle retries as it appears that some
`events.to_sql` are also hiting the database.

NOTE: this is a quick fix, a future PR will had some logic to rewrite
theses queries in pure SQL to reduce this unintended queries that
generates some load on the clickhouse instance
vincent-pochet added a commit that referenced this pull request Aug 19, 2025
## Context

This PR follows #3733

## Description

In some situation, a "NoMethodError: undefined method 'select_all' for
nil" is raised when running
`::Clickhouse::EventsEnriched.connection.select_all`. This is an issue
with the Active record connector for Clickhouse.

This PR just handle an automatic retry when the error is raised
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants