CuratorZookeeperClient was reset unexpectedly #1248

@xingsuo-zbz

Description

I set sessionTimeoutMs to 1 day (86400000 ms), but the value that actually takes effect is 500654 ms.

Testing Details

server conf:

tickTime=2000
initLimit=10
syncLimit=5
minSessionTimeout=7200000
maxSessionTimeout=86400000

curator client conf:

CuratorFrameworkFactory.builder()
    .connectString(zkQuorum)
    .sessionTimeoutMs(86400000)
    .connectionTimeoutMs(15000)
    .simulatedSessionExpirationPercent(100)
    .retryPolicy(new ExponentialBackoffRetry(5000, 24))
    .namespace("xxx")
    .aclProvider(aclProvider);

There are 3 ZooKeeper servers. Kill 2 of them to simulate a long-term ZooKeeper outage.

The Curator client enters the SUSPENDED state after the leader becomes unavailable. It is expected to enter the LOST state after 1 day, but it actually enters LOST after about 8 minutes.

Related logs:
2025-02-21 18:55:12,181 [main-EventThread] DEBUG org.apache.flink.shaded.curator5.org.apache.curator.ConnectionState - Negotiated session timeout: 86400000
2025-02-21 19:03:33,443 [Curator-ConnectionStateManager-0] WARN org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager - Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 500654. Adjusted session timeout ms: 500654

Root cause

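The int multiplication (useSessionTimeoutMs * sessionExpirationPercent) overflows: 86400000 * 100 = 8640000000 exceeds Integer.MAX_VALUE (2147483647) and wraps to 50065408, which divided by 100 gives the observed 500654 ms. As a result, the CuratorZookeeperClient is reset unexpectedly: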

private int getUseSessionTimeoutMs() {
    int lastNegotiatedSessionTimeoutMs = client.getZookeeperClient().getLastNegotiatedSessionTimeoutMs();
    int useSessionTimeoutMs =
            (lastNegotiatedSessionTimeoutMs > 0) ? lastNegotiatedSessionTimeoutMs : sessionTimeoutMs;
    useSessionTimeoutMs = sessionExpirationPercent > 0 && startOfSuspendedEpoch != 0
            ? (useSessionTimeoutMs * sessionExpirationPercent) / 100
            : useSessionTimeoutMs;
    return useSessionTimeoutMs;
}
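
The overflow can be reproduced outside Curator with plain Java. The sketch below is only an illustration: the class name SessionTimeoutOverflowDemo and both helper methods are hypothetical and not part of Curator. The first helper mirrors the int arithmetic from the method above, and the second shows one possible fix (widening to long before multiplying), which is an assumption and not necessarily how the project resolves this.

```java
public class SessionTimeoutOverflowDemo {

    // Hypothetical helper: same int arithmetic as in getUseSessionTimeoutMs().
    // The product wraps around once it exceeds Integer.MAX_VALUE.
    static int adjustedTimeoutInt(int sessionTimeoutMs, int percent) {
        return (sessionTimeoutMs * percent) / 100;
    }

    // Hypothetical fix: widen to long before multiplying, then narrow back.
    // The cast back to int is safe as long as percent <= 100.
    static int adjustedTimeoutLong(int sessionTimeoutMs, int percent) {
        return (int) (((long) sessionTimeoutMs * percent) / 100);
    }

    public static void main(String[] args) {
        int timeout = 86_400_000; // 1 day, as negotiated in the logs
        int percent = 100;        // simulatedSessionExpirationPercent

        System.out.println(adjustedTimeoutInt(timeout, percent));  // 500654
        System.out.println(adjustedTimeoutLong(timeout, percent)); // 86400000
    }
}
```

With the int arithmetic, any negotiated timeout above 21474836 ms (about 6 hours) overflows at 100 percent, which is why the one-day setting reproduces the problem.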
