Redis Cluster Client Deadlock with custom SocketAddressResolver #3240

@henry701

Bug Report

Reproduction Conditions

The deadlock occurs while connecting to a Redis cluster if the SocketAddressResolver throws an exception.

I'm using Spring Data, but this error is reproducible without using it as well.

Environment

  • Lettuce version(s): 6.5.5.RELEASE
  • Redis version: n/a (does not connect, nor needs to)

Logger Warning Stack Trace

The error that is caught and logged as a warning, but not otherwise handled:

java.lang.IllegalArgumentException: Cannot parse port number: $(INVALID_DATA):CONFIG
  at io.lettuce.core.internal.HostAndPort.parse(HostAndPort.java:98)
  at io.lettuce.core.internal.HostAndPort.of(HostAndPort.java:56)
  at io.lettuce.core.resource.MappingSocketAddressResolver.resolve(MappingSocketAddressResolver.java:97)
  at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.openConnections(DefaultClusterTopologyRefresh.java:312)
  at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.loadViews(DefaultClusterTopologyRefresh.java:99)
  at io.lettuce.core.cluster.RedisClusterClient.fetchPartitions(RedisClusterClient.java:1033)
  at io.lettuce.core.cluster.RedisClusterClient.loadPartitionsAsync(RedisClusterClient.java:985)
  at io.lettuce.core.cluster.RedisClusterClient.initializePartitions(RedisClusterClient.java:940)
  at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:332)
  at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:100)
  at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:44)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
  at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.lambda$getConnection$0(LettucePoolingConnectionProvider.java:93)
  at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:211)
  at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:201)
  at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:71)
  at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:566)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:233)
  at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:122)
  at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:117)
  at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.getConnection(LettucePoolingConnectionProvider.java:99)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1724)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1528)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.lambda$getConnection$0(LettuceConnectionFactory.java:1508)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1505)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedClusterConnection(LettuceConnectionFactory.java:1205)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getClusterConnection(LettuceConnectionFactory.java:1016)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getConnection(LettuceConnectionFactory.java:994)
  at org.springframework.data.redis.core.RedisConnectionUtils.fetchConnection(RedisConnectionUtils.java:195)
  at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:144)
  at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:105)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:383)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:363)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:350)
  at com.my.corporate.service.package.data.MyCorporateClass.init(MyCorporateClass.java:75)

Thread Dump Stack Trace

"main" #1 prio=5 os_prio=0 cpu=11312,57ms elapsed=59,27s tid=0x0000716cd00368f0 nid=0xc3d5f waiting on condition  [0x0000716cd750e000]
   java.lang.Thread.State: WAITING (parking)
  at jdk.internal.misc.Unsafe.park(java.base@17.0.14/Native Method)
  - parking to wait for  <0x000000061caf4d58> (a java.util.concurrent.CompletableFuture$Signaller)
  at java.util.concurrent.locks.LockSupport.park(java.base@17.0.14/LockSupport.java:211)
  at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.14/CompletableFuture.java:1864)
  at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.14/ForkJoinPool.java:3476)
  at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.14/ForkJoinPool.java:3447)
  at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.14/CompletableFuture.java:1898)
  at java.util.concurrent.CompletableFuture.get(java.base@17.0.14/CompletableFuture.java:2072)
  at io.lettuce.core.cluster.RedisClusterClient.get(RedisClusterClient.java:961)
  at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:332)
  at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:100)
  at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:44)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
  at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.lambda$getConnection$0(LettucePoolingConnectionProvider.java:93)
  at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider$$Lambda$1849/0x0000716c4cd71300.get(Unknown Source)
  at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:211)
  at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:201)
  at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:71)
  at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:566)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306)
  at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:233)
  at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:122)
  at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:117)
  at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.getConnection(LettucePoolingConnectionProvider.java:99)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1724)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1528)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.lambda$getConnection$0(LettuceConnectionFactory.java:1508)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection$$Lambda$1847/0x0000716c4cd70e80.get(Unknown Source)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1505)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedClusterConnection(LettuceConnectionFactory.java:1205)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getClusterConnection(LettuceConnectionFactory.java:1016)
  at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getConnection(LettuceConnectionFactory.java:994)
  at org.springframework.data.redis.core.RedisConnectionUtils.fetchConnection(RedisConnectionUtils.java:195)
  at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:144)
  at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:105)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:383)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:363)
  at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:350)
  at com.my.corporate.service.package.data.MyCorporateClass.init(MyCorporateClass.java:75)

Reproduction

I've created a repository here with the repro of this bug: https://github.com/henry701/lettuce-bug-report-infiwait

Pretty straightforward: just run mvn compile exec:java and the application will compile, start, and hang.

Investigation

After analyzing the code execution flow together with the stack trace and a thread dump, the following was determined:

  1. The application gets stuck indefinitely in getPartitions(), which is waiting on the future returned by initializePartitions().
  2. The future returned by loadPartitionsAsync() is never completed.
  3. That future depends on another future returned by loadViews(), which is also never completed.
  4. The loadViews() future depends on a future from the ConnectionTracker instance passed to the openConnections() method.
  5. Root cause: in the error-handling path, the catch block only logs a warning. It does not call tracker.addConnection(redisURI, sync) when errors occur in certain flows.
  6. Specifically, the error happens at SocketAddress socketAddress = clientResources.socketAddressResolver().resolve(redisURI) when calling a custom implementation of SocketAddressResolver, which in my case fails because the configured port has an invalid format.

The issue lies in the openConnections method, where exceptions during address resolution are caught and logged, but the sync CompletableFuture is not properly completed or added to the tracker in this error path.
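
The failure mode can be reduced to a few lines of plain JDK code. This is only a minimal sketch, not Lettuce's actual code: the class SwallowedFailureDemo, the methods open and timesOut, and the node strings are made up for illustration, and the real openConnections works with a ConnectionTracker rather than returning a single future. It shows how a future that is neither completed nor failed in a catch block leaves every waiter blocked. The timeout is there only so the demo terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SwallowedFailureDemo {

    // Hypothetical stand-in for opening a connection to one node ("host:port").
    static CompletableFuture<String> open(String node) {
        CompletableFuture<String> result = new CompletableFuture<>();
        try {
            String port = node.substring(node.lastIndexOf(':') + 1);
            // Throws NumberFormatException (a RuntimeException) for a malformed port.
            result.complete("connected to port " + Integer.parseInt(port));
        } catch (RuntimeException e) {
            // The error is logged, but 'result' is neither completed nor failed.
            System.err.println("Unable to connect to [" + node + "]: " + e);
        }
        return result;
    }

    // Returns true if waiting on the future for this node runs into the timeout.
    static boolean timesOut(String node) throws Exception {
        try {
            open(node).get(200, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid port times out:   " + timesOut("redis-node:6379"));
        System.out.println("invalid port times out: " + timesOut("redis-node:$(PORT)"));
    }
}
```

With an untimed get(), as in the thread dump above, the second call would never return.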

Possible Solution

Move the declaration of CompletableFuture<StatefulRedisConnection<String, String>> sync = new CompletableFuture<>() to the top of the loop in the openConnections method, and modify the catch block to contain:

catch (RuntimeException e) {
    String message = String.format("Unable to connect to [%s]", redisURI);
    logger.warn(message, e);
    sync.completeExceptionally(new RedisConnectionException(message, e));
    tracker.addConnection(redisURI, sync);
}

This ensures that even when address resolution or connection setup fails, the future is completed exceptionally and added to the ConnectionTracker, so the tracker's own future also completes exceptionally and the caller no longer hangs.
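
The same idea can be shown in isolation with plain JDK classes. This is a hedged sketch of the pattern only, not a patch against Lettuce: FailFastDemo, open, and outcome are hypothetical names, IllegalStateException stands in for RedisConnectionException, and there is no tracker here. The point is that the future is declared before the try block so the catch block can fail it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class FailFastDemo {

    // Hypothetical stand-in for opening a connection to one node ("host:port").
    static CompletableFuture<String> open(String node) {
        // Declared before the try block so the catch block can reach it.
        CompletableFuture<String> sync = new CompletableFuture<>();
        try {
            String port = node.substring(node.lastIndexOf(':') + 1);
            sync.complete("connected to port " + Integer.parseInt(port));
        } catch (RuntimeException e) {
            String message = String.format("Unable to connect to [%s]", node);
            System.err.println(message);
            // Fail the future instead of leaving it pending forever.
            sync.completeExceptionally(new IllegalStateException(message, e));
        }
        return sync;
    }

    // Waits for the result and turns a failed future into a readable string.
    static String outcome(String node) throws Exception {
        try {
            return open(node).get(200, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(outcome("redis-node:6379"));
        System.out.println(outcome("redis-node:$(PORT)"));
    }
}
```

Because the waiter now sees an ExecutionException right away, the timeout in outcome is never reached for either input.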

Additional Context

This bug causes applications to hang indefinitely when attempting to connect to a Redis cluster with invalid node configuration, which can occur in various scenarios such as:

  • Misconfigured environment variables
  • Incorrect template substitution in configuration files
  • Network or DNS resolution issues

The thread stays parked forever unless it is interrupted. As a result, when this runs on the main thread or on a thread that others wait on, such as a web request thread, the application becomes unresponsive and has to be restarted to recover.
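
Until the client fails fast on its own, a caller could bound the wait itself. The sketch below is a caller-side workaround under stated assumptions, not a fix and not part of Lettuce or Spring Data: BoundedStartupDemo, callWithTimeout, and hangsAreDetected are hypothetical names, and a never-completed CompletableFuture stands in for the blocked getPartitions() call. It relies on the observation above that the parked thread does react to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStartupDemo {

    // Runs a blocking task on a daemon worker thread and gives up after the timeout.
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-startup");
            thread.setDaemon(true); // never keeps the JVM alive on its own
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the parked worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    // Simulates the hang with a future that nobody ever completes.
    static boolean hangsAreDetected() throws Exception {
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        try {
            callWithTimeout(() -> neverCompleted.get(), 200, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hang detected: " + hangsAreDetected());
    }
}
```

This turns an indefinite hang into a TimeoutException the application can log and act on, at the cost of an extra thread during startup.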
