Description
Bug Report
Reproduction Conditions
The deadlock occurs when attempting to connect to a Redis cluster when an exception is thrown from the SocketAddressResolver.
I'm using Spring Data, but this error is reproducible without using it as well.
Environment
- Lettuce version(s): 6.5.5.RELEASE
- Redis version: n/a (does not connect, nor needs to)
Logger Warning Stack Trace
The following exception is caught and logged as a warning, but not otherwise handled:
java.lang.IllegalArgumentException: Cannot parse port number: $(INVALID_DATA):CONFIG
at io.lettuce.core.internal.HostAndPort.parse(HostAndPort.java:98)
at io.lettuce.core.internal.HostAndPort.of(HostAndPort.java:56)
at io.lettuce.core.resource.MappingSocketAddressResolver.resolve(MappingSocketAddressResolver.java:97)
at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.openConnections(DefaultClusterTopologyRefresh.java:312)
at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.loadViews(DefaultClusterTopologyRefresh.java:99)
at io.lettuce.core.cluster.RedisClusterClient.fetchPartitions(RedisClusterClient.java:1033)
at io.lettuce.core.cluster.RedisClusterClient.loadPartitionsAsync(RedisClusterClient.java:985)
at io.lettuce.core.cluster.RedisClusterClient.initializePartitions(RedisClusterClient.java:940)
at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:332)
at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:100)
at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:44)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.lambda$getConnection$0(LettucePoolingConnectionProvider.java:93)
at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:211)
at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:201)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:71)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:566)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:233)
at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:122)
at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:117)
at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.getConnection(LettucePoolingConnectionProvider.java:99)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1724)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1528)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.lambda$getConnection$0(LettuceConnectionFactory.java:1508)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1505)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedClusterConnection(LettuceConnectionFactory.java:1205)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getClusterConnection(LettuceConnectionFactory.java:1016)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getConnection(LettuceConnectionFactory.java:994)
at org.springframework.data.redis.core.RedisConnectionUtils.fetchConnection(RedisConnectionUtils.java:195)
at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:144)
at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:105)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:383)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:363)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:350)
at com.my.corporate.service.package.data.MyCorporateClass.init(MyCorporateClass.java:75)
Thread Dump Stack Trace
"main" #1 prio=5 os_prio=0 cpu=11312,57ms elapsed=59,27s tid=0x0000716cd00368f0 nid=0xc3d5f waiting on condition [0x0000716cd750e000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.14/Native Method)
- parking to wait for <0x000000061caf4d58> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.14/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.14/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.14/ForkJoinPool.java:3476)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.14/ForkJoinPool.java:3447)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.14/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@17.0.14/CompletableFuture.java:2072)
at io.lettuce.core.cluster.RedisClusterClient.get(RedisClusterClient.java:961)
at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:332)
at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:100)
at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:44)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.lambda$getConnection$0(LettucePoolingConnectionProvider.java:93)
at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider$$Lambda$1849/0x0000716c4cd71300.get(Unknown Source)
at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:211)
at io.lettuce.core.support.ConnectionPoolSupport$RedisPooledObjectFactory.create(ConnectionPoolSupport.java:201)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:71)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:566)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:233)
at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:122)
at io.lettuce.core.support.ConnectionPoolSupport$1.borrowObject(ConnectionPoolSupport.java:117)
at org.springframework.data.redis.connection.lettuce.LettucePoolingConnectionProvider.getConnection(LettucePoolingConnectionProvider.java:99)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1724)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1528)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.lambda$getConnection$0(LettuceConnectionFactory.java:1508)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection$$Lambda$1847/0x0000716c4cd70e80.get(Unknown Source)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.doInLock(LettuceConnectionFactory.java:1469)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1505)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedClusterConnection(LettuceConnectionFactory.java:1205)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getClusterConnection(LettuceConnectionFactory.java:1016)
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getConnection(LettuceConnectionFactory.java:994)
at org.springframework.data.redis.core.RedisConnectionUtils.fetchConnection(RedisConnectionUtils.java:195)
at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:144)
at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:105)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:383)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:363)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:350)
at com.my.corporate.service.package.data.MyCorporateClass.init(MyCorporateClass.java:75)
Reproduction
I've created a repository here with the repro of this bug: https://github.com/henry701/lettuce-bug-report-infiwait
It's pretty straightforward: just run mvn compile exec:java and the application will compile and then hang.
Investigation
After analyzing the code execution flow together with the stack trace and a thread dump, the following was determined:
- The application gets stuck indefinitely in getPartitions(), which is waiting for a promise returned by initializePartitions()
- The Future returned by loadPartitionsAsync() is never completed
- This future depends on another future returned in loadViews(), which is also never completed
- The loadViews() future depends on a future from the ConnectionTracker class passed to the openConnections() method
- Root cause: In the error handling path, there's a logger warning that wraps everything, but it doesn't call tracker.addConnection(redisURI, sync) when errors occur in certain flows
- Specifically, the error happens at SocketAddress socketAddress = clientResources.socketAddressResolver().resolve(redisURI) when calling a custom implementation of SocketAddressResolver, which fails due to an invalid port format configuration in my case.
The issue lies in the openConnections method, where exceptions during address resolution are caught and logged, but the sync CompletableFuture is not properly completed or added to the tracker in this error path.
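As a simplified illustration of the failure mode described above, the sketch below catches and logs an exception but never completes the future the caller waits on. It is not Lettuce's code: the class name, parsePort, resolveSwallowingErrors and stillPendingAfter are hypothetical names made up for this example, and only the malformed input string is taken from the warning above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NeverCompletedFutureSketch {

    /** Hypothetical stand-in for a resolver that rejects a malformed "host:port" string. */
    static int parsePort(String hostAndPort) {
        String port = hostAndPort.substring(hostAndPort.lastIndexOf(':') + 1);
        try {
            return Integer.parseInt(port);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Cannot parse port number: " + hostAndPort, e);
        }
    }

    /** Models the failure: the exception is logged, but the future is left untouched. */
    static CompletableFuture<Integer> resolveSwallowingErrors(String hostAndPort) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        try {
            result.complete(parsePort(hostAndPort));
        } catch (RuntimeException e) {
            System.err.println("Unable to connect to [" + hostAndPort + "]: " + e.getMessage());
            // Bug being modelled: neither complete() nor completeExceptionally() is called here.
        }
        return result;
    }

    /** Returns true if the future is still pending after waiting the given number of milliseconds. */
    static boolean stillPendingAfter(CompletableFuture<?> future, long millis) {
        try {
            future.get(millis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> future = resolveSwallowingErrors("$(INVALID_DATA):CONFIG");
        // An untimed future.get() here would park the calling thread indefinitely.
        System.out.println("pending after 200 ms: " + stillPendingAfter(future, 200));
    }
}
```

The timed get() is used here only so the example terminates. The thread dump above shows an untimed CompletableFuture.get(), which is why the main thread stays parked.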
Possible Solution
Move the declaration of CompletableFuture<StatefulRedisConnection<String, String>> sync = new CompletableFuture<>() to the top of the loop in the openConnections method, and modify the catch block to contain:
catch (RuntimeException e) {
    String message = String.format("Unable to connect to [%s]", redisURI);
    logger.warn(message, e);
    sync.completeExceptionally(new RedisConnectionException(message, e));
    tracker.addConnection(redisURI, sync);
}
This ensures that even when there's a connection error, the future is properly completed exceptionally and added to the ConnectionTracker, completing its future exceptionally and preventing the deadlock.
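The shape of this change can be sketched outside Lettuce as follows. This is a minimal model under stated assumptions, not the actual openConnections implementation: Tracker, resolve and openAll are hypothetical names, the tracker simply aggregates futures with CompletableFuture.allOf, and IllegalStateException stands in for RedisConnectionException so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CompleteOnErrorSketch {

    /** Hypothetical, simplified stand-in for a connection tracker; not Lettuce's ConnectionTracker. */
    static final class Tracker {
        private final List<CompletableFuture<String>> futures = new ArrayList<>();

        void add(CompletableFuture<String> future) {
            futures.add(future);
        }

        /** Completes once every registered future is done, normally or exceptionally. */
        CompletableFuture<Void> whenAllDone() {
            return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        }
    }

    /** Hypothetical resolver: accepts "host:port" only when the port is numeric. */
    static String resolve(String uri) {
        String port = uri.substring(uri.lastIndexOf(':') + 1);
        Integer.parseInt(port); // throws NumberFormatException on a malformed port
        return uri;
    }

    static Tracker openAll(List<String> uris) {
        Tracker tracker = new Tracker();
        for (String uri : uris) {
            // Declared before the try block so the catch block can still reach it.
            CompletableFuture<String> sync = new CompletableFuture<>();
            try {
                sync.complete(resolve(uri));
                tracker.add(sync);
            } catch (RuntimeException e) {
                // Fail the future and register it, so waiters see an error instead of nothing.
                sync.completeExceptionally(
                        new IllegalStateException("Unable to connect to [" + uri + "]", e));
                tracker.add(sync);
            }
        }
        return tracker;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> all =
                openAll(List.of("localhost:6379", "$(INVALID_DATA):CONFIG")).whenAllDone();
        System.out.println("done: " + all.isDone() + ", failed: " + all.isCompletedExceptionally());
        // prints "done: true, failed: true"
    }
}
```

Because every future is registered whether resolution succeeded or failed, the aggregate future always reaches a terminal state, and a caller blocked on it receives the failure instead of waiting.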
Additional Context
This bug causes applications to hang indefinitely when attempting to connect to a Redis cluster with invalid node configuration, which can occur in various scenarios such as:
- Misconfigured environment variables
- Incorrect template substitution in configuration files
- Network or DNS resolution issues
The thread stays parked forever unless it is interrupted. As a result, when this is run from the main thread or from a thread that something else waits on, such as a web thread, the application becomes unresponsive and requires restarts to recover.
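Until the library completes the future itself, one possible caller-side mitigation is to run the blocking call on a separate thread and bound the wait. The sketch below is an assumption-laden illustration, not a verified workaround for Lettuce: callWithTimeout is a hypothetical helper, the never-completed CompletableFuture only stands in for the hanging call, and whether interrupting the worker leaves the client in a clean state has not been checked.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedInitSketch {

    /** Hypothetical helper: runs a blocking call on a daemon thread and gives up after the timeout. */
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-init");
            thread.setDaemon(true); // do not keep the JVM alive if the call never returns
            return thread;
        });
        Future<T> task = executor.submit(blockingCall);
        try {
            return task.get(timeout, unit);
        } catch (TimeoutException e) {
            task.cancel(true); // interrupts the worker thread
            throw new IllegalStateException(
                    "Initialization did not finish within " + timeout + " " + unit, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Initialization failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for initialization", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        try {
            // A future that nobody completes stands in for the call that never returns.
            callWithTimeout(() -> new CompletableFuture<String>().get(), 200, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This turns an indefinite hang into a visible startup failure, which is usually easier to diagnose than an unresponsive process.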