Potential deadlock in subchannel state update #2589

@Adarsh1994

Description

What version of gRPC and what language are you using?

Version: 2.66.0
Language: C#

What operating system (Linux, Windows,...) and version?

Linux

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

.NET SDK 8.0

What did you do?

We integrated the gRPC .NET libraries to make unary RPC calls between our services. Client-server communication generally works well and requests execute successfully, but we occasionally encounter what appears to be a deadlock. The issue seems to arise when a subchannel state update triggered after a request completes coincides with the periodic DNS resolver attempting to update the state of the same subchannel.

What did you expect to see?

No deadlocks during the gRPC communication

What did you see instead?

The deadlock can be confirmed from the following stack traces of the two stuck threads:

Thread (0x8F25):
Grpc.Net.Client!Grpc.Net.Client.Balancer.Subchannel.UpdateAddresses(class System.Collections.Generic.IReadOnlyList`1<class Grpc.Net.Client.Balancer.BalancerAddress>)
Grpc.Net.Client!Grpc.Net.Client.Balancer.PickFirstBalancer.UpdateChannelState(class Grpc.Net.Client.Balancer.ChannelState)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.ChildHandlerLoadBalancer.UpdateChannelState(class Grpc.Net.Client.Balancer.ChannelState)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.ConnectionManager.OnResolverResult(class Grpc.Net.Client.Balancer.ResolverResult)
Grpc.Net.Client!Grpc.Net.Client.Balancer.DnsResolver+<ResolveAsync>d__11.MoveNext()
System.Private.CoreLib!System.Threading.ExecutionContext.RunInternal(class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)
System.Private.CoreLib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,Grpc.Net.Client.Balancer.DnsResolver+<ResolveAsync>d__11].MoveNext(class System.Threading.Thread)
System.Private.CoreLib!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(class System.Runtime.CompilerServices.IAsyncStateMachineBox,bool)
System.Private.CoreLib!System.Threading.Tasks.Task.RunContinuations(class System.Object)
System.Private.CoreLib!System.Threading.Tasks.Task.FinishSlow(bool)
System.Private.CoreLib!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread)
System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Thread (0x8FE2):
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.ConnectionManager.OnSubchannelStateChange(class Grpc.Net.Client.Balancer.Subchannel,value class Grpc.Core.ConnectivityState,value class Grpc.Core.Status)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Subchannel.UpdateConnectivityState(value class Grpc.Core.ConnectivityState,value class Grpc.Core.Status)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Subchannel.RequestConnection()
Grpc.Net.Client!Grpc.Net.Client.Balancer.RequestConnectionPicker.Pick(class Grpc.Net.Client.Balancer.PickContext)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.ConnectionManager+<PickAsync>d__46.MoveNext()
System.Private.CoreLib!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(class Grpc.Net.Client.Balancer.PickContext,bool,value class System.Threading.CancellationToken)
Grpc.Net.Client!Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler+<SendAsync>d__11.MoveNext()
System.Private.CoreLib!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&)
Grpc.Net.Client!Grpc.Net.Client.Internal.GrpcCall`2+<RunCall>d__82[System.__Canon,System.__Canon].MoveNext()
System.Private.CoreLib!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&)
Grpc.Net.Client!Grpc.Net.Client.Internal.GrpcCall`2[System.__Canon,System.__Canon].RunCall(class System.Net.Http.HttpRequestMessage,value class System.Nullable`1<value class System.TimeSpan>)
Grpc.Net.Client!Grpc.Net.Client.Internal.GrpcCall`2[System.__Canon,System.__Canon].StartUnaryCore(class System.Net.Http.HttpContent)
Grpc.Net.Client!Grpc.Net.Client.Internal.HttpClientCallInvoker.AsyncUnaryCall(class Grpc.Core.Method`2<!!0,!!1>,class System.String,value class Grpc.Core.CallOptions,!!0)
Grpc.Core.Api!Grpc.Core.Interceptors.InterceptingCallInvoker.<AsyncUnaryCall>b__4_0(!!0,value class Grpc.Core.Interceptors.ClientInterceptorContext`2<!!0,!!1>)
Grpc.Core.Api!Grpc.Core.Interceptors.InterceptingCallInvoker.AsyncUnaryCall(class Grpc.Core.Method`2<!!0,!!1>,class System.String,value class Grpc.Core.CallOptions,!!0)
CustomRouterService.BufferingTcsClient+<PublishAsync>d__19.MoveNext()
System.Private.CoreLib!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)

The following stack trace analysis was captured:
[image: StackTrace]
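
One possible reading of the two traces above is a lock-order inversion: the resolver path (`OnResolverResult` → `UpdateAddresses`) may hold a connection-manager-level lock while waiting for a subchannel-level lock, and the call path (`RequestConnection` → `OnSubchannelStateChange`) may hold them in the opposite order. The traces do not show which locks are held, so this is an assumption. The sketch below only illustrates that general pattern; `ManagerLock`, `SubchannelLock`, and `TryNested` are hypothetical names, not Grpc.Net.Client members. It uses `Monitor.TryEnter` with a timeout so the inversion is reported instead of hanging.

```csharp
using System;
using System.Threading;

public static class LockOrderSketch
{
    // Hypothetical stand-ins for a manager-level and a subchannel-level lock.
    // These are NOT the actual Grpc.Net.Client fields.
    private static readonly object ManagerLock = new object();
    private static readonly object SubchannelLock = new object();

    // Takes 'first', waits until the other thread also holds its first lock,
    // then tries to take 'second' with a timeout. Returns whether 'second' was acquired.
    private static bool TryNested(object first, object second, Barrier barrier, TimeSpan timeout)
    {
        lock (first)
        {
            barrier.SignalAndWait(); // both threads now hold their first lock

            bool taken = false;
            try
            {
                Monitor.TryEnter(second, timeout, ref taken);
            }
            finally
            {
                if (taken) Monitor.Exit(second);
            }

            // Keep holding 'first' until the other thread has finished its attempt,
            // so neither attempt can succeed merely because the other side gave up.
            barrier.SignalAndWait();
            return taken;
        }
    }

    public static void Main()
    {
        using var barrier = new Barrier(2);
        var timeout = TimeSpan.FromMilliseconds(200);
        bool resolverPath = false, callPath = false;

        // "Resolver" path: manager lock first, then subchannel lock.
        var t1 = new Thread(() => resolverPath = TryNested(ManagerLock, SubchannelLock, barrier, timeout));
        // "Call" path: subchannel lock first, then manager lock (inverted order).
        var t2 = new Thread(() => callPath = TryNested(SubchannelLock, ManagerLock, barrier, timeout));

        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Both attempts time out, because each thread holds the lock the other needs.
        Console.WriteLine($"resolver path acquired second lock: {resolverPath}");
        Console.WriteLine($"call path acquired second lock: {callPath}");
    }
}
```

With plain `lock` statements in place of the timed `Monitor.TryEnter`, both threads in this sketch would block forever. The usual remedies for this pattern are to acquire the locks in one consistent order on every path, or to avoid calling out to another component while holding a lock. Whether either applies here depends on the library's actual locking.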
