Bail out GoogleAsyncStreamImpl::writeQueued if cq is drained #4356

qiwzhang · 2018-09-06T01:24:05Z

Signed-off-by: Wayne Zhang <qiwzhang@google.com>

Description:

This change tries to fix #4352

Problem:
When GrpcSubscriptionImpl is deleted, it tries to send one more request update. The newly added sds_api uses a GrpcSubscriptionImpl that is owned by a ListenerImpl. When the server is killed, GoogleAsyncClientThreadLocal is removed before ListenerManager is deleted. As a result, sds_api sends a Google gRPC request through an already-deleted GoogleAsyncClientThreadLocal, which causes the crash.

Fix:
When GoogleAsyncClientThreadLocal is removed, all of its active streams are marked with draining_cq_ = true. If GoogleAsyncStreamImpl::writeQueued is called while that flag is set, it bails out.
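
The guard described in the fix can be illustrated with a minimal, self-contained sketch. This is not Envoy's code: SketchStream, markDraining, queueWrite, sent, pendingCount and demoDrainGuard are hypothetical names, and only draining_cq_ and writeQueued are borrowed from the PR. The sketch assumes that the owner flips the flag on each stream before it goes away, and that a write attempted afterwards should be dropped rather than forwarded.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream; it only models the draining guard.
class SketchStream {
public:
  // Called by the owner when its completion queue starts draining.
  void markDraining() { draining_cq_ = true; }

  // Queue a message, then try to flush it.
  void queueWrite(std::string msg) {
    pending_.push(std::move(msg));
    writeQueued();
  }

  void writeQueued() {
    // Bail out when there is nothing to send or the owner is draining.
    if (pending_.empty() || draining_cq_) {
      return;
    }
    sent_.push_back(std::move(pending_.front()));
    pending_.pop();
  }

  const std::vector<std::string>& sent() const { return sent_; }
  std::size_t pendingCount() const { return pending_.size(); }

private:
  bool draining_cq_{false};
  std::queue<std::string> pending_;
  std::vector<std::string> sent_;
};

// Usage sketch: returns true when writes stop once draining is marked.
bool demoDrainGuard() {
  SketchStream stream;
  stream.queueWrite("first");
  const bool sent_before = stream.sent().size() == 1;
  stream.markDraining();
  stream.queueWrite("second");
  const bool held_after = stream.sent().size() == 1 && stream.pendingCount() == 1;
  return sent_before && held_after;
}
```

In this sketch the late write stays in the queue and never reaches the (already removed) transport, which is the behavior the fix relies on.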

Risk Level: Low

Testing:
bazel test --runs_per_test=1000 //test/integration:sds_dynamic_integration_test

Docs Changes: None

Release Notes: None

Signed-off-by: Wayne Zhang <qiwzhang@google.com>

qiwzhang · 2018-09-06T01:28:52Z

@htuch Please help to review this.

htuch · 2018-09-06T03:09:45Z

source/common/grpc/google_async_client_impl.cc

@@ -205,7 +205,8 @@ void GoogleAsyncStreamImpl::resetStream() {
 }

 void GoogleAsyncStreamImpl::writeQueued() {
-  if (!call_initialized_ || finish_pending_ || write_pending_ || write_pending_queue_.empty()) {
+  if (!call_initialized_ || finish_pending_ || write_pending_ || write_pending_queue_.empty() ||


This seems like a legitimate change; we certainly shouldn't be queueing new writes while we're draining. Does this solve your teardown race for the Envoy gRPC client as well?

Trying to recall our earlier conversation, it seems we should establish a clear teardown ordering. We should ensure that all the gRPC consumers are properly removed before we begin the TLS teardown.
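
(Editorial sketch, not part of the review comment.) One way to picture the suggested ordering is a small model in which the consumer is destroyed while the client it depends on is still alive. All names here (SketchClient, SketchSubscription, SketchServer, runTeardown) are hypothetical and do not correspond to Envoy classes; the sketch relies only on the C++ rule that members are destroyed in reverse order of declaration.

```cpp
#include <memory>
#include <string>
#include <vector>

using EventLog = std::vector<std::string>;

// Hypothetical stand-in for the per-thread gRPC client state.
struct SketchClient {
  explicit SketchClient(EventLog& log) : log_(log) {}
  ~SketchClient() { log_.push_back("client destroyed"); }
  void send(const std::string& what) { log_.push_back("send: " + what); }
  EventLog& log_;
};

// Hypothetical consumer that sends one last update from its destructor.
struct SketchSubscription {
  explicit SketchSubscription(SketchClient& client) : client_(client) {}
  ~SketchSubscription() { client_.send("final update"); }
  SketchClient& client_;
};

// Members are destroyed in reverse declaration order, so the subscription
// (declared last) is torn down while the client is still valid.
struct SketchServer {
  explicit SketchServer(EventLog& log)
      : client_(std::make_unique<SketchClient>(log)),
        subscription_(std::make_unique<SketchSubscription>(*client_)) {}
  std::unique_ptr<SketchClient> client_;
  std::unique_ptr<SketchSubscription> subscription_;
};

// Builds and destroys a server, returning the recorded event order.
EventLog runTeardown() {
  EventLog log;
  {
    SketchServer server(log);
  }
  return log;
}
```

With this declaration order, the final send is recorded before the client's destruction. Swapping the two members would make the destructor call into a destroyed client, which is the kind of race the PR description reports.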

qiwzhang · 2018-09-06

Yes, it fixes the test flakiness. Tested with --runs_per_test=1000.

qiwzhang · 2018-09-06

We could do that teardown ordering in a separate PR. We need to get this in to unblock CI tests.

htuch · 2018-09-06

OK, I will merge given that you folks have ownership of the underlying issue. Thanks.

not to writeQueued if cq is drained

eca5bbc

Signed-off-by: Wayne Zhang <qiwzhang@google.com>

qiwzhang mentioned this pull request Sep 6, 2018

Not to send a discovery request at GrpcSubscriptionImpl destructor #4354

Closed

htuch self-assigned this Sep 6, 2018

htuch reviewed Sep 6, 2018

htuch approved these changes Sep 6, 2018

htuch merged commit c4211b3 into envoyproxy:master Sep 6, 2018
