Skip to content

Conversation

mpuncel
Copy link
Contributor

@mpuncel mpuncel commented Jul 11, 2019

This commit fixes a crash in which an upstream request would not be
properly reset after the router was done with it and the callbacks
destroyed which would cause a segfault when data was received on the
upstream request. This happened in particular when: 1) "bad" headers are
received for an upstream request 2) that request is not retried, and 3) there are still requests in flight.
When hedging is disabled (i.e. default behavior) this is not an issue
because the filter will be destroyed synchronously which will reset the
stream, however with hedging it needs to be done more proactively.

Signed-off-by: Michael Puncel mpuncel@squareup.com

Description: Fix possible crash when request hedging is enabled
Risk Level: low
Testing: unit test, deployed in production and saw crashes stop
Docs Changes: n/a
Release Notes: n/a

This commit fixes a crash in which an upstream request would not be
properly reset after the router was done with it and the callbacks
destroyed which would cause a segfault when data was received on the
upstream request. This happened in particular when "bad" headers are
received for an upstream request but there are still requests in flight.
When hedging is disabled (i.e. default behavior) this is not an issue
because the filter will be destroyed synchronously which will reset the
stream, however with hedging it needs to be done more proactively.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
@alyssawilk
Copy link
Contributor

alyssawilk commented Jul 11, 2019

Throwing to Matt, less because I can't do the review myself and more because I'm unsure if we want to land this before or after release (#7538) :-)

Given it sounds like it works for the non-hedging case, I'm inclined to land after the release, even if it means hedging isn't as usable in 1.11. Thoughts?

@mattklein123
Copy link
Member

Yeah let's hold for 1.12. I will take a look.

@mpuncel
Copy link
Contributor Author

mpuncel commented Jul 11, 2019

Confirming there's no behavior change here if you don't have hedging enabled.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM with small question.

/wait-any

@@ -1052,7 +1052,7 @@ void Filter::onUpstreamHeaders(uint64_t response_code, Http::HeaderMapPtr&& head
// chance to return before returning a response downstream.
if (could_not_retry && (numRequestsAwaitingHeaders() > 0 || pending_retries_ > 0)) {
upstream_request.upstream_host_->stats().rq_error_.inc();
upstream_request.removeFromList(upstream_requests_);
upstream_request.removeFromList(upstream_requests_)->resetStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry QQ in the non-hedging case, how did this work previously? Where were things reset, and how do we avoid resetting things twice now?

Copy link
Contributor Author

@mpuncel mpuncel Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-hedging case you'd continue to read from this request until end_stream and return any body/trailers downstream. When hedging is happening and there are other requests to wait around for, we want to cancel.

It won't be reset twice because we're removing it from the upstream request list on this line, so it won't be seen in resetAll() in Filter::onDestroy().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK can you maybe just add as mall comment about this? Makes sense.

/wait

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit feaada3 into envoyproxy:master Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants