-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Fix crash in request hedging handling. #7530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash in request hedging handling. #7530
Conversation
This commit fixes a crash in which an upstream request would not be properly reset after the router was done with it and the callbacks destroyed which would cause a segfault when data was received on the upstream request. This happened in particular when "bad" headers are received for an upstream request but there are still requests in flight. When hedging is disabled (i.e. default behavior) this is not an issue because the filter will be destroyed synchronously which will reset the stream, however with hedging it needs to be done more proactively. Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Throwing to Matt, less because I can't do the review myself and more because I'm unsure if we want to land this before or after release (#7538) :-) Given it sounds like it works for the non-hedging case, I'm inclined to land after the release, even if it means hedging isn't as usable in 1.11. Thoughts? |
Yeah let's hold for 1.12. I will take a look. |
Confirming there's no behavior change here if you don't have hedging enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM with small question.
/wait-any
@@ -1052,7 +1052,7 @@ void Filter::onUpstreamHeaders(uint64_t response_code, Http::HeaderMapPtr&& head | |||
// chance to return before returning a response downstream. | |||
if (could_not_retry && (numRequestsAwaitingHeaders() > 0 || pending_retries_ > 0)) { | |||
upstream_request.upstream_host_->stats().rq_error_.inc(); | |||
upstream_request.removeFromList(upstream_requests_); | |||
upstream_request.removeFromList(upstream_requests_)->resetStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry QQ in the non-hedging case, how did this work previously? Where were things reset, and how do we avoid resetting things twice now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the non-hedging case you'd continue to read from this request until end_stream
and return any body/trailers downstream. When hedging is happening and there are other requests to wait around for, we want to cancel.
It won't be reset twice because we're removing it from the upstream request list on this line, so it won't be seen in resetAll()
in Filter::onDestroy()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK can you maybe just add as mall comment about this? Makes sense.
/wait
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This commit fixes a crash in which an upstream request would not be
properly reset after the router was done with it and the callbacks
destroyed which would cause a segfault when data was received on the
upstream request. This happened in particular when: 1) "bad" headers are
received for an upstream request 2) that request is not retried, and 3) there are still requests in flight.
When hedging is disabled (i.e. default behavior) this is not an issue
because the filter will be destroyed synchronously which will reset the
stream, however with hedging it needs to be done more proactively.
Signed-off-by: Michael Puncel mpuncel@squareup.com
Description: Fix possible crash when request hedging is enabled
Risk Level: low
Testing: unit test, deployed in production and saw crashes stop
Docs Changes: n/a
Release Notes: n/a