-
Notifications
You must be signed in to change notification settings - Fork 5.1k
http1: Allocate encoder on heap to survive move #10561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http1: Allocate encoder on heap to survive move #10561
Conversation
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Hmm. This is scary. What is the crashing call stack exactly? I don't see encoder would be used after this call. I would like to understand the underlying issue better before we decide how to fix as it would be optimal to avoid the extra heap allocation if we don't need it. /wait-any |
Sure, here is a traceback of the segmentation fault at e9248e2:
|
What happens under the hood is that If heap allocation is costly in this scenario we can move |
The comment makes me wonder if simply moving reset to the end of block does not cause any other side effects: envoy/source/common/http/http1/codec_impl.cc Lines 964 to 966 in 986e941
@mattklein123 any thoughts? |
OK this makes sense thanks. I'm really surprised/scared that this works in clang and especially doesn't get caught by ASAN. I think clang probably does not actually clear the memory when it "moves" (copies) the value into the local before resetting the optional (or the clang does not clear the memory in optional reset and gcc does).
I don't think this the issue. The issue is here: envoy/source/common/http/codec_client.cc Lines 107 to 110 in 446c7fd
We are using the encoder again after it was moved/reset. This is definitely broken though again it's scary that this seems to work OK in clang.
I think it's probably not safe to do this, since the intention of this code (IIRC) is to prevent a recursive reset during the completion callback. I think the simplest way of fixing this for now would be to:
I think this should be functionaly equivalent to what we have today and then we can potentially improve this in further follow ups. Can you potentially try that? If things continue to break / you are having trouble I would be fine to go back to the heap allocation and I can look at fixing this in a follow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this LGTM for a fix with one small ASSERT comment. I would also like either @alyssawilk or @antoniovicente to take a quick look. Thank you!
/wait
PendingResponse& response = pending_response_.value(); | ||
// Encoder is used as part of decode* calls later in this function so pending_response_ can not | ||
// be reset just yet. Preserve the state in pending_response_done_ instead. | ||
pending_response_done_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't thrill me due to extra logic complexity, but I think this is an OK solution for now. Can you do me a favor and put an additional ASSERT(!pending_response_done_);
every time we load successfully check that pending_response has a value in the other functions above? I think there shouldn't be too many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The failures are known flakes so will go ahead and merge. |
PR description was not updated when the implementation moved away from heap allocation to extra ASSERTs. |
Description:
ClientConnectionImpl::onMessageComplete()
movesPendingResponse
(and thereforeEncoder
) and tries to use it after that. As move invalidates the object - accessing it by pointer is not valid anymore.It looks like clang keeps objects roughly intact so it's uncaught by existing envoy tests. GCC on the other hand does invalidate the memory region, which leads to crash.
Risk Level: low
Testing: n/a (was only able to repro in gcc)
Docs Changes: n/a
Release Notes: n/a