-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Aws sigv4 metadata fetcher #7303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aws sigv4 metadata fetcher #7303
Conversation
Signed-off-by: Ivan Vitjuk <ivanvit@amazon.com>
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Signed-off-by: Ivan Vitjuk <ivanvit@amazon.com>
|
||
EXPECT_FALSE(response.has_value()); | ||
|
||
// We do now check http.metadata_test.downstream_rq_completed value here because it's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like downstream_rq_completed is incremented in Mac even in the case connection is terminated on the Curl side due to the timeout. This doesn't happen on Linux.
Unfortunately I don't have access to a Mac machine to be able to debug this in detail.
@dio any suggestions how to proceed with this? Do we have a process when developers do not have access to a specific platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Unfortunately, we only have this CI. I'll try to take a look at this on my local. However, seems like the test is successful now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because I removed the following check that was succeeding on Linux but failing on Mac:
ASSERT_NE(nullptr, test_server_->counter("http.metadata_test.downstream_rq_completed")); EXPECT_EQ(0, test_server_->counter("http.metadata_test.downstream_rq_completed")->value());
For, to me unknown, reason downstream_rq_completed
was incremented on Mac for each request Curl aborted due to the timeout. On Linux this counter was not incremented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a comment regarding this for now.
/retest |
🔨 rebuilding |
/wait-any |
Hi @dio this has been open for more than two weeks now, can we get it reviewed? |
/wait-any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, quick question.
/wait-any
* | ||
* @note In case of an error, function will log ENVOY_LOG_MISC(debug) message. | ||
* | ||
* @note This is not main loop safe method as it is blocking. It is intended to be used from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use curl here at all and have it be blocking? Why can't you use the built in Envoy HTTP async client facilities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Don't have much more to add. When I originally implemented this, I ran into the issue that I ultimately needed to block signing the xDS request on the http request to get credentials. Note that the blocking is done on the gRPC client thread in my implementation. Not the main thread. It was after asking about methods for making blocking http requests it was suggested I make the blocking http client in #5546. But I don't recall in what issue it was suggested.
Then later in #5546 just using curl was proposed since it was being merged in as a dependency anyway. I felt more comfortable doing that since I wasn't very confident in my http client implementation. It was non-trivial for me to implement and test especially since I was very new to the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK that's fine. Ultimately I don't think curl should be used here for a variety of reasons (blocking, more third party code to deal with, not the normal Envoy logs/metrics/traces, etc.) but we can merge and iterate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I pointed out that libcurl was now available, but I can't remember if this was something we decided as the preferred approach. The main concern would be that we don't block on a worker thread either in the Envoy-based HTTP client or libcurl.
srcs = [ | ||
"aws_metadata_fetcher_integration_test.cc", | ||
], | ||
tags = ["exclusive"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test tagged as exclusive? This acts as a serialization step and forces the total test run to increase by > 30s, as it can't be parallelized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration test is creating an instance of the Envoy which is acting as a metadata endpoint. If we remove exclusive, wouldn't we risk port collision with other integration tests that create listeners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you are binding to fixed ports (which I didn't spot, we typically don't do this for the reason you state), this is safe to not be exclusive. The tests tend to bind to port zero, which allows safe concurrent execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @lavignes can you make a PR to remove exclusive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lavignes thanks, appreciated!
Resolves feedback from #7303 Signed-off-by: LaVigne, Scott <lavignes@amazon.com>
Description:
This is third in a series of PRs implementing #5215. Metadata fetcher implemented in this PR will be used as an argument to the credential providers introduced in #7281 .
Risk Level: Low
Testing: Added integration tests
Docs Changes: N/A
Release Notes: N/A