-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Convert grpc_ares_wrapper to C++ #25108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a step in the right direction!
Please let me know if you have any questions. Thanks!
Reviewed 11 of 11 files at r1.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 104 at r1 (raw file):
bool resolving_ = false; /// the pending resolving request std::unique_ptr<AresRequestInterface> pending_request_ = nullptr;
No need for = nullptr
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 47 at r1 (raw file):
namespace grpc_core { class AresRequestInterface {
This doesn't really need to be an interface, since there's only ever one concrete implementation of it. If the goal here is just to hide the internal state, then let's just make all of the data members private (which they should be in a class anyway).
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 53 at r1 (raw file):
}; /* Asynchronously resolve \a name. Use \a default_port if a port isn't
Might as well switch to C++-style comments while we're at it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 54 at r1 (raw file):
/* Asynchronously resolve \a name. Use \a default_port if a port isn't designated in \a name, otherwise use the port in \a name. grpc_ares_init()
Please update comments to refer to the new function names (e.g., AresInit()
instead of grpc_ares_init()
).
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 71 at r1 (raw file):
being held by the caller. The returned AresRequestInterface object is owned by the caller and it is safe to destroy after on_done is called back. */ extern std::unique_ptr<AresRequestInterface> (*LookupAresLocked)(
Can this be a static method of AresRequestInterface
? Or would that be too hard, given that it needs to be overridable in tests?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 81 at r1 (raw file):
/* Initialize gRPC ares wrapper. Must be called at least once before grpc_resolve_address_ares(). */ grpc_error* AresInit(void);
Can these two be static methods of AresRequestInterface
?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 96 at r1 (raw file):
/* Exposed in this header for C-core tests only */ extern void (*AresTestOnlyInjectConfig)(ares_channel channel);
This can be in a nested internal
namespace.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 67 at r1 (raw file):
struct grpc_ares_request : public AresRequestInterface { explicit grpc_ares_request(
No need for explicit
here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r1 (raw file):
ServerAddressList* addresses) { if (GRPC_TRACE_FLAG_ENABLED(grpc_trace_cares_address_sorting)) { log_address_sorting_list(r, *addresses, "input");
Why move this from here into the callers? Seems like this adds duplication.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1118 at r1 (raw file):
if (resolve_as_ip_literal_locked(name, default_port, addrs)) { grpc_ares_complete_request_locked(r.get()); return std::move(r);
No need for std::move()
in a return
statement. That happens automatically.
Same thing throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 104 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
= nullptr
.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 47 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This doesn't really need to be an interface, since there's only ever one concrete implementation of it. If the goal here is just to hide the internal state, then let's just make all of the data members private (which they should be in a class anyway).
I just did that to avoid needing to declare classes in the header file, but I've updated to put the grpc_ares_request
into the header (renamed into AresRequest
).
I haven't yet made any data members of AresRequest
private though, only since that would expand the scope of this PR a lot.
PLMK if you'd like to do that in this PR, though.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 53 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Might as well switch to C++-style comments while we're at it.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 54 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please update comments to refer to the new function names (e.g.,
AresInit()
instead ofgrpc_ares_init()
).
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 71 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can this be a static method of
AresRequestInterface
? Or would that be too hard, given that it needs to be overridable in tests?
Right, the global method seems easiest to override for tests, so I'll tend to keep as is if OK.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 81 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can these two be static methods of
AresRequestInterface
?
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 96 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be in a nested
internal
namespace.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 67 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
explicit
here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why move this from here into the callers? Seems like this adds duplication.
I moved this out since I wanted to log service addresses and grpclb addresses with slightly differently, but I've put this back into the sorting helper by adding a parameter to differentiate the logs.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1118 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
std::move()
in areturn
statement. That happens automatically.Same thing throughout.
I had issues with a distrib test using an old gcc, which motivated this. I think that issue is no longer relevant though now that AresRequest
isn't a derived class.;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
Please let me know if you have any questions. Thanks!
Reviewed 8 of 8 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 47 at r1 (raw file):
Previously, apolcyn wrote…
I just did that to avoid needing to declare classes in the header file, but I've updated to put the
grpc_ares_request
into the header (renamed intoAresRequest
).I haven't yet made any data members of
AresRequest
private though, only since that would expand the scope of this PR a lot.PLMK if you'd like to do that in this PR, though.
It does seem like it would be better to just convert the whole module to C++ in a single PR, if that's not too hard. But I can live with it being split up if you'd prefer; if so, just add a TODO comment here indicating that we plan to finish converting it to idiomatic C++ in a subsequent PR.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 49 at r2 (raw file):
typedef struct grpc_ares_ev_driver grpc_ares_ev_driver; struct AresRequest {
Please document what this struct represents, how it's expected to be used, lifetime semantics, etc.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 61 at r2 (raw file):
/// Initialize the gRPC ares wrapper. Must be called at least once before /// grpc_core::ResolveAddressAres().
No need for grpc_core::
in any of these comments, since we're already in that namespace.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 68 at r2 (raw file):
/// wrapper only if it has been called the same number of times as /// grpc_core::AresInit(). static void Cleanup(void);
Suggest calling this Shutdown()
, for consistency with the term we use throughout core.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 103 at r2 (raw file):
/// designated in \a name, otherwise it uses the port in \a name. AresInit() /// must be called at least once before this function. The returned /// AresRequest object is owned by the caller and it is safe to destroy
Since the return value is a unique_ptr<>
, we don't need to explicitly state that ownership is transferred; that's implicit in the API now.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r1 (raw file):
Previously, apolcyn wrote…
I moved this out since I wanted to log service addresses and grpclb addresses with slightly differently, but I've put this back into the sorting helper by adding a parameter to differentiate the logs.
Are there external callers of this API, or is it only exposed for testing purposes? If the former, then it might be better to just have the caller log a separate message with the context before calling this function.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
} grpc_ares_hostbyname_request; static void AresRequest_ref_locked(AresRequest* r);
When we do finish converting this to C++, we should probably just make AresRequest
inherit from InternallyRefCounted<>
. (Internally, see go/grpc-c-core-ref-counted-types-update for expected usage.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 11 files reviewed, 7 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 47 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It does seem like it would be better to just convert the whole module to C++ in a single PR, if that's not too hard. But I can live with it being split up if you'd prefer; if so, just add a TODO comment here indicating that we plan to finish converting it to idiomatic C++ in a subsequent PR.
I've went ahead and converted the rest of the module to C++
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 49 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please document what this struct represents, how it's expected to be used, lifetime semantics, etc.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 61 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
grpc_core::
in any of these comments, since we're already in that namespace.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 68 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this
Shutdown()
, for consistency with the term we use throughout core.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 103 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Since the return value is a
unique_ptr<>
, we don't need to explicitly state that ownership is transferred; that's implicit in the API now.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Are there external callers of this API, or is it only exposed for testing purposes? If the former, then it might be better to just have the caller log a separate message with the context before calling this function.
This is only exposed for testing purposes
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
When we do finish converting this to C++, we should probably just make
AresRequest
inherit fromInternallyRefCounted<>
. (Internally, see go/grpc-c-core-ref-counted-types-update for expected usage.)
Thanks for the link to go/grpc-c-core-ref-counted-types-update.
After looking at this for a while, the IMO cleanest setup I've found is to keep AresRequest
as a unique_ptr
in it's external owner, and to introduce a small internal wrapper (which is DualRefCounted
) over the AresRequest
, and this wrapper is responsible for deciding when to schedule on_done
.
My argument is that there are three different sets of owners of the AresRequest
object:
-
The single external owner, i.e. the caller of
LookupAresLocked
-
All pending DNS resolutions (e.g. one ref for A records, another for AAAA, SRV, etc.)
-
Timer and I/O callbacks
The contract with the external owner is that the AresRequest
object is safe to delete as soon as on_done
is scheduled, so we just need to make sure that owners 2) and 3) are all gone first, and then we're safe to schedule on_done
.
Owners 2) and 3) are different though, because as soon as all owners in 2) are gone, we want to cancel/shutdown/remove all owners in 3) as soon as possible.
So, the internal wrapper object uses owners in 2) as strong refs and owners in 3) as weak refs, and we schedule on_done
in the wrapper object's dtor.
PLMK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I'm really excited to see this code being modernized.
My only really significant comment is the one about the ref-counting semantics. Most of the others are relatively minor.
Please let me know if you have any questions. Thanks!
Reviewed 3 of 3 files at r4.
Reviewable status: all files reviewed, 27 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r4 (raw file):
/// (A queries, AAAA queries, etc.). An AresRequest is created with a call /// to LookupAresLocked, and it's safe to destroy as soon as the \a on_done /// callback passed to LookupAresLocked is ran. Meanwhile, a name resolution
s/is ran/has been run/
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 59 at r4 (raw file):
class AresRequest final { public: static std::unique_ptr<AresRequest> LookupAresLockedImpl(
Suggest calling this Create()
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 60 at r4 (raw file):
public: static std::unique_ptr<AresRequest> LookupAresLockedImpl( const char* dns_server, const char* name, const char* default_port,
Please pass in all three of these parameters as absl::string_view
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 61 at r4 (raw file):
static std::unique_ptr<AresRequest> LookupAresLockedImpl( const char* dns_server, const char* name, const char* default_port, grpc_pollset_set* interested_parties, grpc_closure* on_done,
Let's use std::function<>
instead of grpc_closure
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 64 at r4 (raw file):
std::unique_ptr<grpc_core::ServerAddressList>* addrs, std::unique_ptr<grpc_core::ServerAddressList>* balancer_addrs, char** service_config_json, int query_timeout_ms,
Let's use std::string*
for service_config_json
.
Same thing throughout.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 65 at r4 (raw file):
std::unique_ptr<grpc_core::ServerAddressList>* balancer_addrs, char** service_config_json, int query_timeout_ms, std::shared_ptr<grpc_core::WorkSerializer> work_serializer);
As part of our long-term strategy to move the c-ares integration out of the client_channel code and into the iomgr code, it might be a good idea to remove the WorkSerializer
here and instead just use a mutex internally. That way, callers that don't use a WorkSerializer
(basically everything except the client_channel code) can use this API without needing to provide one, and the client_channel code can be responsible for hopping back into the WorkSerializer
when its callback function is invoked.
This doesn't need to be done as part of this PR, though. We can just add a TODO for now.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 69 at r4 (raw file):
/// Cancel the pending request. Must be called while holding the /// WorkSerializer that was used to call \a LookupAresLocked. void ShutdownLocked();
I think the original name made more sense here. This is for cancelling an individual request, not shutting down the whole subsystem.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 92 at r4 (raw file):
/// AresRequest object (the one that invoked LookupAresLocked) can safely /// destroy it. class OnDoneScheduler final : public DualRefCounted<OnDoneScheduler> {
If we do keep this class for some reason (see my comment elsewhere for why I don't think we should have it), it should be private within AresRequest
, since it's not needed outside of the implementation.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 109 at r4 (raw file):
private: explicit AresRequest(
No need for explicit
, since there is more than one argument here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 109 at r4 (raw file):
private: explicit AresRequest(
If we make AresRequest
InternallyRefCounted<>
as I suggested elsewhere, then we'll probably want to make this ctor public. We can document that callers should use the factory method instead of instantiating this directly, but making this private will make it impossible to use MakeOrphanable<>
even from within the factory method.
For related discussion, see https://abseil.io/tips/134.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 154 at r4 (raw file):
} struct FdNode {
Let's make this a class (private data members with appropriate names, etc).
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 154 at r4 (raw file):
} struct FdNode {
As per the style guide, within each section (public, private, etc), types (including structs and classes) should be defined before methods.
https://google.github.io/styleguide/cppguide.html#Declaration_Order
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 202 at r4 (raw file):
std::unique_ptr<grpc_core::ServerAddressList>* balancer_addresses_out_; // the pointer to receive the service config in JSON char** service_config_json_out_;
Would be nice to use std::string*
here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 210 at r4 (raw file):
std::shared_ptr<grpc_core::WorkSerializer> work_serializer_; // a list of fds that this request is currently using. FdNode* fds_ = nullptr;
Can we use an STL container here instead of the current C-style linked list approach? Also, can we use std::unique_ptr<>
, so that the compiler can enforce ownership? For example, we could use something like std::set<std::unique_ptr<FdNode>>
here.
Alternatively, maybe we could even do something like std::map<ares_socket_t, std::unique_ptr<FdNode>>
, to make lookups faster in AresRequest::PopFdNodeLocked()
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
Thanks for the link to go/grpc-c-core-ref-counted-types-update.
After looking at this for a while, the IMO cleanest setup I've found is to keep
AresRequest
as aunique_ptr
in it's external owner, and to introduce a small internal wrapper (which isDualRefCounted
) over theAresRequest
, and this wrapper is responsible for deciding when to scheduleon_done
.My argument is that there are three different sets of owners of the
AresRequest
object:
The single external owner, i.e. the caller of
LookupAresLocked
All pending DNS resolutions (e.g. one ref for A records, another for AAAA, SRV, etc.)
Timer and I/O callbacks
The contract with the external owner is that the
AresRequest
object is safe to delete as soon ason_done
is scheduled, so we just need to make sure that owners 2) and 3) are all gone first, and then we're safe to scheduleon_done
.Owners 2) and 3) are different though, because as soon as all owners in 2) are gone, we want to cancel/shutdown/remove all owners in 3) as soon as possible.
So, the internal wrapper object uses owners in 2) as strong refs and owners in 3) as weak refs, and we schedule
on_done
in the wrapper object's dtor.PLMK
This ownership model seems very different from the way we typically do things, and I think it's a lot more complex than what we really need. I don't think we should need to differentiate between the refs held by individual DNS queries and those that correspond to timers and I/O callbacks.
I am assuming here that we have the following properties:
- When a DNS query is started, that triggers I/O, which will eventually invoke a callback.
- When a DNS query finishes, all I/O associated with that query has been completed. (There may be additional I/O callbacks outstanding at this point, but if so they can be cancelled as soon as we know that they're not needed.)
If those properties hold, then I don't think we actually need to hold any refs for the DNS queries themselves; it should be sufficient to hold refs only for the underlying I/O callbacks. We just need to make sure that the last I/O callback for a given DNS query processes the result of the DNS query before it releases its ref.
To say this another way, it seems like we actually have two distinct problems here:
- Deciding when it's safe to destroy the
AresRequest
object. - Deciding when to invoke the on-done callback.
I think this will be simpler if we pull those two problems apart and have a separate solution for each of them. I think we should use ref-counting for (1) but a separate mechanism for (2).
For (1), I think we should make the AresRequest
object itself be InternallyRefCounted<>
. The internal refs will be held by the I/O and timer callbacks, but the caller doesn't have to know or care about any of that. The caller can just use an OrphanablePtr<>
to hold this object, and it can orphan the object whenever it's done with the request. Orphaning the object will cancel any in-flight DNS queries, so the underlying I/O callbacks will return failures, which will cause us to invoke the on-done callback with an error. It will also release the caller's ref.
For (2), I think we can simply have the AresRequest
object contain an integer data member that indicates the number of in-flight queries. Whenever we launch a query, we increment it; whenever a query completes, we decrement it. At the end of completing a query (after starting any subsequent queries that may be needed), if the count is zero, then we invoke the on-done callback. (Or if we need something more complex for some reason, we could do something like having a separate bool for each DNS query to indicate if it's complete.)
This approach will de-couple the logic for invoking the on-done callback from the ref-counting logic. It will avoid the need for this OnDoneScheduler
class. And it will allow us to invoke the on-done callback as soon as we have the result, independently of whether we're still waiting for some straggling callbacks to run before destroying the AresRequest
object itself.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 60 at r4 (raw file):
namespace grpc_core { struct AresHostByNameRequest {
Please document what this struct represents and how it is used.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 61 at r4 (raw file):
struct AresHostByNameRequest { explicit AresHostByNameRequest(RefCountedPtr<AresRequest::OnDoneScheduler> o,
No need for explicit
here, since there is more than one parameter.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 96 at r4 (raw file):
}; AresRequest::AresRequest(
All methods should be defined in the same order in which they appear in the .h file.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 217 at r4 (raw file):
"err=%s", this, shutting_down_, grpc_error_string(error)); // TODO(apolcyn): can we remove this check on shutting_down_?
I don't think this can be removed. It's needed to deal with race conditions. Internally, see go/grpc-c-core-closure-slides for details.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 437 at r4 (raw file):
const std::string& logging_prefix) { if (GRPC_TRACE_FLAG_ENABLED(grpc_trace_cares_address_sorting)) { LogAddressSortingList(r, *addresses, (logging_prefix + "-input").c_str());
Prefer absl::StrCat()
over using the +
operator.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 462 at r4 (raw file):
int /*timeouts*/, struct hostent* hostent) { std::unique_ptr<AresHostByNameRequest> hr(
It seems a little inconsistent that we use an AresHostByNameRequest
struct for the A and AAAA lookups, but we don't have any such struct for the SRV or TXT lookups. Is there some reasonable way that we can make this more uniform, either by not using a struct for any of them, or by using a struct for all of them?
If we go with the latter approach, we could use a real class for each query instead of just a struct, and then the OnHostByNameDoneLocked()
, OnSRVQueryDoneLocked()
, and OnTXTDoneLocked()
methods could be moved into the respective classes.
If there's no way to do this without making things more convoluted, then don't worry about it. It's a nice-to-have, not a requirement.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 523 at r4 (raw file):
std::string error_msg = absl::StrFormat( "C-ares status is not ARES_SUCCESS qtype=%s name=%s is_balancer=%d: %s", hr->qtype, hr->host.c_str(), hr->is_balancer, ares_strerror(status));
No need for .c_str()
here; it's fine to pass a std::string
(or an absl::string_view
) to any of the absl string functions (e.g., StrFormat()
, StrCat()
, etc).
Same thing throughout.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 551 at r4 (raw file):
o, std::string(srv_it->host), htons(srv_it->port), true /* is_balancer */, AF_INET6 /* address_family */); ares_gethostbyname(r->channel_, hr->host.c_str(), hr->address_family,
This can be moved into the AresHostByNameRequest
ctor.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r4 (raw file):
static_cast<AresRequest::OnDoneScheduler*>(arg)); AresRequest* r = o->parent(); const std::string kServiceConfigAttributePrefix = "grpc_config=";
This should go back to being declared as a global constant of type char*
, so that we don't do an unnecessary allocation every time we enter this function.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 593 at r4 (raw file):
for (result = reply; result != nullptr; result = result->next) { if (result->record_start && memcmp(result->txt, kServiceConfigAttributePrefix.data(),
absl::StartsWith(result->txt, kServiceConfigAttributePrefix)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 714 at r4 (raw file):
grpc_resolved_address addr; std::string hostport = grpc_core::JoinHostPort(target_host_, atoi(target_port_.c_str()));
No need for grpc_core::
anywhere in this code, because we're already in that namespace.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 863 at r4 (raw file):
namespace { // A GrpcResolveAddressAresRequest maintains the state need to
I think the need for this alternative interface will go away when we move the c-ares integration into the iomgr code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 11 files reviewed, 27 unresolved discussions (waiting on @apolcyn and @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This ownership model seems very different from the way we typically do things, and I think it's a lot more complex than what we really need. I don't think we should need to differentiate between the refs held by individual DNS queries and those that correspond to timers and I/O callbacks.
I am assuming here that we have the following properties:
- When a DNS query is started, that triggers I/O, which will eventually invoke a callback.
- When a DNS query finishes, all I/O associated with that query has been completed. (There may be additional I/O callbacks outstanding at this point, but if so they can be cancelled as soon as we know that they're not needed.)
If those properties hold, then I don't think we actually need to hold any refs for the DNS queries themselves; it should be sufficient to hold refs only for the underlying I/O callbacks. We just need to make sure that the last I/O callback for a given DNS query processes the result of the DNS query before it releases its ref.
To say this another way, it seems like we actually have two distinct problems here:
- Deciding when it's safe to destroy the
AresRequest
object.- Deciding when to invoke the on-done callback.
I think this will be simpler if we pull those two problems apart and have a separate solution for each of them. I think we should use ref-counting for (1) but a separate mechanism for (2).
For (1), I think we should make the
AresRequest
object itself beInternallyRefCounted<>
. The internal refs will be held by the I/O and timer callbacks, but the caller doesn't have to know or care about any of that. The caller can just use anOrphanablePtr<>
to hold this object, and it can orphan the object whenever it's done with the request. Orphaning the object will cancel any in-flight DNS queries, so the underlying I/O callbacks will return failures, which will cause us to invoke the on-done callback with an error. It will also release the caller's ref.For (2), I think we can simply have the
AresRequest
object contain an integer data member that indicates the number of in-flight queries. Whenever we launch a query, we increment it; whenever a query completes, we decrement it. At the end of completing a query (after starting any subsequent queries that may be needed), if the count is zero, then we invoke the on-done callback. (Or if we need something more complex for some reason, we could do something like having a separate bool for each DNS query to indicate if it's complete.)This approach will de-couple the logic for invoking the on-done callback from the ref-counting logic. It will avoid the need for this
OnDoneScheduler
class. And it will allow us to invoke the on-done callback as soon as we have the result, independently of whether we're still waiting for some straggling callbacks to run before destroying theAresRequest
object itself.
The main problem I'm worried about with de-coupling the scheduling of on_done
and the destruction of the AresRequest
is that this makes it harder to reason about the correctness of the AresRequest
lifetime.
If they on_done
and the AresRequest
dtor are coupled, then because we don't schedule on_done
until the AresRequest
has no internal refs left, the entity that constructed the AresRequest
also manages its deletion, and in this way we can basically guarantee that DNS resolutions don't leave any leaks.
If we leave around stray timer and I/O callbacks after scheduling on_done
, I'd be worried running into a hard-to-debug scenario where e.g. timer or I/O callbacks don't getting cleaned up in a timely manner after the resolution is done, and this kind of leak might not be easily caught in e.g. unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 11 files reviewed, 27 unresolved discussions (waiting on @apolcyn and @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
The main problem I'm worried about with de-coupling the scheduling of
on_done
and the destruction of theAresRequest
is that this makes it harder to reason about the correctness of theAresRequest
lifetime.If they
on_done
and theAresRequest
dtor are coupled, then because we don't scheduleon_done
until theAresRequest
has no internal refs left, the entity that constructed theAresRequest
also manages its deletion, and in this way we can basically guarantee that DNS resolutions don't leave any leaks.If we leave around stray timer and I/O callbacks after scheduling
on_done
, I'd be worried running into a hard-to-debug scenario where e.g. timer or I/O callbacks don't getting cleaned up in a timely manner after the resolution is done, and this kind of leak might not be easily caught in e.g. unit tests.
Handling these remaining timer and I/O callbacks is actually the exact case that the InternallyRefCounted<>
pattern was created for. If we were always afraid that there might be a bug that prevents us from cleaning the object up, then we'd never be able to use this pattern anywhere.
Is there some reason that it's somehow more likely that a stray timer or I/O callback might not be cleaned up in this case than in any other case where we use InternallyRefCounted<>
? If not, I don't think this is a good reason not to use that pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 11 files reviewed, 27 unresolved discussions (waiting on @apolcyn and @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/is ran/has been run/
Done. But changed to "has began to run" to make it clear that it can be destroyed from within the callback
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 59 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this
Create()
.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 60 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please pass in all three of these parameters as
absl::string_view
.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Handling these remaining timer and I/O callbacks is actually the exact case that the
InternallyRefCounted<>
pattern was created for. If we were always afraid that there might be a bug that prevents us from cleaning the object up, then we'd never be able to use this pattern anywhere.Is there some reason that it's somehow more likely that a stray timer or I/O callback might not be cleaned up in this case than in any other case where we use
InternallyRefCounted<>
? If not, I don't think this is a good reason not to use that pattern.
There's not a particular reason that makes this code more likely to have timer or fd leaks, but the interactions are complicated enough that I'd be weary of at least introducing new bugs.
For example, with the approach that delays on_done
until the object is safe to delete, one nice thing is that if a code change introduces a bug that causes such a leak, nothing will work at all and we'll catch it right away. But if we leave around stray refs, such a bug could more easily go undetected -- i.e. it makes it now difficult to test that the code is correct.
I can get behind the desire to be consistent with other cases in the code base, but I also think that the approach of delaying on_done
until the object is safe to delete fits better with the C++ style guide's recommendation that objects be deleted by the code that creates them if it's possible.
What do you think if we got rid of the OnDoneScheduler
object, but still delayed on_done
until the object is safe to delete (we could perhaps do this with a counter of pending queries, and some kind of MaybeScheduleOnDone
helper at the end of every query, timer, and I/O callback) ?
Thanks for the review so far though, I'm still working on getting back to the rest of the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 11 files reviewed, 27 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 61 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's use
std::function<>
instead ofgrpc_closure
.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 64 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's use
std::string*
forservice_config_json
.Same thing throughout.
Done, but changed to absl::optional<std::string>*
, since it looks like we need to differentiate no service config with empty string service config
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 65 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As part of our long-term strategy to move the c-ares integration out of the client_channel code and into the iomgr code, it might be a good idea to remove the
WorkSerializer
here and instead just use a mutex internally. That way, callers that don't use aWorkSerializer
(basically everything except the client_channel code) can use this API without needing to provide one, and the client_channel code can be responsible for hopping back into theWorkSerializer
when its callback function is invoked.This doesn't need to be done as part of this PR, though. We can just add a TODO for now.
Added a TODO.
While we can definitely make this not take a WorkSerializer
as a parameter, I'm a little worried that using a Mutex
internally may cause more complicated code than with a WorkSerializer
, but we can feasibly use either one internally.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 69 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think the original name made more sense here. This is for cancelling an individual request, not shutting down the whole subsystem.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 92 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If we do keep this class for some reason (see my comment elsewhere for why I don't think we should have it), it should be private within
AresRequest
, since it's not needed outside of the implementation.
Latest approach gets rid of this class
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 109 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
explicit
, since there is more than one argument here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 109 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If we make
AresRequest
InternallyRefCounted<>
as I suggested elsewhere, then we'll probably want to make this ctor public. We can document that callers should use the factory method instead of instantiating this directly, but making this private will make it impossible to useMakeOrphanable<>
even from within the factory method.For related discussion, see https://abseil.io/tips/134.
Made this public
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 154 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's make this a class (private data members with appropriate names, etc).
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 154 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As per the style guide, within each section (public, private, etc), types (including structs and classes) should be defined before methods.
https://google.github.io/styleguide/cppguide.html#Declaration_Order
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 202 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Would be nice to use
std::string*
here.
changed to absl::optional<std::string>*
to differentiate non-existant and empty as described above.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 210 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can we use an STL container here instead of the current C-style linked list approach? Also, can we use
std::unique_ptr<>
, so that the compiler can enforce ownership? For example, we could use something likestd::set<std::unique_ptr<FdNode>>
here.Alternatively, maybe we could even do something like
std::map<ares_socket_t, std::unique_ptr<FdNode>>
, to make lookups faster inAresRequest::PopFdNodeLocked()
.
Good idea, that nicely simplified things
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
There's not a particular reason that makes this code more likely to have timer or fd leaks, but the interactions are complicated enough that I'd be weary of at least introducing new bugs.
For example, with the approach that delays
on_done
until the object is safe to delete, one nice thing is that if a code change introduces a bug that causes such a leak, nothing will work at all and we'll catch it right away. But if we leave around stray refs, such a bug could more easily go undetected -- i.e. it makes it now difficult to test that the code is correct.I can get behind the desire to be consistent with other cases in the code base, but I also think that the approach of delaying
on_done
until the object is safe to delete fits better with the C++ style guide's recommendation that objects be deleted by the code that creates them if it's possible.What do you think if we got rid of the
OnDoneScheduler
object, but still delayedon_done
until the object is safe to delete (we could perhaps do this with a counter of pending queries, and some kind ofMaybeScheduleOnDone
helper at the end of every query, timer, and I/O callback) ?Thanks for the review so far though, I'm still working on getting back to the rest of the comments.
The latest approach gets rid of the OnDoneScheduler
class and the DualRefCounted
idea, but replaces with a more manual approach which uses a MaybeCallOnDone
helper function, and the new pending_queries_
, timeout_done_
and backup_poller_done_
fields.
I'm wondering if you think this new approach can look simpler and more reasonable. Please let me know.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 60 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please document what this struct represents and how it is used.
Latest approach gets rid of this struct
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 61 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
explicit
here, since there is more than one parameter.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 96 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
All methods should be defined in the same order in which they appear in the .h file.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 217 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this can be removed. It's needed to deal with race conditions. Internally, see go/grpc-c-core-closure-slides for details.
I think this still might be able to be removed since CancelLocked
should be idempotent, improved the TODO
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 437 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Prefer
absl::StrCat()
over using the+
operator.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 462 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It seems a little inconsistent that we use an
AresHostByNameRequest
struct for the A and AAAA lookups, but we don't have any such struct for the SRV or TXT lookups. Is there some reasonable way that we can make this more uniform, either by not using a struct for any of them, or by using a struct for all of them?If we go with the latter approach, we could use a real class for each query instead of just a struct, and then the
OnHostByNameDoneLocked()
,OnSRVQueryDoneLocked()
, andOnTXTDoneLocked()
methods could be moved into the respective classes.If there's no way to do this without making things more convoluted, then don't worry about it. It's a nice-to-have, not a requirement.
Done, changed these to use classes which are different subclasses of AresQuery
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 523 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
.c_str()
here; it's fine to pass astd::string
(or anabsl::string_view
) to any of the absl string functions (e.g.,StrFormat()
,StrCat()
, etc).Same thing throughout.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 551 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be moved into the
AresHostByNameRequest
ctor.
I moved this to static methods of the various AresQuery
child classes. I believe we can't actually have this in the ctor because the c-ares API's can potentially run their callbacks inline, in which case we'd run the dtor before finishing the ctor completely.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should go back to being declared as a global constant of type
char*
, so that we don't do an unnecessary allocation every time we enter this function.
What do you think about just changing to an absl::string_view
(done in the latest update) ?
Because string literals have static storage duration, I want to say most compilers should avoid any allocation.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 593 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
absl::StartsWith(result->txt, kServiceConfigAttributePrefix)
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 714 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
grpc_core::
anywhere in this code, because we're already in that namespace.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 863 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think the need for this alternative interface will go away when we move the c-ares integration into the iomgr code.
Agree
Comments addresses (though with the ref counting scheme still in discussion). Thanks for the review so far. A heads up that there still seems to be something happening wrong on Windows, but I think this PR can still otherwise be good for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really shaping up nicely!
Please let me know if you have any questions. Thanks!
Reviewed 3 of 6 files at r6, 2 of 2 files at r7, 2 of 2 files at r8.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 422 at r8 (raw file):
auto on_done = [this](grpc_error* error) { OnResolvedLocked(error); GRPC_ERROR_UNREF(error);
Ownership of this error should be passed to OnResolvedLocked()
, as per https://github.com/grpc/grpc/blob/master/doc/core/grpc-error.md#ownership-rules.
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 425 at r8 (raw file):
}; pending_request_ = LookupAresLocked( dns_server_, name_to_resolve_, kDefaultPort, interested_parties_, on_done,
std::move(on_done)
, or just declare the lambda inline here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r4 (raw file):
Previously, apolcyn wrote…
Done. But changed to "has began to run" to make it clear that it can be destroyed from within the callback
How about "has been called"?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 64 at r4 (raw file):
Previously, apolcyn wrote…
Done, but changed to
absl::optional<std::string>*
, since it looks like we need to differentiate no service config with empty string service config
Good catch!
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 71 at r8 (raw file):
std::shared_ptr<grpc_core::WorkSerializer> work_serializer); static std::unique_ptr<AresRequest> Create(
As per style guide, factory methods should go before ctors.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 103 at r8 (raw file):
class AresQuery { protected: explicit AresQuery(AresRequest* r);
Please call the parameter request
, as per general naming rules:
https://google.github.io/styleguide/cppguide.html#General_Naming_Rules
Same thing throughout (parameter names, data members, etc).
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 105 at r8 (raw file):
explicit AresQuery(AresRequest* r); ~AresQuery();
Base class dtor must be virtual, as per style guide.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 110 at r8 (raw file):
}; class AddressQuery : AresQuery {
All inheritence must be public
, as per style guide.
Same thing throughout.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 200 at r8 (raw file):
bool writable_registered_ = false; // if the fd has been shutdown yet from grpc iomgr perspective bool already_shutdown_ = false;
Suggest removing the already_
prefix here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
The latest approach gets rid of the
OnDoneScheduler
class and theDualRefCounted
idea, but replaces with a more manual approach which uses aMaybeCallOnDone
helper function, and the newpending_queries_
,timeout_done_
andbackup_poller_done_
fields.I'm wondering if you think this new approach can look simpler and more reasonable. Please let me know.
I'd still prefer to just make this InternallyRefCounted<>
, for the following reasons:
- We have a standard pattern in C-core for how we handle cases like these, and I think we need to have a compelling reason to do something different here. I'm not really seeing one in this case.
- I don't think this code is that much more complex than any of the other classes inside C-core that have internal refs for pending callbacks, so I don't think it's actually that much more likely to cause a leak.
- If we did have a bug and it did cause a leak, the asan tests would catch it anyway, so delaying the on-done callback doesn't actually give us any additional safety. But it can delay sending the response back to the caller, which can hurt latency.
- The approach of having to depend on 3 different fields here is actually more complex and harder to understand than simply using
InternallyRefCounted<>
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 96 at r4 (raw file):
Previously, apolcyn wrote…
Done.
You can split them out by class -- e.g., define the methods of AresRequest::AresQuery
first, then the methods of AresRequest::AddressQuery
, then AresRequest::SRVQuery
, then AresRequest::TXTQuery
, then AresRequest::FdNode
, then AresRequest
itself.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 581 at r4 (raw file):
Previously, apolcyn wrote…
What do you think about just changing to an
absl::string_view
(done in the latest update) ?Because string literals have static storage duration, I want to say most compilers should avoid any allocation.
Yeah, that works too.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 72 at r8 (raw file):
polled_fd_factory_(NewGrpcPolledFdFactory(work_serializer_)), query_timeout_ms_(query_timeout_ms), on_done_(on_done) {}
std::move()
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 85 at r8 (raw file):
AddressQuery* q = new AddressQuery(r, host, port, is_balancer, address_family); ares_gethostbyname(r->channel_, q->host_.c_str(), q->address_family_,
Please add a comment explaining why this can't be done in the ctor.
Same for all of the query types.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 258 at r8 (raw file):
// Found a service config record. if (result != nullptr) { *r->service_config_json_out_ =
This would probably be more efficient if written like this:
std::vector<absl::string_view> service_config_parts = {
absl::string_view(reinterpret_cast<const char*>(result->txt), result->length)
.substr(kServiceConfigAttributePrefix.size())
};
for (result = result->next; result != nullptr && !result->record_start;
result = result->next) {
service_config_parts.emplace_back(
reinterpret_cast<const char*>(result->txt), result->length);
}
*r->service_config_json_out_ = absl::StrJoin(service_config_parts, "");
This basically avoids allocating any strings until we know the final length.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 274 at r8 (raw file):
ares_free_data(reply); return; fail:
Instead of using goto
, I suggest declaring a local lambda at the top of the function that contains the error-handling code, like this:
auto on_error = [&]() {
// ...error handling code...
};
Then, in all of the places where we currently say goto fail;
, we can instead just say on_error(); return;
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 442 at r8 (raw file):
backup_poller_done_ = true; if (!shutting_down_ && error == GRPC_ERROR_NONE) { for (auto& it : fds_) {
Suggest using p
here instead of it
, since this is actually a pair, not an iterator.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 477 at r8 (raw file):
// I/O readable/writable callbacks with these filedescriptors. void AresRequest::NotifyOnEventLocked() { std::map<ares_socket_t, std::unique_ptr<AresRequest::FdNode>> active_fds;
No need for AresRequest::
here, since we're already in that scope.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 486 at r8 (raw file):
ares_socket_t s = socks[i]; if (fds_[s] == nullptr) { fds_[s] = absl::make_unique<AresRequest::FdNode>(
Same here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 506 at r8 (raw file):
// are therefore no longer in use, so they can be shut down and removed from // the list. for (auto& it : fds_) {
Suggest changing it
to p
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 807 at r8 (raw file):
// Look up name using c-ares lib. r->ContinueAfterCheckLocalhostAndIPLiteralsLocked(dns_server); done:
It would be nice to avoid the goto
. In this case, there's only two statements after the goto
, so we can probably just inline them in the places where we're currently using goto
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 814 at r8 (raw file):
void AresRequest::CancelLocked() { shutting_down_ = true; for (auto& it : fds_) {
Suggest using p
instead of it
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 12 files reviewed, 20 unresolved discussions (waiting on @apolcyn and @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 422 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Ownership of this error should be passed to
OnResolvedLocked()
, as per https://github.com/grpc/grpc/blob/master/doc/core/grpc-error.md#ownership-rules.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 425 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::move(on_done)
, or just declare the lambda inline here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
How about "has been called"?
done
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 71 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As per style guide, factory methods should go before ctors.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 103 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please call the parameter
request
, as per general naming rules:https://google.github.io/styleguide/cppguide.html#General_Naming_Rules
Same thing throughout (parameter names, data members, etc).
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 105 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Base class dtor must be virtual, as per style guide.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 110 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
All inheritence must be
public
, as per style guide.Same thing throughout.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 200 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest removing the
already_
prefix here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I'd still prefer to just make this
InternallyRefCounted<>
, for the following reasons:
- We have a standard pattern in C-core for how we handle cases like these, and I think we need to have a compelling reason to do something different here. I'm not really seeing one in this case.
- I don't think this code is that much more complex than any of the other classes inside C-core that have internal refs for pending callbacks, so I don't think it's actually that much more likely to cause a leak.
- If we did have a bug and it did cause a leak, the asan tests would catch it anyway, so delaying the on-done callback doesn't actually give us any additional safety. But it can delay sending the response back to the caller, which can hurt latency.
- The approach of having to depend on 3 different fields here is actually more complex and harder to understand than simply using
InternallyRefCounted<>
.
I'm open to using InternallyRefCounted
if we still want to use it. I can see the consistency motivation. But I still have a couple of arguments against it:
If we did have a bug and it did cause a leak, the asan tests would catch it anyway, so delaying the on-done callback doesn't actually give us any additional safety.
There are still a number of potential leaks which won't be caught by asan.
I experimented with using InternallyRefCounted
, and then applied the following patch to induce bugs that leak both our timers and our fd's:
diff --git a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc b/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc
index aec44ba4afe..b2ae4b3d0fc 100644
--- a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc
+++ b/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc
@@ -489,7 +489,7 @@ AresRequest::~AresRequest() {
}
void AresRequest::Orphan() {
- CancelLocked();
+ //CancelLocked();
Unref();
}
@@ -720,9 +720,9 @@ void AresRequest::DecrementPendingQueries() {
// shut down any remaining fds.
// TODO(apolcyn): just run CancelLocked() ?
// Unref();
- shutting_down_ = true;
- grpc_timer_cancel(&query_timeout_);
- grpc_timer_cancel(&ares_backup_poll_alarm_);
+ //shutting_down_ = true;
+ //grpc_timer_cancel(&query_timeout_);
+ //grpc_timer_cancel(&ares_backup_poll_alarm_);
ServerAddressList* addresses = addresses_out_->get();
if (addresses != nullptr) {
AddressSortingSort(this, addresses, "service-addresses");
.... and I confirmed that even with these bugs, //test/cpp/naming:resolver_component_tests_runner_invoker
will pass without any complaints from asan.
FWIW, the timer leaks won't be caught because during grpc_shutdown
, we cancel any still-pending timer callbacks as a part of timer thread shutdown.
And the fd leaks won't be caught I believe because they're registered in some global data structures which bypass asans checks.
We could perhaps catch the fd leaks in our tests if we ran them under GRPC_ABORT_ON_LEAKS=true
(we might want be doing this for all tests anyways).
I'm not sure about the timer leaks, but there might be something similar to GRPC_ABORT_ON_LEAKS
that could catch them too.
But overall, these bugs couldn't sneak in as easily if we avoided the stray refs, so I think there are real safety benefits to avoiding them.
it can delay sending the response back to the caller, which can hurt latency.
I'm not keen on doing this for performance reasons, since this would only be a micro-optimization and it's on the control plane anyways.
The approach of having to depend on 3 different fields here is actually more complex and harder to understand than simply using InternallyRefCounted<>.
FWIW, the InternallyRefCounted
approach will still rely on the same pending_queries_
field and surrounding logic. We can, however, delete the timeout_done_
and backup_poller_done_
fields by using InternallyRefCounted
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 96 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
You can split them out by class -- e.g., define the methods of
AresRequest::AresQuery
first, then the methods ofAresRequest::AddressQuery
, thenAresRequest::SRVQuery
, thenAresRequest::TXTQuery
, thenAresRequest::FdNode
, thenAresRequest
itself.
done
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 72 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::move()
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 85 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a comment explaining why this can't be done in the ctor.
Same for all of the query types.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 258 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This would probably be more efficient if written like this:
std::vector<absl::string_view> service_config_parts = { absl::string_view(reinterpret_cast<const char*>(result->txt), result->length) .substr(kServiceConfigAttributePrefix.size()) }; for (result = result->next; result != nullptr && !result->record_start; result = result->next) { service_config_parts.emplace_back( reinterpret_cast<const char*>(result->txt), result->length); } *r->service_config_json_out_ = absl::StrJoin(service_config_parts, "");
This basically avoids allocating any strings until we know the final length.
Nice change! done
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 274 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Instead of using
goto
, I suggest declaring a local lambda at the top of the function that contains the error-handling code, like this:auto on_error = [&]() { // ...error handling code... };
Then, in all of the places where we currently say
goto fail;
, we can instead just sayon_error(); return;
.
Changed this to just inline the code behind the goto label before each return, per your other comment
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 442 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest using
p
here instead ofit
, since this is actually a pair, not an iterator.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 477 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
AresRequest::
here, since we're already in that scope.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 486 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 506 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest changing
it
top
.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 814 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest using
p
instead ofit
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good!
Please let me know if you have any questions. Thanks!
Reviewed 3 of 3 files at r9.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 422 at r8 (raw file):
Previously, apolcyn wrote…
Done.
You also need to change OnResolvedLocked()
to unref it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 101 at r9 (raw file):
// some book keeping on the relevant AresRequest that's needed to carry // out the overall resolution. class AresQuery {
Do we actually need this base class? Why not just have each subclass store the request itself?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
I'm open to using
InternallyRefCounted
if we still want to use it. I can see the consistency motivation. But I still have a couple of arguments against it:If we did have a bug and it did cause a leak, the asan tests would catch it anyway, so delaying the on-done callback doesn't actually give us any additional safety.
There are still a number of potential leaks which won't be caught by asan.
I experimented with using
InternallyRefCounted
, and then applied the following patch to induce bugs that leak both our timers and our fd's:diff --git a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc b/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc index aec44ba4afe..b2ae4b3d0fc 100644 --- a/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc +++ b/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc @@ -489,7 +489,7 @@ AresRequest::~AresRequest() { } void AresRequest::Orphan() { - CancelLocked(); + //CancelLocked(); Unref(); } @@ -720,9 +720,9 @@ void AresRequest::DecrementPendingQueries() { // shut down any remaining fds. // TODO(apolcyn): just run CancelLocked() ? // Unref(); - shutting_down_ = true; - grpc_timer_cancel(&query_timeout_); - grpc_timer_cancel(&ares_backup_poll_alarm_); + //shutting_down_ = true; + //grpc_timer_cancel(&query_timeout_); + //grpc_timer_cancel(&ares_backup_poll_alarm_); ServerAddressList* addresses = addresses_out_->get(); if (addresses != nullptr) { AddressSortingSort(this, addresses, "service-addresses");
.... and I confirmed that even with these bugs,
//test/cpp/naming:resolver_component_tests_runner_invoker
will pass without any complaints from asan.FWIW, the timer leaks won't be caught because during
grpc_shutdown
, we cancel any still-pending timer callbacks as a part of timer thread shutdown.And the fd leaks won't be caught I believe because they're registered in some global data structures which bypass asans checks.
We could perhaps catch the fd leaks in our tests if we ran them under
GRPC_ABORT_ON_LEAKS=true
(we might want be doing this for all tests anyways).I'm not sure about the timer leaks, but there might be something similar to
GRPC_ABORT_ON_LEAKS
that could catch them too.But overall, these bugs couldn't sneak in as easily if we avoided the stray refs, so I think there are real safety benefits to avoiding them.
it can delay sending the response back to the caller, which can hurt latency.
I'm not keen on doing this for performance reasons, since this would only be a micro-optimization and it's on the control plane anyways.
The approach of having to depend on 3 different fields here is actually more complex and harder to understand than simply using InternallyRefCounted<>.
FWIW, the
InternallyRefCounted
approach will still rely on the samepending_queries_
field and surrounding logic. We can, however, delete thetimeout_done_
andbackup_poller_done_
fields by usingInternallyRefCounted
.
All of your concerns about asan not catching these problems would also apply to almost every other place where we use InternallyRefCounted<>
, so I don't think that's a good argument for doing something different here. We have an established pattern for how we handle this kind of use-case in C-core, and I think that to justify doing something different, there needs to be an argument why this case is actually different from all of the others. Right now, I'm not hearing such an argument.
I'd be fine using GRPC_ABORT_ON_LEAKS=true
to run resolver_component_tests_runner_invoker
. You could also investigate using it for other tests if you want. I assume that there are reasons why it's not enabled by default for all of our tests, but I'm not sure.
I don't know if there's an equivalent to GRPC_ABORT_ON_LEAKS
for timers, but I'm also not too worried about timer callbacks leaking. Even if we forget to cancel a timer, the timer itself will fire normally at some point, and the resources will get cleaned up then.
However, if the timer is set to something like 5 minutes, that can cause significant latency for RPCs. That seems like a bad failure mode -- we can live with delaying that long to free resources a lot more easily than we can live with delaying RPCs for that long.
I understand that we'll still need pending_queries_
, but I think that's better, because we'll have two different mechanisms for two different things. The pending_queries_
mechanism will be used for deciding when to invoke on_done, and the ref-counting mechanism will be used to decide when to clean up the object. The two mechanisms don't really need to interact directly at all, so they're both simpler to understand.
Please go ahead and change this code to use InternallyRefCounted<>
.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 262 at r9 (raw file):
} } // Foquery und a service config record.
Looks like an accidental edit -- should say "Found".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 422 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
You also need to change
OnResolvedLocked()
to unref it.
OnResolvedLocked
is currently unreffing it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 101 at r9 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Do we actually need this base class? Why not just have each subclass store the request itself?
yeah this base class isn't providing a lot, latest updates get rid of it
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
All of your concerns about asan not catching these problems would also apply to almost every other place where we use
InternallyRefCounted<>
, so I don't think that's a good argument for doing something different here. We have an established pattern for how we handle this kind of use-case in C-core, and I think that to justify doing something different, there needs to be an argument why this case is actually different from all of the others. Right now, I'm not hearing such an argument.I'd be fine using
GRPC_ABORT_ON_LEAKS=true
to runresolver_component_tests_runner_invoker
. You could also investigate using it for other tests if you want. I assume that there are reasons why it's not enabled by default for all of our tests, but I'm not sure.I don't know if there's an equivalent to
GRPC_ABORT_ON_LEAKS
for timers, but I'm also not too worried about timer callbacks leaking. Even if we forget to cancel a timer, the timer itself will fire normally at some point, and the resources will get cleaned up then.However, if the timer is set to something like 5 minutes, that can cause significant latency for RPCs. That seems like a bad failure mode -- we can live with delaying that long to free resources a lot more easily than we can live with delaying RPCs for that long.
I understand that we'll still need
pending_queries_
, but I think that's better, because we'll have two different mechanisms for two different things. Thepending_queries_
mechanism will be used for deciding when to invoke on_done, and the ref-counting mechanism will be used to decide when to clean up the object. The two mechanisms don't really need to interact directly at all, so they're both simpler to understand.Please go ahead and change this code to use
InternallyRefCounted<>
.
While updating to use InternallyRefCounted
, I ran into another reason to not use it: the c-ares lookup takes a grpc_pollset_set
parameter, and GrpcPolledFd
's need to use that pollset set. In particular, the posix GrpcPolledFd
implementation needs to add its fds to the pollset set and then remove them later (it does this in the GrpcPolledFd
dtor).
The on_done
callback holds an implicit ref over this pollset set param, and thus calling on_done
before we're done using the pollset set is unsafe (I noticed this with crashes un the resolver_component_tests_runner_invoker
under epollex).
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 807 at r8 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It would be nice to avoid the
goto
. In this case, there's only two statements after thegoto
, so we can probably just inline them in the places where we're currently usinggoto
.
goto removed from earlier updates
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 262 at r9 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Looks like an accidental edit -- should say "Found".
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
While updating to use
InternallyRefCounted
, I ran into another reason to not use it: the c-ares lookup takes agrpc_pollset_set
parameter, andGrpcPolledFd
's need to use that pollset set. In particular, the posixGrpcPolledFd
implementation needs to add its fds to the pollset set and then remove them later (it does this in theGrpcPolledFd
dtor).The
on_done
callback holds an implicit ref over this pollset set param, and thus callingon_done
before we're done using the pollset set is unsafe (I noticed this with crashes un theresolver_component_tests_runner_invoker
under epollex).
We can perhaps fix this latter issue by creating a new pollset set that belongs to the AresRequest
object, adding it to the pollset_set param passed to LookupAresLocked
, and having internal GrpcPolledFd
objects use the pollset set belonging to the AresRequest
object.
Working on this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 13 files reviewed, 4 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
We can perhaps fix this latter issue by creating a new pollset set that belongs to the
AresRequest
object, adding it to the pollset_set param passed toLookupAresLocked
, and having internalGrpcPolledFd
objects use the pollset set belonging to theAresRequest
object.Working on this...
It turns out that creating a pollset_set internal to the AresRequest
object won't work actually, because, at least under the poll
poller, we need to delete pollset set B from pollset set A, before destroying B, if we ever previously added B to A. Otherwise, we wind up with a dangling pointer from pollset set A to B.
This bug was caught by //test/core/http:httpcli_test@poller=poll
FTR, see https://source.cloud.google.com/results/invocations/4498d7e7-b1fe-41b7-9b6f-c53f27d92b3f/targets/%2F%2Ftest%2Fcore%2Fhttp:httpcli_test@poller%3Dpoll/log
So we need to leverage the implicit ref that on_done
holds over the LookupAresLocked
pollset set parameter, so long as we need to use that pollset set for anything.
I think this is reason for sticking with the non-refcounted approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 13 files reviewed, 4 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
It turns out that creating a pollset_set internal to the
AresRequest
object won't work actually, because, at least under thepoll
poller, we need to delete pollset set B from pollset set A, before destroying B, if we ever previously added B to A. Otherwise, we wind up with a dangling pointer from pollset set A to B.This bug was caught by
//test/core/http:httpcli_test@poller=poll
FTR, see https://source.cloud.google.com/results/invocations/4498d7e7-b1fe-41b7-9b6f-c53f27d92b3f/targets/%2F%2Ftest%2Fcore%2Fhttp:httpcli_test@poller%3Dpoll/logSo we need to leverage the implicit ref that
on_done
holds over theLookupAresLocked
pollset set parameter, so long as we need to use that pollset set for anything.I think this is reason for sticking with the non-refcounted approach.
Sorry for the back and forth here.
I've fixed this last issue with InternallyRefCounted
by deleting the fds from the LookupAresLocked
pollset set param during GrpcPolledFd::ShutdownLocked
, which is guaranteed to be called on every fd, before on_done
is called, ensuring liveness of the pollset set during the deletion.
So the code now uses InternallyRefCounted
again.
PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Remaining comments are mostly minor.
Please let me know if you have any questions. Thanks!
Reviewed 2 of 5 files at r10, 8 of 8 files at r11.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc, line 422 at r8 (raw file):
Previously, apolcyn wrote…
OnResolvedLocked
is currently unreffing it.
You're right. Sorry, somehow I thought it wasn't.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 59 at r11 (raw file):
public: virtual ~GrpcPolledFdFactory() {} /* Creates a new wrapped fd for the current platform.
Please use C++-style comments.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 135 at r2 (raw file):
Previously, apolcyn wrote…
Sorry for the back and forth here.
I've fixed this last issue with
InternallyRefCounted
by deleting the fds from theLookupAresLocked
pollset set param duringGrpcPolledFd::ShutdownLocked
, which is guaranteed to be called on every fd, beforeon_done
is called, ensuring liveness of the pollset set during the deletion.So the code now uses
InternallyRefCounted
again.PTAL
This looks good!
I agree that the pollset_set lifetime nonsense is annoying. I hope that all of that will go away as part of moving to the new polling model.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 288 at r11 (raw file):
AresRequest::FdNode::FdNode(RefCountedPtr<AresRequest> request, std::unique_ptr<GrpcPolledFd> grpc_polled_fd) : request_(request), grpc_polled_fd_(std::move(grpc_polled_fd)) {
std::move(request)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 289 at r11 (raw file):
std::unique_ptr<GrpcPolledFd> grpc_polled_fd) : request_(request), grpc_polled_fd_(std::move(grpc_polled_fd)) { GRPC_CARES_TRACE_LOG("request:%p new fd: %s", request.get(),
This will need to look at request_
if we use std::move()
above.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 731 at r11 (raw file):
// After setting shutting_down_ = true, NotifyOnEventLocked will // shut down any remaining fds. // TODO(apolcyn): just run CancelLocked() ?
Any reason not to do this?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 732 at r11 (raw file):
// shut down any remaining fds. // TODO(apolcyn): just run CancelLocked() ? // Unref();
Why is this comment needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 15 files reviewed, 5 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 59 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use C++-style comments.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 288 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::move(request)
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 289 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This will need to look at
request_
if we usestd::move()
above.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 731 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Any reason not to do this?
Done. I've changed this method to ShutdownIOLocked
though, just since this is now a private method and ShutdownIOLocked
seemed to reflect what it does better.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 732 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why is this comment needed?
whoops! removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping on this -- I'm sorry it fell off my radar!
The remaining comments are minor, so feel free to merge after addressing them.
Thanks!
Reviewed 2 of 4 files at r12, 7 of 7 files at r13.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 1 at r13 (raw file):
/*
Please use C++-style comments here.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 81 at r13 (raw file):
} // namespace grpc_core #endif /* GRPC_CORE_EXT_FILTERS_CLIENT_CHANNEL_RESOLVER_DNS_C_ARES_GRPC_ARES_EV_DRIVER_H \
Please use a C++-style comment here as well.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r13 (raw file):
/// An AresRequest is a handle over a complete name resolution process /// (A queries, AAAA queries, etc.). An AresRequest is created with a call /// to LookupAresLocked, and it the name resolution process begins when
s/it the/the/
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 799 at r11 (raw file):
nullptr /* args */); // Let the address sorter figure out which one should be tried first. AddressSortingSort(this, addresses_out_->get(), "service-addresses");
Why don't we need this anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 1 at r13 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use C++-style comments here.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 81 at r13 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use a C++-style comment here as well.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 55 at r13 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/it the/the/
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 799 at r11 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why don't we need this anymore?
We now only schedule the user-provided on_done
callback effectively from DecrementPendingQueries
, and that method first sorts the to-be-returned addresses.
Manually squashed commits and force pushed |
artifact build windows: #25545 |
This reverts commit 2ee7017.
The primary change here is to have
grpc_dns_lookup_ares_locked
return aunique_ptr<AresRequestInterface>
.Conversion of the remaining C-code in
grpc_ares_wrapper.cc
is left for more followups.This change is