Upstream: re-resolvable servers #208
Conversation
#if (NGX_HTTP_UPSTREAM_ZONE)
    if (peers->config && rrp->config != *peers->config) {
        goto busy;
    }
#endif
This block makes no sense, since busy is right next to it. Also, lock/unlock can be avoided as well. We can just call ngx_http_upstream_get_least_conn_peer() recursively and return its result. The nested call will assign the same pc->name.
@@ -103,7 +103,7 @@ ngx_http_upstream_get_least_conn_peer(ngx_peer_connection_t *pc, void *data)
time_t now;
uintptr_t m;
- ngx_int_t rc, total;
+ ngx_int_t total;
ngx_uint_t i, n, p, many;
ngx_http_upstream_rr_peer_t *peer, *best;
ngx_http_upstream_rr_peers_t *peers;
@@ -278,19 +278,7 @@ failed:
ngx_http_upstream_rr_peers_unlock(peers);
- rc = ngx_http_upstream_get_least_conn_peer(pc, rrp);
-
- if (rc != NGX_BUSY) {
- return rc;
- }
-
- ngx_http_upstream_rr_peers_wlock(peers);
-
-#if (NGX_HTTP_UPSTREAM_ZONE)
- if (peers->config && rrp->config != *peers->config) {
- goto busy;
- }
-#endif
+ return ngx_http_upstream_get_least_conn_peer(pc, rrp);
}
#if (NGX_HTTP_UPSTREAM_ZONE)
I could swear I dropped this goto busy after your review in July. Anyways, fixed. Thanks for catching!
Regarding the lock/unlock optimization: it is a preexisting issue introduced to reduce the diff with our other repository. I can submit another PR addressing that, but I believe it would only complicate merges.
Indeed this part is the existing code. Agreed. Maybe later.
Force-pushed from 5ae8dae to 9116a4a
ngx_event_t         event;     /* must be first */
ngx_uint_t          worker;
As was mentioned before, these fields are not needed in nginx open source. But I agree, at this point it's easier to keep them here than to add a compat section. Having said that, I think allocating an ngx_event_t object in a shared zone is not good anyway. We can later think about making it worker-local and removing it from here. Seems pretty simple.
Also, there's one case where the shared event leads to an actual segfault. On Windows, when more than 1 worker is configured, all those workers have ngx_worker == 0, which allows them to reuse the event object, which leads to a crash. Of course, having more than 1 worker on Windows makes no sense since only one of them works, but the problem is still there, and removing the event from the shared zone will solve it.
I'm quite aware of the issue, and it's not that simple. Moving the event to worker-local memory doesn't solve that as long as there is an event and a pointer to it. Getting rid of the per-host ngx_event_t seems to be the best option, but it requires redoing the whole scheduling of resolve tasks, and we still want to maintain some kind of worker affinity to be able to use ngx_resolver_t caches (which are not shared at the moment). At this point it becomes easier to add ngx_worker support on Windows :)
See my second (or third?) attempt to rework that: https://mailman.nginx.org/pipermail/nginx-devel/2023-February/NQSIPFNPTD4CDOLTGOHDWIMIHQ6DQY6S.html.
Force-pushed from 9116a4a to 56fd192
Other than the style comment above, the changes look ok.
Specifying the upstream server by a hostname together with the "resolve" parameter causes the hostname to be periodically resolved, and upstream servers added/removed as necessary. This requires a "resolver" in the "http" configuration block. The "resolver_timeout" parameter also affects when failed DNS requests will be attempted again. Responses with NXDOMAIN are retried after 10 seconds.

The upstream has a configuration generation number that is incremented each time servers are added to or removed from the primary/backup list. This number is remembered by the peer.init method, and if peer.get detects a change in configuration, it returns NGX_BUSY.

Each server has a reference counter. It is incremented by peer.get and decremented by peer.free. When a server is removed, it is taken off the list of servers and marked as a "zombie". The memory allocated for a zombie peer is freed only when its reference count reaches zero.

Co-authored-by: Roman Arutyunyan <arut@nginx.com>
Co-authored-by: Sergey Kandaurov <pluknet@nginx.com>
Co-authored-by: Vladimir Homutov <vl@nginx.com>
When re-creating a non-reusable zone, make the pointer to the old zone available during the new zone initialization.
After configuration is reloaded, it may take some time for the re-resolvable upstream servers to resolve and become available as peers. During this time, client requests might get dropped. Such servers are now pre-resolved using the "cache" of already resolved peers from the old shared memory zone.
The "resolver" and "resolver_timeout" directives can now be specified directly in the "upstream" block.
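A minimal configuration exercising these directives might look like the following sketch. The resolver address, zone size, and hostnames are illustrative; as with NGINX Plus, a shared memory zone is expected for re-resolvable servers.

```nginx
upstream backend {
    zone backend_zone 1m;            # shared zone for re-resolvable peers
    resolver 127.0.0.53 valid=10s;   # now allowed directly in "upstream"
    resolver_timeout 5s;

    server backend.example.com resolve;              # track A/AAAA changes
    server backend.example.com service=http resolve; # SRV-based discovery
}
```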
Previously, all upstream DNS entries were re-resolved immediately on configuration reload. With a large number of upstreams, this created a spike of DNS resolution requests that could overwhelm the DNS server or cause drops on the network. This patch preserves the TTL of previous resolutions across reloads by copying the expiry time of each upstream hostname across configuration cycles. As a result, no additional resolutions are needed.
Force-pushed from 56fd192 to f94565b
v3 of the patchset, previously submitted in https://mailman.nginx.org/pipermail/nginx-devel/2024-July/MYXA5AVMTIMDNEVMA462AN7CRQ546Y7L.html
This PR adds the DNS re-resolve and service discovery functionality from NGINX Plus to the open-source code.
The documentation is already available online and will be updated to reflect the availability of the directives in the open-source product after the release:
- resolve and service parameters of the server directive in the HTTP upstream block.
- resolver and resolver_timeout directives in the HTTP upstream block.
- upstream block documentation: resolver, resolver_timeout and server.
For a higher level overview see Using DNS for Service Discovery with NGINX and NGINX Plus.