Upstream: re-resolvable servers #208

bavshin-f5 · 2024-09-23T21:34:23Z

v3 of the patchset, previously submitted in https://mailman.nginx.org/pipermail/nginx-devel/2024-July/MYXA5AVMTIMDNEVMA462AN7CRQ546Y7L.html

This PR adds to the open-source code DNS re-resolve and service discovery functionality from NGINX Plus.

The documentation is already available online and will be updated to reflect the availability of the directives in the open-source product after the release:

resolve and service parameters of the server directive in the HTTP upstream block.
resolver and resolver_timeout directives in the HTTP upstream block.
Matching sections of the stream upstream block documentation: resolver, resolver_timeout and server

For a higher level overview see Using DNS for Service Discovery with NGINX and NGINX Plus.

arut · 2024-10-23T15:16:02Z

src/http/modules/ngx_http_upstream_least_conn_module.c

+
+#if (NGX_HTTP_UPSTREAM_ZONE)
+        if (peers->config && rrp->config != *peers->config) {
+            goto busy;
+        }
+#endif


This block makes no sense, since busy is right next to it. Also, lock/unlock can be avoided as well. We can just call ngx_http_upstream_get_least_conn_peer() recursively and return its result. The nested call will assign the same pc->name.

@@ -103,7 +103,7 @@ ngx_http_upstream_get_least_conn_peer(ngx_peer_connection_t *pc, void *data) time_t now; uintptr_t m; - ngx_int_t rc, total; + ngx_int_t total; ngx_uint_t i, n, p, many; ngx_http_upstream_rr_peer_t *peer, *best; ngx_http_upstream_rr_peers_t *peers; @@ -278,19 +278,7 @@ failed: ngx_http_upstream_rr_peers_unlock(peers); - rc = ngx_http_upstream_get_least_conn_peer(pc, rrp); - - if (rc != NGX_BUSY) { - return rc; - } - - ngx_http_upstream_rr_peers_wlock(peers); - -#if (NGX_HTTP_UPSTREAM_ZONE) - if (peers->config && rrp->config != *peers->config) { - goto busy; - } -#endif + return ngx_http_upstream_get_least_conn_peer(pc, rrp); } #if (NGX_HTTP_UPSTREAM_ZONE)

I could swear I dropped this goto busy after your review in July. Anyways, fixed. Thanks for catching!

Regarding the lock/unlock optimization: it is a preexisting issue introduced to reduce the diff with our other repository. I can submit another PR addressing that, but I believe it would only complicate merges.

Indeed this part is the existing code. Agreed. Maybe later.

src/http/ngx_http_upstream_round_robin.c

src/http/modules/ngx_http_upstream_zone_module.c

arut · 2024-10-25T11:38:39Z

src/http/ngx_http_upstream_round_robin.h

+    ngx_event_t                     event;         /* must be first */
+    ngx_uint_t                      worker;


As was mentioned before, these fields are not needed in nginx open source. But I agree, at this point it's easier to keep them here than to add a compat section. Having said that, I think allocating an ngx_event_t object in a shared zone is not good anyway. We can later think about making it worker-local and removing it from here. Seems pretty simple.

Also, there's one case when the shared event leads to an actual segfault. On Windows, when more than 1 worker is configured, all those workers have ngx_worker == 0, which allows them to reuse the event object, which leads to a crash. Of course, having more than 1 worker on Windows makes no sense since only one of them works, but still the problem is there, and removing the event from shared zone will solve it.

I'm quite aware of the issue, and it's not that simple. Moving the event to worker-local memory doesn't solve that as long as there is an event and a pointer to it. Getting rid of the per-host ngx_event_t seems to be the best option, but it requires redoing the whole scheduling of resolve tasks and we still want to maintain some kind of worker affinity to be able to use ngx_resolver_t caches (that are not shared at the moment). At this point it becomes easier to add ngx_worker support on Windows :)

See my second (or third?) attempt to rework that: https://mailman.nginx.org/pipermail/nginx-devel/2023-February/NQSIPFNPTD4CDOLTGOHDWIMIHQ6DQY6S.html.

arut · 2024-10-25T12:59:20Z

src/http/modules/ngx_http_upstream_least_conn_module.c

+
+#if (NGX_HTTP_UPSTREAM_ZONE)
+        if (peers->config && rrp->config != *peers->config) {
+            goto busy;
+        }
+#endif


Indeed this part is the existing code. Agreed. Maybe later.

src/stream/ngx_stream_proxy_module.c

src/http/ngx_http_upstream_round_robin.h

arut

Other than the style comment above, the changes look ok.

Specifying the upstream server by a hostname together with the "resolve" parameter will make the hostname to be periodically resolved, and upstream servers added/removed as necessary. This requires a "resolver" at the "http" configuration block. The "resolver_timeout" parameter also affects when the failed DNS requests will be attempted again. Responses with NXDOMAIN will be attempted again in 10 seconds. Upstream has a configuration generation number that is incremented each time servers are added/removed to the primary/backup list. This number is remembered by the peer.init method, and if peer.get detects a change in configuration, it returns NGX_BUSY. Each server has a reference counter. It is incremented by peer.get and decremented by peer.free. When a server is removed, it is removed from the list of servers and is marked as "zombie". The memory allocated by a zombie peer is freed only when its reference count becomes zero. Co-authored-by: Roman Arutyunyan <arut@nginx.com> Co-authored-by: Sergey Kandaurov <pluknet@nginx.com> Co-authored-by: Vladimir Homutov <vl@nginx.com>

When re-creating a non-reusable zone, make the pointer to the old zone available during the new zone initialization.

After configuration is reloaded, it may take some time for the re-resolvable upstream servers to resolve and become available as peers. During this time, client requests might get dropped. Such servers are now pre-resolved using the "cache" of already resolved peers from the old shared memory zone.

The "resolver" and "resolver_timeout" directives can now be specified directly in the "upstream" block.

Previously, all upstream DNS entries would be immediately re-resolved on config reload. With a large number of upstreams, this creates a spike of DNS resolution requests. These spikes can overwhelm the DNS server or cause drops on the network. This patch retains the TTL of previous resolutions across reloads by copying each upstream's name's expiry time across configuration cycles. As a result, no additional resolutions are needed.

arut self-assigned this Sep 24, 2024

bavshin-f5 mentioned this pull request Oct 21, 2024

Tests: upstream configuration tests with re-resolvable servers. nginx/nginx-tests#8

Merged

arut reviewed Oct 23, 2024

View reviewed changes

src/http/modules/ngx_http_upstream_zone_module.c Outdated Show resolved Hide resolved

bavshin-f5 force-pushed the upstream-resolve branch from 5ae8dae to 9116a4a Compare October 23, 2024 22:06

arut reviewed Oct 28, 2024

View reviewed changes

bavshin-f5 force-pushed the upstream-resolve branch from 9116a4a to 56fd192 Compare October 28, 2024 15:53

arut reviewed Oct 30, 2024

View reviewed changes

src/http/ngx_http_upstream_round_robin.h Outdated Show resolved Hide resolved

arut approved these changes Oct 31, 2024

View reviewed changes

mdocguard and others added 6 commits October 31, 2024 08:36

Upstream: construct upstream peers from DNS SRV records.

c297169

Core: inheritance of non-reusable shared memory zones.

8d8e572

When re-creating a non-reusable zone, make the pointer to the old zone available during the new zone initialization.

Upstream: per-upstream resolver.

a26b65a

The "resolver" and "resolver_timeout" directives can now be specified directly in the "upstream" block.

bavshin-f5 force-pushed the upstream-resolve branch from 56fd192 to f94565b Compare October 31, 2024 15:40

bavshin-f5 merged commit 29aec57 into nginx:master Nov 7, 2024
1 check passed

bavshin-f5 deleted the upstream-resolve branch November 7, 2024 15:57

Maryna-f5 added this to the nginx-1.27.3 milestone Nov 19, 2024

bavshin-f5 mentioned this pull request Dec 7, 2024

Upstream: remove redundant lock in least_conn. #373

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Upstream: re-resolvable servers #208

Upstream: re-resolvable servers #208

Uh oh!

bavshin-f5 commented Sep 23, 2024 •

edited

Loading

Uh oh!

arut Oct 23, 2024 •

edited

Loading

Uh oh!

bavshin-f5 Oct 23, 2024

Uh oh!

arut Oct 25, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arut Oct 25, 2024

Uh oh!

bavshin-f5 Oct 28, 2024

Uh oh!

arut Oct 25, 2024

Uh oh!

Uh oh!

Uh oh!

arut left a comment

Uh oh!

Uh oh!

Uh oh!

Upstream: re-resolvable servers #208

Upstream: re-resolvable servers #208

Uh oh!

Conversation

bavshin-f5 commented Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arut Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bavshin-f5 Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

arut Oct 25, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arut Oct 25, 2024

Choose a reason for hiding this comment

Uh oh!

bavshin-f5 Oct 28, 2024

Choose a reason for hiding this comment

Uh oh!

arut Oct 25, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

arut left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bavshin-f5 commented Sep 23, 2024 •

edited

Loading

arut Oct 23, 2024 •

edited

Loading