Conversation

Member

@ggreenway ggreenway commented Jun 11, 2021

Delete connection pools when they no longer have any connections. This fixes unbounded memory use for cases where a new connection pool is needed for each downstream connection, such as when using upstream PROXY protocol.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>

Fixes #16682
Continues @chradcliffe's work from #13061

Risk Level: High! Connection pools used to live until the cluster was destroyed. If anything is caching a pool that now gets deleted sooner, this could crash, corrupt memory, etc.
Testing: Existing and new tests pass
Docs Changes: Updated
Release Notes: Added
Platform Specific Features: None
Runtime guard: envoy.reloadable_features.conn_pool_delete_when_idle
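For reference, reloadable-feature guards like this one are typically overridden through a static runtime layer in the bootstrap config. A sketch (assuming the flag defaults to enabling the new behavior) that would switch back to the old lifetime behavior:

```yaml
layered_runtime:
  layers:
  - name: static_layer
    static_layer:
      envoy.reloadable_features.conn_pool_delete_when_idle: false
```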

Craig Radcliffe and others added 30 commits September 11, 2020 13:27
* Add pool idle timeout timer and callbacks to
ConnectionPool::ConnPoolImplBase

* Add logic to set the idle timer when there are no pending or active
connections and disable the timer when new activity occurs

* Add logic to ClusterManagerImpl to remove pools after their idle
timeout expires and to remove hosts from the pool maps when the map is
empty after an idle timer expiry

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Originally the idle check used the lack of active streams as the start
of the idle timer; however, this proved too complicated. Instead, we now
start the timeout after the last connection has been drained from the pool.

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Fix original conn pool logic and add test

* Ensure that the idle timer is disabled when draining

* Drain connections after the idle timer fires but before erasing the
pool

* Make logic consistent between conn_pool_base and original_conn_pool

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Added note about connection pool lifetime

* Added missing note about pools per downstream connection

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
)

Clarify that the server may skip resending resources upon stream restart only if it has some way to know that the client is not subscribing to resources that it was not previously subscribed to.

Risk Level: Low
Testing: N/A
Docs Changes: Included in PR

Clarifies text added in envoyproxy#12580.

Signed-off-by: Mark D. Roth <roth@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…x a typo (envoyproxy#13696)

The v3 version of the comment was missing the "If specified..." clause from the v2 version of that comment

Risk Level: low
Testing: Ran ./ci/run_envoy_docker.sh './ci/do_ci.sh fix_format'
Docs Changes: comment in a proto file changed

Signed-off-by: Sanjay Pujare <sanjaypujare@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
This updates the default container config to:
- use the envoyproxy.io website as the default upstream proxy
- listen for the admin interface on the 0.0.0.0 address

Fixes envoyproxy#13497

Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* test: Check in all TLS test certs

- Will prevent openssl fork-emulation issues on Windows/msys2 that cause
  test flakiness
- modifies context_impl_test to no longer require a cert that is
  generated on the fly to expire in 15 days

Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Added more documentation

* Added additional test

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Replaced addIdleTimeoutCallback/addDrainedCallback with
  addIdleCallback with a parameter to indicate whether we should start
  draining

* Removed the pool idle timer, instead relying on the connection idle
  timeouts to empty the pool

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Member Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a comment on #16948 created by @ggreenway.


Member

@mattklein123 mattklein123 left a comment


LGTM. 🤞

Per offline discussion we can merge to get a bit of bake time before release and @alyssawilk can follow up on Monday with any post merge comments.

@ggreenway ggreenway merged commit 3876d7c into envoyproxy:main Jul 9, 2021
@rgs1
Member

rgs1 commented Jul 9, 2021

I'll try to canary this today so we can provide some signal/feedback.

@ggreenway
Member Author

> I'll try to canary this today so we can provide some signal/feedback.

Thanks!

ggreenway added a commit to ggreenway/envoy that referenced this pull request Jul 12, 2021
If a host is transitioned to unhealthy, and closing its idle
connections results in the pool becoming idle, it would trigger this crash.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
ggreenway added a commit that referenced this pull request Jul 13, 2021
If a host is transitioned to unhealthy, and closing its idle
connections results in the pool becoming idle, it would trigger this crash.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
dubious90 pushed a commit to envoyproxy/nighthawk that referenced this pull request Jul 13, 2021
Summary of actions/changes:
- Update bazel/repositories
- Other files already in sync
- Update AddDrainedCallback calls to instead be AddIdleCallback, due to [envoy PR#16948](envoyproxy/envoy#16948).  AddIdleCallback serves an equivalent, if not exactly the same, purpose.

Signed-off-by: Nathan Perry <nbperry@google.com>
@bianpengyuan
Contributor

We are getting a crash at checkForIdleAndCloseIdleConnsIfDraining after upgrading Istio to the latest Envoy, including #17302:

2021-07-13T19:21:36.345453Z	critical	envoy backtrace	Caught Segmentation fault, suspect faulting address 0x0
2021-07-13T19:21:36.345502Z	critical	envoy backtrace	Backtrace (use tools/stack_decode.py to get line numbers):
2021-07-13T19:21:36.345506Z	critical	envoy backtrace	Envoy version: 45fb150aa722b5c4bbbf58af5d59910ca08010f0/1.19.0-dev/Clean/RELEASE/BoringSSL
2021-07-13T19:21:36.345764Z	critical	envoy backtrace	#0: __restore_rt [0x7f0819a3f3c0]
2021-07-13T19:21:36.360093Z	critical	envoy backtrace	#1: std::__1::__function::__func<>::operator()() [0x55d76850c7ba]
2021-07-13T19:21:36.373821Z	critical	envoy backtrace	#2: Envoy::Tcp::OriginalConnPoolImpl::checkForIdleAndCloseIdleConnsIfDraining() [0x55d7686e3323]
2021-07-13T19:21:36.387629Z	critical	envoy backtrace	#3: Envoy::Tcp::OriginalConnPoolImpl::onConnectionEvent() [0x55d7686e4b05]
2021-07-13T19:21:36.401213Z	critical	envoy backtrace	#4: Envoy::Tcp::OriginalConnPoolImpl::ActiveConn::onEvent() [0x55d7686e7cfa]
2021-07-13T19:21:36.415301Z	critical	envoy backtrace	#5: Envoy::Network::ConnectionImplBase::raiseConnectionEvent() [0x55d768aa942b]
2021-07-13T19:21:36.428448Z	critical	envoy backtrace	#6: Envoy::Network::ConnectionImpl::raiseEvent() [0x55d768aa015f]
2021-07-13T19:21:36.444259Z	critical	envoy backtrace	#7: Envoy::Network::ConnectionImpl::closeSocket() [0x55d768a9fdfb]
2021-07-13T19:21:36.457692Z	critical	envoy backtrace	#8: Envoy::Network::ConnectionImpl::onReadReady() [0x55d768aa4452]
2021-07-13T19:21:36.470884Z	critical	envoy backtrace	#9: Envoy::Network::ConnectionImpl::onFileEvent() [0x55d768aa203f]
2021-07-13T19:21:36.484227Z	critical	envoy backtrace	#10: std::__1::__function::__func<>::operator()() [0x55d768a89181]
2021-07-13T19:21:36.499012Z	critical	envoy backtrace	#11: Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke() [0x55d768a8a42c]

Link to the full log: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/33974/integ-security-multicluster-tests_istio/1415021532173832192/artifacts/security-930b7bd830c741e097d857/TestReachability/_test_context/test-ns1-1-54234-state395228267/primary/a-v1-754f5f947c-kq82l_istio-proxy.previous.log
It is likely because of this change.

@ggreenway
Member Author

@bianpengyuan I'm looking into this. Out of curiosity, why are you using the original TCP conn pool? You must be setting a runtime flag to get that, because it is no longer the default. Did you have issues with the new one?

@rgs1
Member

rgs1 commented Jul 13, 2021

Was about to ask the same thing Greg asked. We've been on the new connection pool for a while now (even before it was on by default upstream), so I'm curious whether there are outstanding issues we may have missed that are causing you to use the old pool...

@bianpengyuan
Contributor

Yeah, we used to have some issues with the new TCP pool, but I think they have been resolved. Let us turn off that runtime flag and see if this goes away.

ggreenway added a commit to ggreenway/envoy that referenced this pull request Jul 13, 2021
…xy#17302)"

This reverts commit 3c266bb.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
ggreenway added a commit to ggreenway/envoy that referenced this pull request Jul 13, 2021
This reverts commit 3876d7c.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Member Author

@bianpengyuan I think we're going to revert the change for now, but it'll go back in after the next release, so any information you can gather about the crash would be helpful so that I can hopefully fix any issues before it is re-submitted.

ggreenway added a commit to ggreenway/envoy that referenced this pull request Jul 19, 2021
Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

Fixes envoyproxy#16682

This reverts commit b7bc539.
This reverts PR envoyproxy#17319, by re-adding envoyproxy#17302 and envoyproxy#16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
ggreenway added a commit that referenced this pull request Jul 26, 2021
Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

This reverts commit b7bc539.
This reverts PR #17319, by re-adding #17302 and #16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Delete connection pools when they have no connections anymore. This fixes unbounded memory use for cases where a new connection pool is needed for each downstream connection, such as when using upstream PROXY protocol.

Fixes envoyproxy#16682

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
If a host is transitioned to unhealthy, and closing its idle
connections results in the pool becoming idle, it would trigger this crash.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

This reverts commit b7bc539.
This reverts PR envoyproxy#17319, by re-adding envoyproxy#17302 and envoyproxy#16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Successfully merging this pull request may close these issues.

Upstream PROXY protocol results in unbound number of connection pools