Cleanup idle connection pools #16948
Conversation
* Add pool idle timeout timer and callbacks to ConnectionPool::ConnPoolImplBase
* Add logic to set the idle timer when there are no pending or active connections and disable the timer when new activity occurs
* Add logic to ClusterManagerImpl to remove pools after their idle timeout expires and to remove hosts from the pool maps when the map is empty after an idle timer expiry

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Originally the idle check treated the absence of active streams as the start of the idle timer; however, this proved too complicated. Instead, the timeout now starts after the last connection has been drained from the pool.

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…/clean-up-connection-pools Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Fix original conn pool logic and add test
* Ensure that the idle timer is disabled when draining
* Drain connections after the idle timer fires but before erasing the pool
* Make logic consistent between conn_pool_base and original_conn_pool

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Added note about connection pool lifetime
* Added missing note about pools per downstream connection

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Clarify that the server may skip resending resources upon stream restart only if it has some way to know that the client is not subscribing to new resources that it wasn't previously subscribed to.

Risk Level: Low
Testing: N/A
Docs Changes: Included in PR

Clarifies text added in envoyproxy#12580.

Signed-off-by: Mark D. Roth <roth@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
…x a typo (envoyproxy#13696)

The comment in the v3 version was missing the "If specified.." clause from the v2 version of that comment.

Risk Level: low
Testing: Ran ./ci/run_envoy_docker.sh './ci/do_ci.sh fix_format'
Docs Changes: comment in a proto file changed

Signed-off-by: Sanjay Pujare <sanjaypujare@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
This updates the default container config to:
- use the envoyproxy.io website as the default upstream proxy
- listen for the admin interface on the 0.0.0.0 address

Fixes envoyproxy#13497

Signed-off-by: Ryan Northey <ryan@synca.io>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* test: Check in all TLS test certs
- Will prevent openssl fork-emulation issues on Windows/msys2 that cause test flakiness
- Modifies context_impl_test to no longer require a cert that is generated on the fly to expire in 15 days

Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Added more documentation
* Added additional test

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
* Replaced addIdleTimeoutCallback/addDrainedCallback with addIdleCallback, with a parameter to indicate whether we should start draining
* Removed the pool idle timer, instead relying on the connection idle timeouts to empty the pool

Signed-off-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
/retest

Retrying Azure Pipelines.
LGTM. 🤞
Per offline discussion we can merge to get a bit of bake time before release and @alyssawilk can follow up on Monday with any post merge comments.
I'll try to canary this today so we can provide some signal/feedback.
Thanks!
If a host is transitioned to unhealthy, and closing its idle connections results in the pool becoming idle, it would trigger this crash. Signed-off-by: Greg Greenway <ggreenway@apple.com>
Summary of actions/changes:
- Update bazel/repositories
- Other files already in sync
- Update AddDrainedCallback calls to instead be AddIdleCallback, due to [envoy PR#16948](envoyproxy/envoy#16948). AddIdleCallback serves an equivalent, if not exactly the same, purpose.

Signed-off-by: Nathan Perry <nbperry@google.com>
We are getting a crash at:

Link to the full log: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/33974/integ-security-multicluster-tests_istio/1415021532173832192/artifacts/security-930b7bd830c741e097d857/TestReachability/_test_context/test-ns1-1-54234-state395228267/primary/a-v1-754f5f947c-kq82l_istio-proxy.previous.log

It is likely because of this change.
@bianpengyuan I'm looking into this. Out of curiosity, why are you using the original tcp conn pool? You must be setting a runtime setting to be getting that, because it is no longer the default. Did you have issues with the new one?
Was about to ask the same thing that Greg asked. We've been on the new connection pool for a while now (even before it was on by default in upstream), so I am curious whether there are outstanding issues we may have missed that are causing you to use the old pool...
Yeah, we used to have some issues around the new tcp pool, but I think they have been resolved. Let us turn off that runtime flag and see if this goes away.
This reverts commit 3876d7c. Signed-off-by: Greg Greenway <ggreenway@apple.com>
@bianpengyuan I think we're going to revert the change for now, but it'll go back in after the next release, so any information you can gather about the crash would be helpful so that I can hopefully fix any issues before it is re-submitted.
Delete connection pools when they have no connections anymore. This fixes unbounded memory use for cases where a new connection pool is needed for each downstream connection, such as when using upstream PROXY protocol.

Fixes envoyproxy#16682

This reverts commit b7bc539. This reverts PR envoyproxy#17319, by re-adding envoyproxy#17302 and envoyproxy#16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
Delete connection pools when they have no connections anymore. This fixes unbounded memory use for cases where a new connection pool is needed for each downstream connection, such as when using upstream PROXY protocol.
Fixes #16682
Continues @chradcliffe's work from #13061
Risk Level: High! Connection pools used to live until the cluster was destroyed. If anything is caching a pool that now gets deleted sooner, this could crash, corrupt memory, etc.
Testing: Existing and new tests pass
Docs Changes: Updated
Release Notes: Added
Platform Specific Features: None
Runtime guard:
envoy.reloadable_features.conn_pool_delete_when_idle