Skip to content

[#2352] Add configuration server.serverSession to allow expiring server conne… #2353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 6, 2019

Conversation

robertpanzer
Copy link
Contributor

…ctions

This PR fixes #2352.

There might be clients opening connections to Linkerd not closing them properly.
Also sometimes network components might silently drop connections to Linkerd.
This results in connections that are never closed and after a longer time this might accumulate to so many open connections until the Linkerd process runs out of file handles.
Right now I could not find a configuration parameter to control expiring of inbound connections like it is supported for client connections.

Therefore this PR proposes to add a configuration parameter serverSession to a server that behaves similar to the clientSession.
With the following configuration server connections will be closed after 2 seconds of idling or after a full life time of 30 seconds:

servers:
- port: 4140
  serverSession:
    idleTimeMs: 2000
    lifeTimeMs: 30000

…ctions

Signed-off-by: Robert Panzer <robert.panzer.pb@googlemail.com>
@zaharidichev
Copy link
Member

@robertpanzer

The reasoning behind this is solid. However, I do wonder whether this could be achieved with setting socket options on the server. You can read a bit more about it right here. Let me know if that helps?

@robertpanzer
Copy link
Contributor Author

The only option that I see that could help in such a scenario would be enabling SO_KEEPALIVE.
But as far as I can tell configuring this properly also requires changing the system settings as explained here: https://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
And this would be where it gets tricky, in large heterogenous networks it's not really feasible to configure this on that level.

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @robertpanzer!

I tested this manually by running a Linkerd locally with the server session options and initiating connections with nc and watching how long they took to idle out or hit their maximum lifetime. Everything worked as expected.


serverIdleTimeMsBaseTest(config){ (router:Router.Initialized, stats:InMemoryStatsReceiver) =>
// Assert
def connectsCount = () => stats.counters(Seq("rt", "http", "server", "127.0.0.1/0", "connects"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be a closure, I think it can simply be a def, similarly to how metrics are checked elsewhere in this file:

def connectsCount = stats.counters(Seq("rt", "http", "server", "127.0.0.1/0", "connects"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adleong!
I updated the PR.
CircleCI seems to have failed on downloading a zookeeper dependency, could you please trigger a rebuild?

Copy link
Member

@zaharidichev zaharidichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!!! This can be merged once you fix the DCO problem that you can read more about here

…Test

Signed-off-by: Robert Panzer <robert.panzer.pb@googlemail.com>
@robertpanzer
Copy link
Contributor Author

Thanks!
Oh yes, just realized I forgot to sign the second commit.
I just updated that commit with the sign off message and force-pushed.

@adleong
Copy link
Member

adleong commented Nov 26, 2019

Sorry that it's taking us so long to get this merged, @robertpanzer! The CI failure here is also happening on master. I am looking into it.

@robertpanzer
Copy link
Contributor Author

Sure, thanks for letting me know!

@adleong
Copy link
Member

adleong commented Dec 6, 2019

The build issue on master has been resolved and this is good to go! Thanks for your patience, @robertpanzer!

@adleong adleong merged commit 7ef7f25 into linkerd:master Dec 6, 2019
@robertpanzer robertpanzer deleted the server-session branch December 7, 2019 08:10
@robertpanzer
Copy link
Contributor Author

\o/ Awesome! Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impossible to time out server connections
3 participants