-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Proxy: Do not release acknowledged proxy ports when endpoint regeneration fails #36142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proxy: Do not release acknowledged proxy ports when endpoint regeneration fails #36142
Conversation
/test |
/test |
3658d6e
to
2f4bc6d
Compare
/test |
/test |
93dddbb
to
b54f3f8
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved for @cilium/cli
Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgement is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgement status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: #36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit d79ab31 ] Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit d79ab31 ] Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit d79ab31 ] Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: cilium#36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit d79ab31 ] Take a proxy port reference for new redirects as soon as they are created, rather than when an acknowledgment is received. This fixes the case where a delayed release of the proxy port takes effect after a proxy redirect has been created but not acknowledged yet, signified by these error logs seen in the CI: msg="Created new proxy instance" ProxyPort=19659 id="2424:egress:TCP:64:" l7parser=http ... msg="Delayed release of proxy port 19659" id=cilium-http-egress ... msg="Datapath proxy redirection cannot be enabled, L7 proxy may be bypassed" error="ackProxyPort: zero port on cilium-http-egress not allowed" id="2424:egress:TCP:4096:" l7parser=http ... This change makes it necessary to manage the acknowledgment status of a proxy port separately from the 'nRedirects' field. Add a new 'acknowledged' field for this purpose. Fixes: #36142 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
os.Mkdir(bootstrapDir, 0777) | ||
|
||
// Make sure sockets dir exists | ||
os.Mkdir(GetSocketDir(config.runDir), 0777) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrajahalme should this use defaults.RuntimePathRights
instead (0775
, ie avoid write access for every user on the node, given that this code runs in production environments)?
I briefly looked at proposing the change myself, but the commit seems to suggest that some magic is necessary in order to run the corresponding tests. Not sure how I would validate that this doesn't immediately break the test again.
Endpoint regeneration can fail for many reasons, only one of which is Envoy NACKing the listener configuration. Previously it was possible for a proxy port (e.g., DNS proxy port) to be released (and subsequently reallocated) if endpoint regeneration adding a new redirect (such as endpoint restore on agent restart) happened to fail for any reason. Handle proxy port ACK/NACK directly with an AddListener callback and only reset the proxy port if there are no redirects on the port and a NACK was received. Since envoy.AddListener is not used for DNS proxy redirects, it follows that DNS proxy port is now never reset. The AddListener callback allows to keep even newly configured redirects if the regeneration failure reason is not a NACK from Envoy. This helps reduce churn on Envoy listener configuration and the associated datapath proxy port redirection when endpoint regeneration fails for other reasons than Envoy Listener NACK.
To make the AddListener callback work we had to add a separate mutex for the proxy ports, so that a proxy mutex can be held without holding the proxy ports mutex while calling AddListener with a callback that then needs to take the proxy ports mutex. This also happens to solve a latent deadlock bug in CEC parser that also needs to take the mutex for acknowledging a proxy port. In this case we had different locking order for the
Proxy.mutex
andxdsServer.mutex
in two different code paths:upsertListener()
xdsServer.mutex
Proxy.mutex
in the callback viaAckProxyPort()
Proxy.mutex
to create a new redirectAddListener()
which locksxdsServer.mutex
With the new
Proxy.proxyPorts.mutex
the above is modified so thatAckProxyPort()
no longer needs to takeProxy.mutex
(butProxy.proxyPorts.mutex
instead), and on the latter pathAddListener()
is called without holding the newProxy.proxyPorts.mutex
which can then be taken in the callback function as needed.Fixes: #36063