-
-
Notifications
You must be signed in to change notification settings - Fork 69
policy: Do not store policy reference in Cilium socket option #1010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
07d4662
to
080d03a
Compare
080d03a
to
2962755
Compare
c6111e0
to
63d0b4a
Compare
Include also sni test not having "l7" in it's name, and check the logs for errors as well. Test with embedded cilium-envoy so that Envoy error and warning logs will also be checked for. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Add a runtime assert that the policy destruction happens from the main or test thread. The "test thread" case also covers the destruction from the initial thread, which happens for the static members of 'AllowAllEgressPolicyInstanceImpl' when running `cilium-envoy version', for example. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
63d0b4a
to
1810b93
Compare
1810b93
to
a94e8f2
Compare
a94e8f2
to
2824f9e
Compare
Remove the member 'initial_policy_' from the Cilium SocketOption class, as that reference was possibly kept after the policy had already been deleted. This made it possible for the policy to be destructed from the worker thread, which can lead to Envoy crash. 'initial_policy_' was stored as we already did the policy lookup and the same policy is needed in other Cilium filters. Have the cilium.network and cilium.tls_wrapper filters do their own policy lookups instead, so that we do not need to keep the reference in SocketOption. Worker threads do the policy lookup from their own thread local maps, which hold a reference to the policy. These thread local references are relinquished from post function during policy update, which will release worker thread's last reference to the policy at the time. The main thread is the last one to update or delete policy, so the policy instance destruction happens from the main thread. For this to work it is imperative to only hold policy references in these thread local maps. Note that even keeping a weak reference accessible to the worker threads outside of the policy maps is risky, as then the worker thread could convert that weak reference to a shared pointer while the reference has already been relinquished from the thread's local policy map. In this situation the other threads could also relinquish their references concurrently to the worker thread holding the shared pointer, which would then become the last reference and destruction would happen from the worker thread again. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
2824f9e
to
78b3394
Compare
sayboras
approved these changes
Nov 25, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified the stack trace before and after the changes, nice find on socket option 🎖️.
Two comments as per below, but not a blocker.
This was referenced Nov 26, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Remove the member
initial_policy_
from the CiliumSocketOption
class, as that reference was possibly kept after the policy had already been deleted. This made it possible for the policy to be destructed from the worker thread, which can lead to Envoy crash.initial_policy_
was stored as we already did the policy lookup and the same policy is needed in other Cilium filters. Have thecilium.network
andcilium.tls_wrapper
filters do their own policy lookups instead, so that we do not need to keep the reference inSocketOption
.Worker threads do the policy lookup from their own thread local maps, which hold a reference to the policy. These thread local references are relinquished from post function during policy update, which will release worker thread's last reference to the policy at the time. The main thread is the last one to update or delete policy, so the policy instance destruction happens from the main thread. For this to work it is imperative to only hold policy references in these thread local maps.
Note that even keeping a weak reference accessible to the worker threads outside of the policy maps is risky, as then the worker thread could convert that weak reference to a shared pointer while the reference has already been relinquished from the thread's local policy map. In this situation the other threads could also relinquish their references concurrently to the worker thread holding the shared pointer, which would then become the last reference and destruction would happen from the worker thread again.