Add sessionAffinity #11085
Conversation
Please set the appropriate release note label.
1 similar comment
Force-pushed from 68f2914 to 0207f2d
Force-pushed from 0214b31 to 5a7d931
Commit 59645d9 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
Force-pushed from 5cb7ac9 to 182ffd6
test-me-please
3 similar comments
Force-pushed from 182ffd6 to ab7a053
Force-pushed from ab7a053 to 4a606cb
Force-pushed from 4a606cb to c3d059c
After addressing various minor items, looks good to me.
log.Warnf("sessionAffinity for host reachable services needs kernel 5.7.0 or newer. " +
	"Disabling sessionAffinity for cases when a service is accessed from a cluster.")
Why isn't this a fatal if running in strict mode?
The full feature support requires an unreleased kernel. So, if we fatal, we will crash almost all kubeProxyReplacement=strict deployments.
This is inconsistent with the other strict modes.
But this would make kubeProxyReplacement=strict useless. We can make it strict once 5.7 ships in major distros.
A few minor comments :-)
// update event which triggers a removal of the deleted dstPod from
// the affinity and the service BPF maps. Therefore, the below requests
// are flaky.
// TODO(brb) don't sleep, instead wait for Endpoint obj update (might be complicated though)
Maybe use the EndpointGet() helper to wait until the Cilium Endpoint is deleted?
I didn't look too deeply, as it looked like you have good coverage from codeowners other than docs, which is trivial (pending future changes to document it properly). Let me know if you want me to look at something in particular in more detail.
A couple of minor thoughts below.
@@ -57,7 +57,8 @@ LB_OPTIONS = \
 	-DENABLE_IPV4:-DENABLE_IPV6:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT \
 	-DENABLE_IPV4:-DENABLE_IPV6:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP \
 	-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP \
-	-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION
+	-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION \
+	-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION:-DENABLE_SESSION_AFFINITY \
Should ENABLE_SESSION_AFFINITY be part of MAX_LB_OPTIONS?
Ah, I guess this plays into the v4.19 complexity stuff you were mentioning previously.
This is to prevent test/bpf/verifier-test.sh from failing when ENABLE_SESSION_AFFINITY=1. It fails because it runs on the 4.9 kernel, and the session affinity BPF maps cannot be created, as 4.9 does not support LRU maps.
@@ -0,0 +1,180 @@
// Copyright 2020 Authors of Cilium |
I wonder if lbmap is worth folding into the loadbalancer codeowner or if it should stay in bpf.
Yes, makes sense to add to loadbalancer. Shall I do it in this PR?
Currently, the flag only defines ENABLE_SESSION_AFFINITY, and checks whether it can be enabled (>= 5.7 constraint).
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit introduces the BPF maps used by the sessionAffinity feature:
- lb{4,6}_affinity: used to store current affinity sessions; key = svc id + client_id (either netns cookie or client ip addr), val = last_used and backend_id.
- lb_affinity_match: used to check whether a backend still belongs to a svc; key = svc id + backend id; the map is going to be shared between ipv4 and ipv6, as svc and backend id pools are the same for ipv4 and ipv6.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
First, this commit extends the lb{4,6}_service struct to store the fact that sessionAffinity is enabled and a sessionAffinity timeout (in seconds). The latter is stored only in the placeholder svc value (slave=0), in the backend_id field (thus the change to the union type). Second, it adds the provisioning of the sessionAffinity field from k8s to the BPF maps.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit adds the sessionAffinity feature implementation to bpf_sock.c and lb.h. The basic principle: before selecting a svc backend, check whether there is an active session for the given client; if so, check whether the backend still belongs to the svc. In the case of bpf_sock.c, the client is identified by a netns cookie, while in lb.h it is identified by the client IP addr (in the former, the IP addr is not available).
Signed-off-by: Martynas Pumputis <m@lambda.lt>
For new deployments it's going to be enabled, as the default in the helm template is true; for old deployments it stays disabled, as cilium-agent defaults to false.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Force-pushed from ee2f370 to be4440c
test-me-please
Force-pushed from be4440c to f4c62d1
test-me-please
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@aanm @borkmann @christarazi @joestringer @jrajahalme Thanks for the review! I've pushed the changes which address your comments. Please check the unresolved comments.
test-gke
1 similar comment
if strict {
	log.Fatal(msg)
} else {
	log.Warn(fmt.Sprintf("%s. Disabling the feature.", msg))
}
log.Warnf("%s. Disabling the feature.", msg)
Will do it in a follow-up.
This PR introduces the sessionAffinity feature for the kube-proxy replacement, and also fixes the sessionAffinity when running with kube-proxy (the latter validated manually with ClusterIP services).
Reviewable per commit.
Also, validated on 4.19.57 with ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict - had to disable the ipv4 frags feature to make the verifier happy (otherwise, from-overlay is 26 insn over the limit /cc @qmonnet).
Regarding running w/ kube-proxy: to make sessionAffinity work, a user has to disable the host-reachable services feature when running on a < 5.7 kernel.
Follow-up TODO (will move to a separate issue once this PR is merged):
Fix #9076
Fix #11042