Skip to content

Conversation

brb
Copy link
Member

@brb brb commented Apr 21, 2020

This PR introduces the sessionAffinity feature for the kube-proxy replacement, and also fixes the sessionAffinity when running with kube-proxy (the latter validated manually with ClusterIP services).

Reviewable per commit.

Also, validated on 4.19.57 with ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict - had to disable the ipv4 frags feature to make the verifier happy (otherwise, from-overlay is 26 insn over the limit /cc @qmonnet ).

Regarding running w/ kube-proxy, to make sessionAffinity to work, a user has to disable the host-reachable services feature when running on < 5.7 kernel.

Follow-up TODO (will move to a separate issue once this PR is merged):

  • Documentation (kube-proxy free gsg, the BPF map sizes, the upgrade guide)
  • IPv6 CI tests
  • Fix sleep in the CI tests
  • Add tests for when running w/ kube-proxy
  • Add sessionAffinity #11085 (comment)

Fix #9076
Fix #11042

Add support for services sessionAffinity (without and with kube-proxy)

@brb brb added the wip label Apr 21, 2020
@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

1 similar comment
@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@coveralls
Copy link

coveralls commented Apr 21, 2020

Coverage Status

Coverage decreased (-0.01%) to 44.627% when pulling f4c62d1 on pr/brb/session-affinity into a4867b8 on master.

@brb brb force-pushed the pr/brb/session-affinity branch 9 times, most recently from 0214b31 to 5a7d931 Compare April 23, 2020 15:21
@maintainer-s-little-helper
Copy link

Commit 59645d9 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 23, 2020
@brb brb force-pushed the pr/brb/session-affinity branch from 5cb7ac9 to 182ffd6 Compare April 23, 2020 19:41
@maintainer-s-little-helper
Copy link

Commit 59645d9 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@brb
Copy link
Member Author

brb commented Apr 23, 2020

test-me-please

3 similar comments
@brb
Copy link
Member Author

brb commented Apr 24, 2020

test-me-please

@brb
Copy link
Member Author

brb commented Apr 24, 2020

test-me-please

@brb
Copy link
Member Author

brb commented Apr 24, 2020

test-me-please

@brb brb force-pushed the pr/brb/session-affinity branch from 182ffd6 to ab7a053 Compare April 24, 2020 07:54
@maintainer-s-little-helper
Copy link

Commit 59645d9 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@brb brb force-pushed the pr/brb/session-affinity branch from ab7a053 to 4a606cb Compare April 24, 2020 09:06
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 24, 2020
@brb brb force-pushed the pr/brb/session-affinity branch from 4a606cb to c3d059c Compare April 24, 2020 09:14
@brb brb added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/kube-proxy-free labels Apr 24, 2020
Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After addressing various minor misc items, looks good to me.

Comment on lines +1675 to +1693
log.Warnf("sessionAffinity for host reachable services needs kernel 5.7.0 or newer. " +
"Disabling sessionAffinity for cases when a service is accessed from a cluster.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this a fatal if running in strict mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full feature support requires the unreleased kernel. So, if we fatal, then we will crash almost all kubeProxyReplacement=strict deployments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with the other strict modes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this would make kubeProxyReplacement=strict useless. We can make it strict once 5.7 is included by major distros.

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments :-)

// update event which triggers a removal of the deleted dstPod from
// the affinity and the service BPF maps. Therefore, the bellow requests
// are flaky.
// TODO(brb) don't sleep, instead wait for Endpoint obj update (might be complicated though)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use EndpointGet() helper to wait the Cilium Endpoint is deleted?

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too deeply as it looked like you have good coverage from codeowners other than docs, which is trivial (pending future changes to document it properly). Let me know if you want me to look at something in particular in more detail.

Couple of minor thoughts below.

@@ -57,7 +57,8 @@ LB_OPTIONS = \
-DENABLE_IPV4:-DENABLE_IPV6:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT \
-DENABLE_IPV4:-DENABLE_IPV6:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP \
-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP \
-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION
-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION \
-DENABLE_IPV4:-DENABLE_IPV6:-DENABLE_IPSEC:-DLB_L3:-DLB_L4:-DENABLE_HOST_SERVICES_UDP:-DENABLE_HOST_SERVICES_TCP:-DENABLE_NODEPORT:-DENABLE_EXTERNAL_IP:-DENABLE_NODEPORT_ACCELERATION:-DENABLE_SESSION_AFFINITY \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ENABLE_SESSION_AFFINITY be part of the MAX_LB_OPTIONS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess this plays into the v4.19 complexity stuff you were mentioning previously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to prevent test/bpf/verifier-test.sh from failing when ENABLE_SESSION_AFFINITY=1. It fails, because it's running on the 4.9 kernel, and the session affinity BPF maps cannot be created, as the 4.9 does not support LRU maps.

@@ -0,0 +1,180 @@
// Copyright 2020 Authors of Cilium
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if lbmap is worth folding into loadbalancer codeowner or if it should stay in bpf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense to add to loadbalancer. Shall I do it in this PR?

brb added 5 commits April 28, 2020 17:26
Currently, the flag only defines ENABLE_SESSION_AFFINITY, and
checks whether it can be enabled (>= 5.7 constraint).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit introduces the BPF maps used by the sessionAffinity feature:

- lb{4,6}_affinity:     used to store current affinity sessions;
                        key = svc id + client_id (either netns cookie or
                        client ip addr), val = last_used and backend_id.

- lb_affinity_match:    used to check whether a backend still belongs
                        to a svc; key = svc id + backend id; the map
                        is going to be shared between ipv4 and ipv6, as
                        svc and backend id pools are the same for ipv4
                        and ipv6.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
First, this commit extends the lb{4,6}_service struct by allowing to
store the fact that sessionAffinity is enabled and a sessionAffinity
timeout (in seconds). The latter is stored only in the placeholder
svc value (slave=0), and it is stored in the backend_id field (thus
the change to the union type).

Second, it adds the provisioning of the sessionAffinity field from k8s
to the BPF maps.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit adds the sessionAffinity feature implementation to
bpf_sock.c and lb.h.

The basic principle is before selecting a svc backend, check whether
there is an active session for the given client. If so, check whether
backend still belongs to the svc.

In the case of bpf_sock.c, the client is identified by a netns cookie,
while in lb.h by the client IP addr (in the former the IP addr is not
available).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
For new deployments it's going to be enabled, as the default in the helm
template is true, and for old - disabled, as cilium-agent defaults to
false.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/session-affinity branch from ee2f370 to be4440c Compare April 28, 2020 16:17
@brb
Copy link
Member Author

brb commented Apr 28, 2020

test-me-please

@brb brb force-pushed the pr/brb/session-affinity branch from be4440c to f4c62d1 Compare April 28, 2020 16:40
@brb
Copy link
Member Author

brb commented Apr 28, 2020

test-me-please

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb
Copy link
Member Author

brb commented Apr 28, 2020

@aanm @borkmann @christarazi @joestringer @jrajahalme Thanks for the review! I've pushed the changes which address your comments. Please check the unresolved comments.

@brb
Copy link
Member Author

brb commented Apr 28, 2020

test-gke

1 similar comment
@brb
Copy link
Member Author

brb commented Apr 29, 2020

test-gke

@brb brb requested a review from aanm April 29, 2020 07:36
if strict {
log.Fatal(msg)
} else {
log.Warn(fmt.Sprintf("%s. Disabling the feature.", msg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Warnf("%s. Disabling the feature.", msg)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it in a follow-up.

@aanm aanm added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 29, 2020
@qmonnet qmonnet merged commit dbdf127 into master Apr 29, 2020
@qmonnet qmonnet deleted the pr/brb/session-affinity branch April 29, 2020 08:58
@brb brb mentioned this pull request Apr 29, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kops e2e tests failing tests related to session affinity. Support Service sessionAffinity
8 participants