-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Support storing loadbalancer reverse nat state in conntrack #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -18,8 +18,15 @@ bpf_netdev.o: | |||
bpf_overlay.o: | |||
clang ${CLANG_FLAGS} -c bpf_overlay.c -o $@ | |||
|
|||
LB_OPTIONS = \ | |||
\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to force a compilation run without additional options
@@ -273,6 +273,10 @@ func (d *Daemon) writeBPFHeader(lxcDir string, ep *types.Endpoint, geneveOpts [] | |||
fmt.Fprintf(fw, "#define CT_MAP6 %s\n", path.Base(common.BPFMapCT6+strconv.Itoa(int(ep.ID)))) | |||
fmt.Fprintf(fw, "#define CT_MAP4 %s\n", path.Base(common.BPFMapCT4+strconv.Itoa(int(ep.ID)))) | |||
|
|||
// Always enable L4 and L3 load balancer for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a d.conf.LBMode
flag, at least we could either define them all or none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to link it together in a follow up commit as we add higher level integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks very good. Have a few comments / questions inline. Thanks.
\ | ||
-DLB_L3 \ | ||
-DLB_L4 \ | ||
-DLB_L3 -DLB_L4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like its repeated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a Make list and the foreach iterates over each line below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah ok makes sense now.
bpf_lb.o: | ||
clang ${CLANG_FLAGS} -c bpf_lb.c -o $@ | ||
$(foreach OPTS,$(LB_OPTIONS), \ | ||
clang ${OPTS} ${CLANG_FLAGS} -c bpf_lb.c -o $@;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for LB_L3 / LB_L4 or something else ? what would the default usage be ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is to compile with each possible combination of configuration options. I want to replace the dummy header files with this to make sure we can compile each program with each possible combination of options that can be provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
uint16_t sport; | ||
int i, ret; | ||
struct portmap local_map[] = { | ||
LXC_PORT_MAPPINGS | ||
}; | ||
|
||
/* Ignore unknown L4 protocols */ | ||
if (unlikely(!csum_off)) | ||
if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we don't support icmp for east west, but support it for north south. I guess it makes sense but its probably more of a test tool in both cases. I actually did not realize l4_checksum_offset did not support that either until I took a look at the function. No need to change anything and just an observation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the port mapping code. We used the offset function to determine TCP/UDP or something else. As we now need checksum offsets for ICMPv6 I've made the check for TCP/UDP in port mapping explicit to allow using the same offset function for all of them. ICMP for east west is support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
|
||
/* derive reverse NAT index and zero it. */ | ||
rev_nat_index = ip6->daddr.s6_addr32[3] & 0xFFFF; | ||
if (rev_nat_index) { | ||
int csum_off, csum_flags = 0; | ||
if ((skb->cb[CB_FLAGS] & CB_F_CLUSTER_DST) && rev_nat_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check for east-west traffic within same cluster ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will drop this patch as it was only needed to test LB to endpoints in containers with an address outside of our cluster.
@@ -138,15 +138,14 @@ struct drop_notify { | |||
#define DROP_FRAG_NOSUPPORT -157 | |||
#define DROP_NO_SERVICE -158 | |||
|
|||
#define CB_F_CLUSTER_DST 1 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be add a comment above this /* values for CB_FLAGS */
if (flags & REV_NAT_F_TUPLE_SADDR) { | ||
ipv6_addr_copy(&old_saddr, &tuple->addr); | ||
ipv6_addr_copy(&tuple->addr, &nat->address); | ||
new_saddr = tuple->addr.addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this logic work for csum_replace when we are getting the address from the tuple and not the skb ? Are they guaranteed to be same. also I do not see this being invoked for v6, so is this only for v4 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_saddr points to the address in both tuple and packet case. Both ipv6_store_saddr
and csum_diff()
use new_saddr as pointer for the new address.
This is also handled in both __lb4_rev_nat()
and __lb6_rev_nat()
. Can you check again what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was the logic would work if old_saddr is the same as what would be obtained via ipv6_load_saddr(skb, ETH_HLEN, &old_saddr) so that a later csum_l4_replace will work fine. Because the skb csum was originally computed using whatever address was in the skb ? And the replace is simply offsetting the diff and not recomputing the entire csum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I understand what you mean. The tuple contains the same address as in the packet so the logic in the revnat is identical. It's simply a matter of avoiding a slow skb_load_bytes()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
Provide generic helper to extract L4 checksum field offset and necessary flags to pass into l4_csum_replace(). Also fixes a bug in L4 port mapping where the checksum offset was calculated incorrectly. Fixes: f1c24eb ("L4 port mapping") Acked-by: Andre Martins <andre@cilium.io> Acked-by: Madhu Challa <madhu@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io>
Acked-by: Andre Martins <andre@cilium.io> Acked-by: Madhu Challa <madhu@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io>
Avoids invoking unnnecssary skb_load_bytes(). Acked-by: Andre Martins <andre@cilium.io> Acked-by: Madhu Challa <madhu@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io>
After an outgoing connection has been loadbalanced its reverse NAT index is stored in the connection tracking table to be looked up for reply and related packets. Also makes the loadbalancer configurable via LB_L3 and LB_L4 defines to limit key extraction and logic to either L3, L4 or both. Acked-by: Andre Martins <andre@cilium.io> Acked-by: Madhu Challa <madhu@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io>
No description provided.