Skip to content

Conversation

tgraf
Copy link
Member

@tgraf tgraf commented Sep 19, 2016

No description provided.

@tgraf tgraf added kind/enhancement This would improve or streamline existing functionality. pending-review labels Sep 19, 2016
@@ -18,8 +18,15 @@ bpf_netdev.o:
bpf_overlay.o:
clang ${CLANG_FLAGS} -c bpf_overlay.c -o $@

LB_OPTIONS = \
\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to force a compilation run without additional options

@@ -273,6 +273,10 @@ func (d *Daemon) writeBPFHeader(lxcDir string, ep *types.Endpoint, geneveOpts []
fmt.Fprintf(fw, "#define CT_MAP6 %s\n", path.Base(common.BPFMapCT6+strconv.Itoa(int(ep.ID))))
fmt.Fprintf(fw, "#define CT_MAP4 %s\n", path.Base(common.BPFMapCT4+strconv.Itoa(int(ep.ID))))

// Always enable L4 and L3 load balancer for now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a d.conf.LBMode flag, at least we could either define them all or none.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to link it together in a follow up commit as we add higher level integration.

Copy link
Contributor

@mchalla mchalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks very good. Have a few comments / questions inline. Thanks.

\
-DLB_L3 \
-DLB_L4 \
-DLB_L3 -DLB_L4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like its repeated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a Make list and the foreach iterates over each line below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah ok makes sense now.

bpf_lb.o:
clang ${CLANG_FLAGS} -c bpf_lb.c -o $@
$(foreach OPTS,$(LB_OPTIONS), \
clang ${OPTS} ${CLANG_FLAGS} -c bpf_lb.c -o $@;)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for LB_L3 / LB_L4 or something else ? what would the default usage be ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to compile with each possible combination of configuration options. I want to replace the dummy header files with this to make sure we can compile each program with each possible combination of options that can be provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

uint16_t sport;
int i, ret;
struct portmap local_map[] = {
LXC_PORT_MAPPINGS
};

/* Ignore unknown L4 protocols */
if (unlikely(!csum_off))
if (nexthdr != IPPROTO_TCP && nexthdr != IPPROTO_UDP)
Copy link
Contributor

@mchalla mchalla Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we don't support icmp for east west, but support it for north south. I guess it makes sense but its probably more of a test tool in both cases. I actually did not realize l4_checksum_offset did not support that either until I took a look at the function. No need to change anything and just an observation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the port mapping code. We used the offset function to determine TCP/UDP or something else. As we now need checksum offsets for ICMPv6 I've made the check for TCP/UDP in port mapping explicit to allow using the same offset function for all of them. ICMP for east west is support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


/* derive reverse NAT index and zero it. */
rev_nat_index = ip6->daddr.s6_addr32[3] & 0xFFFF;
if (rev_nat_index) {
int csum_off, csum_flags = 0;
if ((skb->cb[CB_FLAGS] & CB_F_CLUSTER_DST) && rev_nat_index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check for east-west traffic within same cluster ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will drop this patch as it was only needed to test LB to endpoints in containers with an address outside of our cluster.

@@ -138,15 +138,14 @@ struct drop_notify {
#define DROP_FRAG_NOSUPPORT -157
#define DROP_NO_SERVICE -158

#define CB_F_CLUSTER_DST 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be add a comment above this /* values for CB_FLAGS */

if (flags & REV_NAT_F_TUPLE_SADDR) {
ipv6_addr_copy(&old_saddr, &tuple->addr);
ipv6_addr_copy(&tuple->addr, &nat->address);
new_saddr = tuple->addr.addr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this logic work for csum_replace when we are getting the address from the tuple and not the skb ? Are they guaranteed to be same. also I do not see this being invoked for v6, so is this only for v4 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_saddr points to the address in both tuple and packet case. Both ipv6_store_saddr and csum_diff() use new_saddr as pointer for the new address.

This is also handled in both __lb4_rev_nat() and __lb6_rev_nat(). Can you check again what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was the logic would work if old_saddr is the same as what would be obtained via ipv6_load_saddr(skb, ETH_HLEN, &old_saddr) so that a later csum_l4_replace will work fine. Because the skb csum was originally computed using whatever address was in the skb ? And the replace is simply offsetting the diff and not recomputing the entire csum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I understand what you mean. The tuple contains the same address as in the packet so the logic in the revnat is identical. It's simply a matter of avoiding a slow skb_load_bytes()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Provide generic helper to extract L4 checksum field offset and necessary
flags to pass into l4_csum_replace().

Also fixes a bug in L4 port mapping where the checksum offset was
calculated incorrectly.

Fixes: f1c24eb ("L4 port mapping")
Acked-by: Andre Martins <andre@cilium.io>
Acked-by: Madhu Challa <madhu@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Acked-by: Andre Martins <andre@cilium.io>
Acked-by: Madhu Challa <madhu@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Avoids invoking unnnecssary skb_load_bytes().

Acked-by: Andre Martins <andre@cilium.io>
Acked-by: Madhu Challa <madhu@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
After an outgoing connection has been loadbalanced its reverse
NAT index is stored in the connection tracking table to be
looked up for reply and related packets.

Also makes the loadbalancer configurable via LB_L3 and LB_L4
defines to limit key extraction and logic to either L3, L4 or
both.

Acked-by: Andre Martins <andre@cilium.io>
Acked-by: Madhu Challa <madhu@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This would improve or streamline existing functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants