Skip to content

Conversation

christarazi
Copy link
Member

Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
<endpoint ID>_next directory (will be reffered to as "next"), where
the BPF compilation takes place.

The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (<endpoint ID>_next). Hence, the compilation
errors.

This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.

root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s
ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr smac, dmac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
                union macaddr router_mac = NODE_MAC;
                                           ^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(src);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(src);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.

Fixes #10630
Fixes #12005

Fix manual endpoint regeneration via command line

@christarazi christarazi requested a review from a team as a code owner July 14, 2020 20:59
@christarazi christarazi added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/endpoint labels Jul 14, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 14, 2020
@christarazi christarazi added area/loader Impacts the loading of BPF programs into the kernel. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. and removed area/endpoint labels Jul 14, 2020
@christarazi
Copy link
Member Author

christarazi commented Jul 14, 2020

I ran into this bug while debugging #12482 and looked into it. I'm not 100% sure on the implementation of this, but it does seem reasonable to me. I'd like to get feedback from those who have more context on the regeneration of endpoints.

@joestringer The code touches recently changed parts by you. It'd be great to have your feedback.

@christarazi
Copy link
Member Author

test-me-please

@coveralls
Copy link

coveralls commented Jul 14, 2020

Coverage Status

Coverage decreased (-0.05%) to 36.954% when pulling d138143 on christarazi:pr/christarazi/fix-endpoint-manual-regenerate into b0610a4 on cilium:master.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, I think we can have this be a little more concise but the change LGTM.

@@ -870,6 +870,11 @@ func (e *Endpoint) runPreCompilationSteps(regenContext *regenerationContext) (he
if err = e.writeHeaderfile(nextDir); err != nil {
return false, fmt.Errorf("unable to write header file: %s", err)
}
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably tweak this to avoid duplicating the writeHeaderfile() call by instead having the headerfileChanged check above only set the regenerationLevel then have this check do the following instead:

Suggested change
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild {
if datapathRegenCtxt.regenerationLevel >= regeneration.RegenerateWithDatapathRewrite {

Then I'm not sure whether the comment below provides any additional context; basically if either a rewrite or rebuild is required, we'll be sure to write the headerfile again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@joestringer
Copy link
Member

May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-)

@christarazi
Copy link
Member Author

christarazi commented Jul 14, 2020

May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-)

I was thinking that as well. It's not a crucial bug, but the backports should be trivial as the change is small and code paths are consistent in all relevant trees. Marking for 1.{6,7,8}.

Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
`<endpoint ID>_next` directory (will be reffered to as "next"), where
the BPF compilation takes place.

The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (`<endpoint ID>_next`). Hence, the compilation
errors.

This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.

```
root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s
ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr smac, dmac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
                union macaddr router_mac = NODE_MAC;
                                           ^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(src);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(src);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.
```

Fixes cilium#10630
Fixes cilium#12005

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/fix-endpoint-manual-regenerate branch from d5d54e3 to d138143 Compare July 15, 2020 00:25
@pchaigno
Copy link
Member

@christarazi Did you check if this fix also works for the host endpoint? I can have a look as a follow up if not.

@pchaigno
Copy link
Member

test-me-please

Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm as far as I can see

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loader Impacts the loading of BPF programs into the kernel. kind/bug This is a bug in the Cilium logic. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cilium endpoint regenerate broken
9 participants