-
Notifications
You must be signed in to change notification settings - Fork 3.4k
feat(agent): Add route-based node IP discovery #40095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
365f5b7
to
75084a6
Compare
75084a6
to
90d837c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helm looks good to me! Thanks!
Head branch was pushed to by a user without write access
90d837c
to
79c559c
Compare
/test |
3c8bdd2
to
f71fcec
Compare
/test |
f71fcec
to
5ad0b13
Compare
/test |
@tsotne95 if you're referencing an issue that is more than a year old, especially if the issue is marked as "resolved", I would suggest filing a fresh issue for those failures. For instance you referred to #31040 . For that issue, even if the symptom is the same, we already applied a fix to the tree well over a year ago. Therefore if you're experiencing the problem, there may have been a regression, or it could be that something in this PR triggers the failure in ways that are not affecting the |
Some more context here: Over the past couple of weeks I've been trying to help various contributors experiencing inconsistent test results, and naturally for any individual contributor it feels like the tests are giving poor feedback. If we can find the shortest path to ignore those failures it will unblock the progress on the individual PR. But in several cases we have merged PRs that increase the background failure rate, which then subsequently impact other PRs. If we follow the same pattern, we will constantly face the same problems. I'm aiming to set a higher bar for communication around these issues so that we face them head on and make a conscious decision how to handle them. If the test is unreliable on |
@joestringer |
Given the background of test failure rate on this PR, I'd feel more confident if we ran the tests again. Note that some of the failures last time were caused by #40801 , and a fix was merged around five hours ago. Rebasing should eliminate that failure cause. |
On some cloud platforms like GCE, IPs for external Load Balancers are not assigned directly to a network interface. Instead, they are added to the node's `local` routing table to make the kernel "own" the IP. The existing node address detection logic, which only scanned device addresses, could not see these IPs, making them unavailable for features like NodePort. This commit enhances the node address detection logic to discover these IPs directly from the routing table, making this behavior the default. The node-address controller now inspects the `local` routing table for host-scoped routes. To ensure consistency, an address from a route is only considered if the underlying device is suitable for node addressing. This device filtering logic is consolidated in a new `shouldUseDeviceForNodeAddress` helper, which checks if a device is "up" and "selected", while making specific exceptions for "dummy" devices (to support use cases like `nodelocaldns`) and other whitelisted interfaces. Key implementation details: - The logic avoids creating duplicate addresses if an IP from a route is already known from a device. - The `NodePort` eligibility for these route-based IPs is determined by the existing `--nodeport-addresses` whitelist. - If the whitelist is not set, route-based IPs are not marked for NodePort by default. With this change, Cilium can now correctly use GCE Load Balancer IPs for NodePort services out-of-the-box. test: Add new unit test `TestNodeAddressFromRoute`. The new test validates the route-based discovery logic under several conditions, including when NodePort whitelists match or don't match the discovered IP, and ensures no duplicates are created. Fixes: cilium#24481 Signed-off-by: Tsotne Chakhvadze <tsotne@google.com>
5ad0b13
to
6df1af1
Compare
/test |
On some cloud platforms like GCE, IPs for external Load Balancers are not assigned directly to a network interface.
Instead, they are added to the node's
local
routing table to make the kernel "own" the IP. The existing nodeaddress detection logic, which only scanned device addresses, could not see these IPs, making them unavailable for
features like NodePort.
This commit enhances the node address detection logic to discover these IPs directly from the routing table,
making this behavior the default.
The node-address controller now inspects the
local
routing table for host-scoped routes. To ensure consistency,an address from a route is only considered if the underlying device is suitable for node addressing. This device
filtering logic is consolidated in a new
shouldUseDeviceForNodeAddress
helper, which checks if a device is "up"and "selected", while making specific exceptions for "dummy" devices (to support use cases like
nodelocaldns
)and other whitelisted interfaces.
Key implementation details:
NodePort
eligibility for these route-based IPs is determined by the existing--nodeport-addresses
whitelist.
With this change, Cilium can now correctly use GCE Load Balancer IPs for NodePort services out-of-the-box.
test: Add new unit test
TestNodeAddressFromRoute
.The new test validates the route-based discovery logic under several conditions, including when NodePort whitelists
match or don't match the discovered IP, and ensures no duplicates are created.