Skip to content

Conversation

gwenya
Copy link
Contributor

@gwenya gwenya commented Apr 25, 2025

See #1978

This is still WIP, the PR at this time is mainly for running the CI. Feedback is very welcome though, especially on the non-trivial TODOs.

@gwenya
Copy link
Contributor Author

gwenya commented Apr 25, 2025

@stgraber we currently don't have any tests specifically for this package, if any of the functions are tested then only incidentally.
Unit tests don't really work since this interacts with the system and there's no point in mocking that.
I'm thinking maybe a test program that can run any of these functions via reflection, and then a test script that calls said program and uses iproute2 tools to check if it does things correctly.
And ideally I'd like to be able to run these tests locally without messing up my network setup, so in a VM maybe. I'm not sure if that fits into our current testing system somehow?
Some of these might not even be possible to test in the CI, for example vrf is a kernel module that may or may not be loaded.

@gwenya gwenya force-pushed the netlink branch 8 times, most recently from a27e9ff to 687b6a6 Compare April 25, 2025 16:04
@stgraber
Copy link
Member

Yeah, it's not the kind of code that's particularly directly testable, it makes more sense to have it be indirectly tested through or normal system tests and the daily test runs we have on Jenkins too.

@gwenya
Copy link
Contributor Author

gwenya commented Apr 25, 2025

Hmh can we generate coverage data from the existing tests to see if they hit all the code paths in this package?

@stgraber
Copy link
Member

Hmm, not particularly easily as we typically run our tests across a LOT of incusd instances, so can't centrally keep track of everything that was hit. And some of the more complex bits (SR-IOV and the like) are tested externally, some manually, some automatically through Jenkins.

@gwenya
Copy link
Contributor Author

gwenya commented Apr 25, 2025

Hmh I see
The tests already caught some bugs that I'm fixing now but without knowing if everything is tested it's hard having confidence in the end result

@stgraber
Copy link
Member

@gwenya yeah, the best we can do is land this one early in the release to maximize the time we have for all our various tests to catch issues.

@gwenya gwenya force-pushed the netlink branch 14 times, most recently from 623d642 to e2d12a6 Compare April 27, 2025 09:36
@stgraber stgraber force-pushed the netlink branch 2 times, most recently from 6a6fb6a to 94115dd Compare June 22, 2025 22:41
@stgraber
Copy link
Member

Remaining issue to track down:

2025-06-22T23:25:22.0304357Z time="2025-06-22T23:25:22Z" level=error msg="Failed to restore route" driver=bridge err="Failed to replace route {Ifindex: 3 Dst: 0.0.0.0/0 Src: <nil> Gw: 100.64.1.1 Flags: [] Table: 254 Realm: 0}: network is unreachable: Nexthop has invalid gateway" network=inc27523x project=default

That hits in the cluster tests. It's non-fatal as it's a non-fatal backup/restore of routes, but the failure is unexpected.

@gwenya
Copy link
Contributor Author

gwenya commented Jun 26, 2025

I added the scope to our route struct so we can add it back with the same scope, I think that should fix it

@stgraber
Copy link
Member

Sounds good. I'm planning on doing another quick review/rebase pass on this one and then merging it shortly after 6.14 is out. That will give us a full dev cycle to catch any extra regressions.

gwenya added 14 commits June 27, 2025 11:44
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
…g netlink ourselves

Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
Signed-off-by: Gwendolyn <me@gwendolyn.dev>
@stgraber stgraber enabled auto-merge June 27, 2025 18:21
@stgraber stgraber merged commit 7f792e6 into lxc:main Jun 27, 2025
38 checks passed
@stgraber
Copy link
Member

@gwenya thanks for the great work!

@stgraber
Copy link
Member

@gwenya FYI, I'm seeing a couple of potentially related CI failures in Jenkins today.
One is a network I/O limit validation which suggests that the limits are no longer being applied for some reason.

Another is a macvlan related network test which is now failing to give us an IPv6 address.

I'll look at these two a bit closer this week. It shouldn't be too difficult to figure out what's going on and fix it and that's exactly why I wanted this merged right after a stable release so we can catch those weird edge cases ;)

@gwenya
Copy link
Contributor Author

gwenya commented Jun 28, 2025

@stgraber At a quick glance, I think I'm not setting the all multicast flag properly, could that be the cause for the IPv6 issue?

@stgraber
Copy link
Member

Quite likely. IPv6 actively relies on multicast for router advertisements.

We also got this is failure that showed up a few hours ago: https://jenkins.linuxcontainers.org/job/incus-test-usb/651/console

@gwenya
Copy link
Contributor Author

gwenya commented Jun 28, 2025

Ah yeah, we should probably ignore that error in Flush.

Should I push fixes for these things in here or in a new PR?

@stgraber
Copy link
Member

In a new PR since we've merged this one. Can still come out of the same branch though.

@gwenya gwenya mentioned this pull request Jun 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants