-
Notifications
You must be signed in to change notification settings - Fork 3.4k
datapath: Devices table and controller #24677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d827e6e
to
a724660
Compare
pkg/datapath/state/device.go
Outdated
|
||
// NodePortAddr returns the NodePort frontend addresses chosen | ||
// from this device. | ||
func (dev *Device) NodePortAddr() NodePortAddr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this function to be NodePort-specific? I guess in the future we will support multiple NodePort IP addrs per netdev, so maybe we could use a generic IPAddrs()
method instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate more - I don't want to leak any LB details (in this case NodePort) into this relatively generic package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this method is about picking the single nodeport frontend for the device I would keep it NodePort-specific. However, you're right it shouldn't be here. I'll move it to a more appropriate place. In the future this would go away completely once service load-balancing is refactored to support multiple frontends per device.
Namespace: dc.NetNS, | ||
ListExisting: false, | ||
} | ||
if err := netlink.AddrSubscribeWithOptions(addrUpdates, dc.ctx.Done(), opts); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in the past we've seen issues where the netlink can be unreliabe, should we try to reestablish the nl connection/sub if the channel gets closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's a bummer. Would require fair bit of additional complexity to resubscribe as it needs to relist from scratch in order to make sure no update has been missed.
Do you have any more info about the unreliability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netlink man page has this to say:
Netlink is not a reliable protocol. It tries its best to deliver
a message to its destination(s), but may drop messages when an
out-of-memory condition or other error occurs. For reliable
transfer the sender can request an acknowledgement from the
receiver by setting the NLM_F_ACK flag. An acknowledgement is an
NLMSG_ERROR packet with the error field set to 0. The
application must generate acknowledgements for received messages
itself. The kernel tries to send an NLMSG_ERROR message for
every failed packet. A user process should follow this
convention too.
However, reliable transmissions from kernel to user are
impossible in any case. The kernel can't send a netlink message
if the socket buffer is full: the message will be dropped and the
kernel and the user-space process will no longer have the same
view of kernel state. It is up to the application to detect when
this happens (via the ENOBUFS error returned by [recvmsg(2)](https://man7.org/linux/man-pages/man2/recvmsg.2.html)) and
resynchronize.
The go netlink
library doesn't do the ACKs so this is indeed is pretty unreliable and may miss updates.
Not sure in what scenarios the netlink socket would get closed, but wouldn't be surprised it can happen
in OOM situations.
I'll need to think a bit how we can make this reliable...
EDIT: Could maybe check sequence numbers of all updates and then restart everything from scratch if there's a missing update or any other error. Though maybe ENOBUFS
would be sent to the ErrorCallback
when that would happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, correct.
A timeout was added by a contributor to the nl library to solve issues where the socket would hang indefinitely (I don't think it was clear what the actual underlying issue was, potentially just a bug in netlink?).
Could we propose/add the feature ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recvfrom hang has been fixed 1 year ago: vishvananda/netlink#793. Unfortunately the netlink library seems to not be maintained well anymore. I'll likely look into actively maintained alternative libraries that would also allow cleanly distinguishing between initial listing and updates.
93bbfe8
to
55d4285
Compare
/cc |
55d4285
to
f65e6e4
Compare
ac666b4
to
0519596
Compare
2320a59
to
f7bc6c5
Compare
f7bc6c5
to
34ce5f2
Compare
5615c3f
to
dcc72ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your changes, it reads easier now.
} | ||
|
||
for _, u := range updates { | ||
if d == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary even though there is an if d == nil
at the end of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point is to stop processing further updates for this link. I could instead add a break label and do break updateLoop
on line 438, but this seems nicer.
EDIT: Ah, but now I'm likely making unfounded assumption here about ifindex not being reused too quickly. Will figure out what I can actually assume and fix the code accordingly. Wonder what fun races this thing has with regards to addresses and routes. Likely the correct way to do this (which is not easy with this netlink library) is to have a single netlink socket over which we get address, link and route updates.
EDIT2: At least locally I'm not seeing any reuse of ifindex. Not quite sure how paranoid this code should be. Thoughts?
|
||
// netlinkFuncs wraps the netlink subscribe functions into a simpler interface to facilitate | ||
// testing of the error handling paths. | ||
type netlinkFuncs struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saying you need to change this, but I'm curious: why a struct of func pointers instead of an interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less boilerplate and makes it easy to override/wrap one of the methods to inject slightly different behavior in tests.
case <-ctx.Done(): | ||
stop() | ||
return | ||
case <-t.After(restartWaitDuration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing there's no need for a backoff here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the backoff in the end as it doesn't really bring much value, plus the logic for resetting the backoff would've been a bit involved (when would we even reset it?).
var devices []string | ||
if d.deviceManager != nil { | ||
if _, err := d.deviceManager.Detect(params.Clientset.IsEnabled()); err != nil { | ||
if option.Config.AreDevicesRequired() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be call if in the future each Cell could individually set a bit saying that the device detection is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get rid of this AreDevicesRequired
stuff completely in the future and have each feature decide how to function or not without any devices.
dcc72ca
to
975a93b
Compare
/test |
975a93b
to
35aa8a2
Compare
/test |
35aa8a2
to
2889f20
Compare
For consistency rename IPIndexer to IPFieldIndex. Add NetIPPrefixFieldIndex to allow indexing netip.Prefix. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add state tables for storing network devices (along with addresses) and routes. Signed-off-by: Jussi Maki <jussi@isovalent.com>
DevicesController populates the devices and routes tables. To avoid large refactoring of the existing code around option.Config.GetDevices() this opts for an incremental approach: keep the DeviceManager API as-is and reuse its test to validate that the existing functionality does not break. The tests required some slight adaptation: the "DeviceManager" is re-created after each test case to recreate the initial devices table. Signed-off-by: Jussi Maki <jussi@isovalent.com>
2889f20
to
57014fe
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks good on my end, thanks for bearing with me 😄
This PR adds the following new tables to pkg/datapath/tables:
The tables are populated by the DevicesController by subscribing to links, addresses and routes using netlink.
To avoid large changes to existing code, this PR keeps the existing DeviceManager API and validates the
new implementation using the old existing test suite.
Follow-up PRs will incrementally switch uses of
option.Config.GetDevices()
to querying and watching the devices table.