Skip to content

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Mar 31, 2023

This PR adds the following new tables to pkg/datapath/tables:

// Device is a local network device along with addresses associated with it.
type Device struct {
	Index        int              // positive integer that starts at one, zero is never used
	MTU          int              // maximum transmission unit
	Name         string           // e.g., "en0", "lo0", "eth0.100"
	HardwareAddr net.HardwareAddr // IEEE MAC-48, EUI-48 and EUI-64 form
	Flags        net.Flags        // e.g. net.FlagUp, net.eFlagLoopback, net.FlagMulticast

	Selected    bool            // If true this device can be used by Cilium
	Addrs       []DeviceAddress // Addresses assigned to the device
	RawFlags    uint32          // Raw interface flags
	Type        string          // Device type, e.g. "veth" etc.
	MasterIndex int             // Index of the master device (e.g. bridge or bonding device)
}

var deviceTableSchema = &memdb.TableSchema{
	Name: "devices",
	Indexes: map[string]*memdb.IndexSchema{
		string(statedb.IDIndex): {
			Name:         string(statedb.IDIndex),
			AllowMissing: false,
			Unique:       true,
			Indexer:      &memdb.IntFieldIndex{Field: "Index"},
		},
		string(DeviceNameIndex): {
			Name: string(DeviceNameIndex),
			// Name can be temporarily missing if we create the device
			// from an address update.
			AllowMissing: true,
			Unique:       true,
			Indexer:      &memdb.StringFieldIndex{Field: "Name"},
		},
	},
}

type Route struct {
	Table     int
	LinkIndex int

	Scope uint8
	Dst   netip.Prefix
	Src   netip.Addr
	Gw    netip.Addr
}

var (
	routeTableSchema = &memdb.TableSchema{
		Name: "routes",
		Indexes: map[string]*memdb.IndexSchema{
			"id": {
				Name:         "id",
				AllowMissing: false,
				Unique:       true,
				Indexer: &memdb.CompoundIndex{
					Indexes: []memdb.Indexer{
						&memdb.IntFieldIndex{Field: "Table"},
						&memdb.IntFieldIndex{Field: "LinkIndex"},
						&statedb.NetIPPrefixFieldIndex{Field: "Dst"},
					},
				},
			},
			string(linkIndexIndex): {
				Name:         string(linkIndexIndex),
				AllowMissing: false,
				Unique:       false,
				Indexer:      &memdb.IntFieldIndex{Field: "LinkIndex"},
			}},
	}
)

The tables are populated by the DevicesController by subscribing to links, addresses and routes using netlink.
To avoid large changes to existing code, this PR keeps the existing DeviceManager API and validates the
new implementation using the old existing test suite.

Follow-up PRs will incrementally switch uses of option.Config.GetDevices() to querying and watching the devices table.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 31, 2023
@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch 5 times, most recently from d827e6e to a724660 Compare April 17, 2023 12:22

// NodePortAddr returns the NodePort frontend addresses chosen
// from this device.
func (dev *Device) NodePortAddr() NodePortAddr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this function to be NodePort-specific? I guess in the future we will support multiple NodePort IP addrs per netdev, so maybe we could use a generic IPAddrs() method instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate more - I don't want to leak any LB details (in this case NodePort) into this relatively generic package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is about picking the single nodeport frontend for the device I would keep it NodePort-specific. However, you're right it shouldn't be here. I'll move it to a more appropriate place. In the future this would go away completely once service load-balancing is refactored to support multiple frontends per device.

Namespace: dc.NetNS,
ListExisting: false,
}
if err := netlink.AddrSubscribeWithOptions(addrUpdates, dc.ctx.Done(), opts); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in the past we've seen issues where the netlink can be unreliabe, should we try to reestablish the nl connection/sub if the channel gets closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a bummer. Would require fair bit of additional complexity to resubscribe as it needs to relist from scratch in order to make sure no update has been missed.

Do you have any more info about the unreliability?

Copy link
Contributor Author

@joamaki joamaki Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

netlink man page has this to say:

       Netlink is not a reliable protocol.  It tries its best to deliver
       a message to its destination(s), but may drop messages when an
       out-of-memory condition or other error occurs.  For reliable
       transfer the sender can request an acknowledgement from the
       receiver by setting the NLM_F_ACK flag.  An acknowledgement is an
       NLMSG_ERROR packet with the error field set to 0.  The
       application must generate acknowledgements for received messages
       itself.  The kernel tries to send an NLMSG_ERROR message for
       every failed packet.  A user process should follow this
       convention too.

       However, reliable transmissions from kernel to user are
       impossible in any case.  The kernel can't send a netlink message
       if the socket buffer is full: the message will be dropped and the
       kernel and the user-space process will no longer have the same
       view of kernel state.  It is up to the application to detect when
       this happens (via the ENOBUFS error returned by [recvmsg(2)](https://man7.org/linux/man-pages/man2/recvmsg.2.html)) and
       resynchronize.

The go netlink library doesn't do the ACKs so this is indeed is pretty unreliable and may miss updates.
Not sure in what scenarios the netlink socket would get closed, but wouldn't be surprised it can happen
in OOM situations.

I'll need to think a bit how we can make this reliable...

EDIT: Could maybe check sequence numbers of all updates and then restart everything from scratch if there's a missing update or any other error. Though maybe ENOBUFS would be sent to the ErrorCallback when that would happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, correct.

A timeout was added by a contributor to the nl library to solve issues where the socket would hang indefinitely (I don't think it was clear what the actual underlying issue was, potentially just a bug in netlink?).

Could we propose/add the feature ourselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recvfrom hang has been fixed 1 year ago: vishvananda/netlink#793. Unfortunately the netlink library seems to not be maintained well anymore. I'll likely look into actively maintained alternative libraries that would also allow cleanly distinguishing between initial listing and updates.

@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch 2 times, most recently from 93bbfe8 to 55d4285 Compare June 8, 2023 10:33
@aojea
Copy link
Contributor

aojea commented Jun 14, 2023

/cc

@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from 55d4285 to f65e6e4 Compare June 14, 2023 09:49
@dylandreimerink dylandreimerink force-pushed the pr/joamaki/statedb-devices branch 2 times, most recently from ac666b4 to 0519596 Compare June 27, 2023 09:58
@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch 3 times, most recently from 2320a59 to f7bc6c5 Compare July 3, 2023 13:52
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Jul 3, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 3, 2023
@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from f7bc6c5 to 34ce5f2 Compare July 3, 2023 14:16
@joamaki joamaki marked this pull request as ready for review July 4, 2023 08:16
@joamaki joamaki requested a review from a team as a code owner July 4, 2023 08:16
@joamaki joamaki requested a review from lmb July 7, 2023 13:56
@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch 2 times, most recently from 5615c3f to dcc72ca Compare July 10, 2023 12:31
Copy link
Contributor

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes, it reads easier now.

}

for _, u := range updates {
if d == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary even though there is an if d == nil at the end of the loop?

Copy link
Contributor Author

@joamaki joamaki Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point is to stop processing further updates for this link. I could instead add a break label and do break updateLoop on line 438, but this seems nicer.

EDIT: Ah, but now I'm likely making unfounded assumption here about ifindex not being reused too quickly. Will figure out what I can actually assume and fix the code accordingly. Wonder what fun races this thing has with regards to addresses and routes. Likely the correct way to do this (which is not easy with this netlink library) is to have a single netlink socket over which we get address, link and route updates.

EDIT2: At least locally I'm not seeing any reuse of ifindex. Not quite sure how paranoid this code should be. Thoughts?


// netlinkFuncs wraps the netlink subscribe functions into a simpler interface to facilitate
// testing of the error handling paths.
type netlinkFuncs struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying you need to change this, but I'm curious: why a struct of func pointers instead of an interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less boilerplate and makes it easy to override/wrap one of the methods to inject slightly different behavior in tests.

case <-ctx.Done():
stop()
return
case <-t.After(restartWaitDuration):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing there's no need for a backoff here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the backoff in the end as it doesn't really bring much value, plus the logic for resetting the backoff would've been a bit involved (when would we even reset it?).

@brb brb removed their request for review July 21, 2023 08:04
var devices []string
if d.deviceManager != nil {
if _, err := d.deviceManager.Detect(params.Clientset.IsEnabled()); err != nil {
if option.Config.AreDevicesRequired() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be call if in the future each Cell could individually set a bit saying that the device detection is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would get rid of this AreDevicesRequired stuff completely in the future and have each feature decide how to function or not without any devices.

@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from dcc72ca to 975a93b Compare July 24, 2023 07:51
@joamaki
Copy link
Contributor Author

joamaki commented Jul 24, 2023

/test

@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from 975a93b to 35aa8a2 Compare July 26, 2023 08:57
@joamaki
Copy link
Contributor Author

joamaki commented Jul 26, 2023

/test

@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from 35aa8a2 to 2889f20 Compare July 27, 2023 08:04
@christarazi christarazi added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/feature This introduces new functionality. area/daemon Impacts operation of the Cilium daemon. area/modularization Relates to code modularization and maintenance. labels Jul 28, 2023
joamaki added 3 commits July 31, 2023 11:31
For consistency rename IPIndexer to IPFieldIndex.
Add NetIPPrefixFieldIndex to allow indexing netip.Prefix.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add state tables for storing network devices (along with addresses)
and routes.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
DevicesController populates the devices and routes tables.

To avoid large refactoring of the existing code around option.Config.GetDevices()
this opts for an incremental approach: keep the DeviceManager API as-is and reuse
its test to validate that the existing functionality does not break.

The tests required some slight adaptation: the "DeviceManager" is re-created after
each test case to recreate the initial devices table.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/statedb-devices branch from 2889f20 to 57014fe Compare July 31, 2023 08:32
@joamaki
Copy link
Contributor Author

joamaki commented Aug 1, 2023

/test

@joamaki joamaki requested a review from tommyp1ckles August 1, 2023 08:12
Copy link
Contributor

@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok looks good on my end, thanks for bearing with me 😄

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 2, 2023
@joamaki joamaki merged commit 03ad61b into cilium:main Aug 2, 2023
@sayboras sayboras mentioned this pull request Oct 6, 2023
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/modularization Relates to code modularization and maintenance. kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants