[DRAFT]: NodeAddress table and refactored NodeAddressing #28712
/test |
abb3c08
to
bf3cbee
Compare
The last commit that I just pushed now implements While the impact of this is pretty clear for Second thing to think about is whether this filtering should happen in
|
08b4451
to
7e2ad17
Compare
Ended up implementing the Redid I'll squash the changes down to couple commits once it's in a matured state. Right now there's lots of rewriting going on. |
c0ce75b
to
3c26ce6
Compare
3c26ce6
to
215fef7
Compare
Commit 2320fdd does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
0f26561
to
17fd876
Compare
Commit dcf18a8 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
17fd876
to
7722715
Compare
Add a helper to construct only RWTable[T] and TableMeta to allow creating a table in a way that enforces that the table is populated by the producer before it can be read:

    var Cell = cell.Module(
        "example",
        "Example module",

        // Privately provides RWTable[*Foo]:
        statedb.NewPrivateRWTableCell[*Foo]("foos", FooIDIndex),
        cell.Provide(New),
    )

    func New(lc hive.Lifecycle, t statedb.RWTable[*Foo]) (FooController, Table[*Foo]) {
        fooCtrl := ...
        lc.Append(fooCtrl)
        // fooCtrl now starts before anything that reads Table[*Foo]
        return fooCtrl, t
    }

Signed-off-by: Jussi Maki <jussi@isovalent.com>
To enforce that the device and route tables are populated before being read from, provide the Table[*Device] and Table[*Route] from the DevicesController constructor. This makes sure that anything that depends on e.g. 'Table[*Device]' will be constructed after DevicesController and thus the DevicesController start hook will execute before it. Signed-off-by: Jussi Maki <jussi@isovalent.com>
As Linux does not send a comprehensive set of route delete messages on device deletion, flush the routes on the link delete message. Since the link and route messages arrive over separate sockets they may come out of order, so keep track of which link indexes have been deleted and ignore route updates for deleted links. Linux does reuse the index after the 32 bits roll around, so remove the "dead link index" on link creation. Signed-off-by: Jussi Maki <jussi@isovalent.com>
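The bookkeeping described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the DevicesController code: the `route` and `tracker` types and their method names are hypothetical and only model the "dead link index" idea.

```go
package main

import "fmt"

// route is a hypothetical, simplified route record keyed by link index.
type route struct {
	LinkIndex int
	Dst       string
}

// tracker is an illustrative stand-in for the controller state.
type tracker struct {
	routes    map[int][]route // routes grouped by link index
	deadLinks map[int]bool    // indexes of links that have been deleted
}

func newTracker() *tracker {
	return &tracker{routes: map[int][]route{}, deadLinks: map[int]bool{}}
}

// linkCreated clears the dead mark, since Linux may reuse a link index.
func (t *tracker) linkCreated(index int) {
	delete(t.deadLinks, index)
}

// linkDeleted flushes the link's routes and remembers the index as dead.
func (t *tracker) linkDeleted(index int) {
	delete(t.routes, index)
	t.deadLinks[index] = true
}

// routeUpdated stores the route unless its link is known to be deleted.
// It reports whether the update was applied.
func (t *tracker) routeUpdated(r route) bool {
	if t.deadLinks[r.LinkIndex] {
		return false
	}
	t.routes[r.LinkIndex] = append(t.routes[r.LinkIndex], r)
	return true
}

func main() {
	tr := newTracker()
	tr.linkCreated(5)
	fmt.Println(tr.routeUpdated(route{LinkIndex: 5, Dst: "10.0.0.0/24"})) // applied
	tr.linkDeleted(5)
	fmt.Println(tr.routeUpdated(route{LinkIndex: 5, Dst: "10.0.1.0/24"})) // late update, ignored
	tr.linkCreated(5)                                                      // index reused by a new link
	fmt.Println(tr.routeUpdated(route{LinkIndex: 5, Dst: "10.0.2.0/24"})) // applied again
}
```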
Since subscribing to links/addresses/routes and then listing them is racy, we might end up seeing the same address both in the initial listing and as an update. Avoid adding the address twice by checking its existence first. Side note: we need the initial listing to be able to populate the tables before readers access them in order to keep the existing non-reconciling semantics. The netlink library we're using does not have a mechanism to signal that the initial listing is done in the subscriptions (even though a netlink DONE message is sent for this), so instead DevicesController subscribes and then lists, which necessitates dealing with issues like this. Signed-off-by: Jussi Maki <jussi@isovalent.com>
In order to write the NodeAddressing's LocalAddresses() in terms of Table[*Device], support the --local-max-addr-scope option to filter out addresses by scope. By default only addresses with scope lower than LINK are used. Also filter out loopback addresses as was done by the earlier code. Signed-off-by: Jussi Maki <jussi@isovalent.com>
As a step towards removing the global variables in pkg/node that back GetNodePortIPv4Addrs and friends, implement NodeAddressing as queries against the devices table. The big functional change this brings is that now all IP addresses of selected external network devices are used for NodePort frontends. Signed-off-by: Jussi Maki <jussi@isovalent.com>
The code is no longer specific to Linux as it is implemented in terms of queries against the LocalNodeStore and the devices table. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Now that NodeAddressing is implemented on top of Table[*Device], the GetNodePort*Addrs*() functions are no longer needed. Adapt HeaderfileWriter to use NodeAddressing and add a DirectRouting() accessor to it to look up the direct routing device ifindex and IP address. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Support "*cidr.CIDR" and "[]*cidr.CIDR" as a config flag by parsing it from a string slice. For example:

    type MyConfig struct {
        MyCidrs []*cidr.CIDR
    }

    func (MyConfig) Flags(flags *pflag.FlagSet) {
        flags.StringSlice("my-cidrs", nil, "My CIDRs")
    }

Signed-off-by: Jussi Maki <jussi@isovalent.com>
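The conversion from a string slice to CIDRs can be sketched with the standard library alone. This sketch substitutes `netip.Prefix` for Cilium's `cidr.CIDR` type, and `parseCIDRs` is a hypothetical helper name, so it only shows the general shape of the parsing step.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseCIDRs converts flag values such as "10.0.0.0/8" into prefixes.
// It stops at the first value that is not a valid CIDR.
func parseCIDRs(values []string) ([]netip.Prefix, error) {
	out := make([]netip.Prefix, 0, len(values))
	for _, v := range values {
		p, err := netip.ParsePrefix(strings.TrimSpace(v))
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", v, err)
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	cidrs, err := parseCIDRs([]string{"10.0.0.0/8", "fd00::/64"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range cidrs {
		fmt.Println(c)
	}
}
```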
If the user specifies e.g. "--nodeport-addresses=10.0.0.0/8", then only addresses within that CIDR will be considered the node's IP addresses. When set, Cilium will serve NodePort on more than one IP address per network device for external traffic. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com> Refresh of datapath-add-table-nodeaddress
Signed-off-by: Jussi Maki <jussi@isovalent.com>
To keep the original semantics, get addresses from all devices instead of just the selected devices. Signed-off-by: Jussi Maki <jussi@isovalent.com>
To enforce that the device and route tables are populated before being read from, provide the Table[] from the DevicesController constructor. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Provide and populate the devices table. Still TBD is to remove the fake node addressing implementation and instead use the real one. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
7722715
to
8b508dc
Compare
Commit 5082df2 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Refresh of test-cleanup
As the rate limiting was not aborted when DB was being stopped this caused unnecessary waits in tests. Add a context that's canceled when DB is stopped and use that in the rate limiter. Note that we're on purpose not using jobs here to allow statedb to be used for example for the module health reporting and using jobs would cause dependency cycles. Signed-off-by: Jussi Maki <jussi@isovalent.com>
The anonymous embedding of netip.Addr will cause it to use its JSON marshalling implementation and thus misses the other fields. Signed-off-by: Jussi Maki <jussi@isovalent.com>
This implements support for remotely querying the contents of StateDB tables over the REST API. It is implemented in terms of raw queries, i.e. the table name, index name and the key []byte are transferred to the server, which then performs the query against the given table and index. This allows reuse of the Index[Obj, Key] types and thus provides the same experience as on the server side:

    table := statedb.NewRemoteTable[*tables.Device](client, "devices")

    // Query devices by revision, least recently changed objects first.
    iter, err := table.LowerBound(ctx, statedb.ByRevision[*tables.Device](0))
    for device, revision, ok := iter.Next(); ok; device, revision, ok = iter.Next() {
        ...
    }

    // Find devices by name: all devices that start with "e":
    iter, err = table.LowerBound(ctx, tables.DeviceNameIndex("e"))

    // Exact matches:
    iter, err = table.Get(ctx, tables.DeviceNameIndex("eth0"))
    if device, revision, ok := iter.Next(); ok {
        ...
    }

On top of this API a generic utility is implemented in cilium-dbg/cmd/statedb.go to allow pretty-printing the contents of any table whose object implements 'statedb.TabWritable':

    // "cilium-dbg statedb devices"
    statedbTableCommand[*tables.Device]("devices")

Example output:

    root@kind-worker:/home/cilium# cilium statedb devices
    Name             Index  Selected  Type    MTU    HWAddr             Flags                   Addresses
    lxc4446c55b3291  8      false     veth    1500   46:b9:9b:85:c2:e2  up|broadcast|multicast  fe80::44b9:9bff:fe85:c2e2
    lo               1      false     device  65536                     up|loopback             127.0.0.1, ::1
    lxc_health       71     false     veth    1500   06:d8:63:ff:20:3b  up|broadcast|multicast  fe80::4d8:63ff:feff:203b
    cilium_host      3      false     veth    1500   66:e0:0e:19:a6:9f  up|broadcast|multicast  10.244.1.186

Since go-swagger does not provide an easy way to implement proper streaming APIs, this does not add support for delete events (DeleteTracker). Implementing this is more involved and would require registering a DeleteTracker and keeping it alive with some sort of keepalive mechanism.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Now that fully qualified module ids are used we don't need to repeat the parent module id.

Before:

    agent.datapath.datapath-tables.node-address    OK    2m48s

After:

    agent.datapath.tables.node-address    OK    2m48s

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
8b508dc
to
fc7f207
Compare
Commit 72db405 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
Closing as this is being pulled into multiple individual PRs. |
Add Table[NodeAddress], which derives from the low-level Table[*Device], to pick which IP addresses are considered host addresses and used for NodePort and BPF masquerading.

Refactor NodeAddressing to look up IP addresses from the Table[NodeAddress] and K8s related information from LocalNodeStore.

To add support for multiple NodePort IP addresses per network device, this PR adds the flag --nodeport-addresses that allows the user to specify which CIDRs the NodePort frontends can be picked from. This allows us to solve e.g. #24481.