Skip to content

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Oct 20, 2023

Add Table[NodeAddress] which derives from the low-level Table[*Device] to pick which IP addresses are considered host addresses and used for NodePort and BPF masquerading.

Refactor NodeAddressing to look up IP addresses from the Table[NodeAddress] and K8s related information from LocalNodeStore.

To add support for multiple NodePort IP addresses per network device, this PR adds the flag --nodeport-addresses that
allows user to specify which CIDRs the NodePort frontends can be picked from. This allows us to solve e.g. #24481.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 20, 2023
@joamaki
Copy link
Contributor Author

joamaki commented Oct 20, 2023

/test

@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from abb3c08 to bf3cbee Compare October 23, 2023 06:13
@joamaki
Copy link
Contributor Author

joamaki commented Oct 24, 2023

The last commit that I just pushed now implements --nodeport-addresses=10.0.0.0/8,2001::/8 option for whitelisting which addresses are consider node's IP addresses. If this option is not set, the default behavior of only using the primary IP address (one with lowest scope) per device is kept.

While the impact of this is pretty clear for LoadBalancerNodeAddresses(), I'm not entirely sure yet what it means for LocalAddresses() which are used for the host identity. Prior to this, LocalAddresses() was the same set as LoadBalancerNodeAddresses(), with the exception of not having a "zero" address and including the "cilium_host" IP address. So in principle this could make sense.

Second thing to think about is whether this filtering should happen in NodeAddressing or whether the addresses should already be filtered out in Table[*Device]... Or whether we should have a separate table for the "node IP addresses" to separate this notion from Table[*Device] which is really meant to just mirror the kernel's link&addr state. The latter might be a good design since the idea anyway is to remove NodeAddressing to instead remove an unnecessary abstraction and encourage direct queries into the state and to be able to wait for the state to change. E.g.

type NodeAddr struct {
  Addr netip.Addr
  Device *Device // The device from which the address is inherited
}
var NodeAddrTable Table[*NodeAddr] = NewTable("node-addresses",  AddrIndex) 

func (f *featureFoo) reconcileNodeAddresses(db *DB, nodeAddrTable Table[*NodeAddr]) {
  for {
    addrs, watch := nodeAddrTable.All(db.ReadTxn())
    // reconcile the new node addresses
    for addr, _, ok := addrs.Next(); ok; addr, _, ok = addrs.Next() {
      // ...
    }
    // wait for node addresses to change again    
    <-watch 
  }
}

@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch 4 times, most recently from 08b4451 to 7e2ad17 Compare October 25, 2023 15:51
@joamaki
Copy link
Contributor Author

joamaki commented Oct 25, 2023

Ended up implementing the Table[NodeAddress]. This is essentially the same information as was stored in ipv*NodePortAddrs.

Redid LocalAddresses by extracting global addresses from all devices as the original code did with netlink.AddrList. It'd probably be a good idea to think whether it'd be possible to merge NodePortAddresses and LocalAddresses to simplify things overall, but hard to judge at this point what the impact would be, so this PR keeps things semantically the same.

I'll squash the changes down to couple commits once it's in a matured state. Right now there's lots of rewriting going on.

@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch 2 times, most recently from c0ce75b to 3c26ce6 Compare October 26, 2023 07:58
@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from 3c26ce6 to 215fef7 Compare October 27, 2023 09:04
@maintainer-s-little-helper
Copy link

Commit 2320fdd does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 27, 2023
@maintainer-s-little-helper
Copy link

Commit 2320fdd does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

1 similar comment
@maintainer-s-little-helper
Copy link

Commit 2320fdd does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from 0f26561 to 17fd876 Compare October 30, 2023 15:43
@maintainer-s-little-helper
Copy link

Commit dcf18a8 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@joamaki joamaki changed the title [DRAFT]: NodeAddressing via devices table [DRAFT]: NodeAddress table and refactored NodeAddressing Oct 30, 2023
@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from 17fd876 to 7722715 Compare October 30, 2023 16:04
@maintainer-s-little-helper
Copy link

Commit dcf18a8 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Add a helper to construct only RWTable[T] and TableMeta to allow
creating a table in a way that enforces that the table is populated
by the producer before it can be read:

  var Cell = cell.Module("example", "Example module",
    // Privately provides RWTable[*Foo]:
    statedb.NewPrivateRWTableCell[*Foo]("foos", FooIDIndex),

    cell.Provide(New),
  )

  func New(lc hive.Lifecycle, t statedb.RWTable[*Foo]) (FooController, Table[*Foo]) {
    fooCtrl := ...
    lc.Append(fooCtrl) // fooCtrl now starts before anything that reads Table[*Foo]
    return fooCtl, t
  }

Signed-off-by: Jussi Maki <jussi@isovalent.com>
To enforce that the device and route tables are populated before being read
from, provide the Table[*Device] and Table[*Route[ from the DevicesController
constructor.  This makes sure that anything that depends on e.g. 'Table[*Device]'
will be constructed after DevicesController and thus the DevicesController start
hook will execute before it.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
As Linux does not send comprehensive set of route delete messages on device
deletion, flush the routes on the link delete message.

Since the link and route messages are coming over separate sockets
they may come out of order, so keep track of what link indexes have been deleted
and ignore route updates for deleted links. Linux does reuse the index after the
32 bits roll around, so remove the "dead link index" on link creation.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Since subscribing to links/addresses/routes and then listing them is racy
we might end up seeing the same address in the initial listing and as an
update. Avoid adding the address twice by checking its existence first.

Side note: we need the initial listing to be able to populate the tables
before readers access them in order to keep the existing non-reconciling
semantics. The netlink library we're using does not have a mechanism to
inform that the initial listing is done in the subscriptions (even though
netlink DONE message is sent for this), so instead DevicesController does
subscribe + list which necessitates dealing with issues like this.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
In order to write the NodeAddressing's LocalAddresses() in terms of
Table[*Device], support the --local-max-addr-scope option to filter
out addresses by scope. By default only addresses with scope lower
than LINK are used.

Also filter out loopback addresses as was done by the earlier code.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
As a step towards removing the global variables in pkg/node
that back GetNodePortIPv4Addrs and friends, implement
NodeAddressing as queries against the devices table.

The big functional change this brings is that now all IP
addresses of selected external network devices are used
for NodePort frontends.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The code is no longer specific to Linux as it is implemented
in terms of queries against the LocalNodeStore and the devices
table.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Now that NodeAddressing is implemented on top of Table[*Device], the GetNodePort*Addrs*()
functions are no longer needed.

Adapt HeaderfileWriter to use NodeAddressing and add in DirectRouting() accessor to it
to look up the direct routing device ifindex and IP address.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Support "*cidr.CIDR" and "[]*cidr.CIDR" as a config flag by parsing
it from a string slice.

For example:

  type MyConfig struct {
    MyCidrs []*cidr.CIDR
  }

  func (MyConfig) Flags(flags *pflag.FlagSet) {
    flags.StringSlice("my-cidrs", nil, "My CIDRs")
  }

Signed-off-by: Jussi Maki <jussi@isovalent.com>
If the user specifies e.g. "--nodeport-addresses=10.0.0.0/8" then only
addresses within that CIDR will be considered node's IP addresses. When
set Cilium will serve NodePort on more than one IP address per network device
for external traffic.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>

Refresh of datapath-add-table-nodeaddress
Signed-off-by: Jussi Maki <jussi@isovalent.com>
To keep the original semantics, get addresses from all devices
instead of just the selected devices.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
To enforce that the device and route tables are populated before being read
from, provide the Table[] from the DevicesController constructor.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Provide and populate the devices table. Still TBD is to remove
the fake node addressing implementation and instead use the real
one.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from 7722715 to 8b508dc Compare November 1, 2023 07:16
@maintainer-s-little-helper
Copy link

Commit 5082df2 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Signed-off-by: Jussi Maki <jussi@isovalent.com>
As the rate limiting was not aborted when DB was being stopped this
caused unnecessary waits in tests. Add a context that's canceled when
DB is stopped and use that in the rate limiter.

Note that we're on purpose not using jobs here to allow statedb to be used
for example for the module health reporting and using jobs would cause dependency
cycles.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The anonymous embedding of netip.Addr will cause it to use its
JSON marshalling implementation and thus misses the other fields.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This implements support for remotely querying the contents of StateDB tables
over the REST API. It is implemented in terms of raw queries, e.g. the table,
index names and the key []byte is transferred to the server which then performs
the query against the given table and index. This allows reuse of the Index[Obj, Key]
types and thus provides the same experience as server side:

    table := statedb.NewRemoteTable[*tables.Device](client, "devices")

    // Query devices by revision, least recently changed objects first.
    iter, err := table.LowerBound(ctx, statedb.ByRevision[*tables.Device](0))
    for device, revision, ok := iter.Next(); ok; device, revision, ok = iter.Next { ... }

    // Find devices by name: all devices that start with "e":
    iter, err = table.LowerBound(ctx, tables.DeviceNameIndex("e"))

    // Exact matches:
    iter, err = table.Get(ctx, tables.DeviceNameIndex("eth0"))
    if device, revision, ok := iter.Next(); ok { ... }

On top of this API a generic utility is implemented in cilium-dbg/cmd/statedb.go
to allow adding the ability to pretty-print the contents of any table whose object
implements 'statedb.TabWritable':

	// "cilium-dbg statedb devices"
	statedbTableCommand[*tables.Device]("devices")

Example output:

  root@kind-worker:/home/cilium# cilium statedb devices
  Name              Index   Selected   Type     MTU     HWAddr              Flags                    Addresses
  lxc4446c55b3291   8       false      veth     1500    46:b9:9b:85:c2:e2   up|broadcast|multicast   fe80::44b9:9bff:fe85:c2e2
  lo                1       false      device   65536                       up|loopback              127.0.0.1, ::1
  lxc_health        71      false      veth     1500    06:d8:63:ff:20:3b   up|broadcast|multicast   fe80::4d8:63ff:feff:203b
  cilium_host       3       false      veth     1500    66:e0:0e:19:a6:9f   up|broadcast|multicast   10.244.1.186

Since go-swagger does not provide an easy way to implement proper streaming APIs
this does not add support for delete events (DeleteTracker). Implementing this
is more involved and would require registering a DeleteTracker and keeping it alive
with some sort of keepalive mechanism.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Now that fully qualified module ids are used we don't need to repeat
the parent module id.

Before:
  agent.datapath.datapath-tables.node-address   OK          2m48s

After:
  agent.datapath.tables.node-address   OK          2m48s

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/node-addressing-via-devices branch from 8b508dc to fc7f207 Compare November 1, 2023 14:24
@maintainer-s-little-helper
Copy link

Commit 72db405 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@joamaki
Copy link
Contributor Author

joamaki commented Nov 7, 2023

Closing as this is being pulled into multiple individual PRs.

@joamaki joamaki closed this Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. dont-merge/needs-sign-off The author needs to add signoff to their commits before merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant