Conversation

rectified95 (Owner)

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #issue-number

chancez and others added 8 commits November 8, 2024 07:56
Additionally, explain what's running, because that makes it easier to
read the GH Action logs and determine what's happening.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
…nfig

In the handler of the svcFrontendNamedPorts case, the "ports" variable is
allocated and assigned values, but never used. Thus, delete the related code.

Fixes: e7bb8a7 ("k8s/cilium Event handlers and processing logic for LRPs")

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Mengxin Liu <mengxin@alauda.io>
The make target is more discoverable and easier to use, and we can
switch to the 'slim' image while we're at it, which will speed up
linting, especially on slow Internet connections.

Suggested-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
This code seems to access a variable without holding a lock, when that
variable can be modified at runtime by other logic. Add locking to avoid
weird and wonderful race conditions.

Found by code inspection.
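
As a minimal sketch of the pattern (names here are hypothetical, not the
actual Cilium code):

    package example

    import "sync"

    // manager owns state that other logic may modify at runtime.
    type manager struct {
        mu     sync.RWMutex
        config map[string]string
    }

    // lookup previously read m.config without holding m.mu, racing with
    // concurrent writers; taking the read lock makes the access safe.
    func (m *manager) lookup(key string) (string, bool) {
        m.mu.RLock()
        defer m.mu.RUnlock()
        v, ok := m.config[key]
        return v, ok
    }

    // update is one of the runtime writers that mutate the shared state.
    func (m *manager) update(key, value string) {
        m.mu.Lock()
        defer m.mu.Unlock()
        if m.config == nil {
            m.config = make(map[string]string)
        }
        m.config[key] = value
    }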

Signed-off-by: Joe Stringer <joe@cilium.io>
When SetBackends received an empty slice [bes] of BackendParams,
it failed to refresh the frontends of the service specified by [name].
This omission is now fixed, and a test for SetBackends is added which,
in particular, covers this issue.
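
A reduced sketch of the fixed control flow (SetBackends and BackendParams
are from the commit; everything else is simplified):

    package example

    // BackendParams stands in for the real backend description type.
    type BackendParams struct{ Address string }

    type writer struct {
        backends map[string][]BackendParams
    }

    // SetBackends replaces the backends of the service identified by name.
    // Before the fix, an empty bes slice meant the frontends of the service
    // were never refreshed; the refresh must run unconditionally so that
    // frontends also drop backends that disappeared.
    func (w *writer) SetBackends(name string, bes []BackendParams) {
        w.backends[name] = bes
        w.refreshFrontends(name) // run even when len(bes) == 0
    }

    func (w *writer) refreshFrontends(name string) {
        // Recompute the frontend -> backend mapping for this service.
        _ = w.backends[name]
    }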

Fixes: 485331f
Signed-off-by: Damian Sawicki <dsawicki@google.com>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
- Split the workflow into separate jobs
- Use the chart version script from the default branch
- Remove the use of branch names in chart tags

Signed-off-by: Feroz Salam <feroz.salam@isovalent.com>
Co-authored-by: André Martins <aanm@users.noreply.github.com>
@rectified95 rectified95 force-pushed the post_rebase branch 2 times, most recently from a09ddb5 to 734d5a8 on November 11, 2024 23:29
viktor-kurchenko and others added 20 commits November 12, 2024 08:15
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
When running in kvstore mode, Cilium currently still starts the CiliumNode
and CiliumEndpoint informers, stopping them once connected to etcd. Informers
are then potentially restarted if the client disconnects from etcd again. This
handover logic, historically introduced to support running etcd in pod
network, is known to be complex and fragile (as it can potentially lead
to stale entries), and introduces extra load on the Kubernetes API server
during Cilium agent startup.

Considering that this handover logic is not needed when etcd runs outside of
the cluster (or in host network), and that support for running etcd in pod
network was officially deprecated in v1.16 [1], let's disable it altogether.
Still, let's keep it behind a hidden flag for the moment, to allow easily
enabling it again in case of problems. If nothing unexpected comes up, we
can then remove it for good in the future.

The same flag is used in the context of the pod watcher as well, to avoid
switching to watching all the pods in the cluster (rather than only the
ones hosted by the current node) when the CiliumEndpoint CRD is disabled.
Indeed, this was again needed to support running etcd in pod network, but
completely defeats the performance benefits associated with disabling the
CiliumEndpoint CRD.

[1]: f99f10b ("docs: Deprecate support for podnetwork etcd")

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When Cilium is configured in kvstore mode, and support for running the
kvstore in pod network is disabled, we don't start the CiliumEndpoints
informer at all. Hence, let's disable this GC logic as well, given that
it would otherwise need to start the informer to populate the store
content. Indeed, no one is expected to be watching these objects, and we
can accept the possibility of leaking a few of them in very specific and
rare circumstances [1], until the corresponding pod gets deleted. The
respective kvstore entries, which are not taken into account here, will
instead eventually be deleted when the corresponding lease expires.

Even better, we could disable the CiliumEndpoint CRD altogether in this
configuration, given that CiliumEndpoints are not expected to be used at
all. However, that comes with extra risks, especially during the initial
migration phase, when old agents may still watch them in case of
restarts, and the feature is not really battle tested. Hence, let's go
with this simpler, and safer, approach for the moment.

[1]: cilium#20350

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Modify a few matrix entries of the Cilium E2E Upgrade workflow to
configure Cilium in kvstore mode, to cover this functionality here
as well, in addition to the clustermesh workflows. In detail, the
etcd instance is executed as a pod running in host network, which
is set up via the `kind-kvstore-start` makefile target. The matrix
entries are selected to cover the most common combinations, that is
native routing/tunneling, KPR off/on and WireGuard off/on, while
avoiding incompatible options (mainly Egress Gateway).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Modify a few matrix entries of the Conformance IPSec and IPSec Upgrade
workflows to configure Cilium in kvstore mode, to cover this functionality
here as well, in addition to the E2E and clustermesh workflows. In detail,
the etcd instance is executed as a pod running in host network, which
is set up via the `kind-kvstore-start` makefile target. The matrix entries
are selected to cover both native routing and tunneling, while avoiding
incompatible options (mainly Egress Gateway and Mutual Auth).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This trigger may be, well, triggered when reacting to a DNS request --
even when the agent is starting up. This could lead to a deadlock, as
the datapath is not able to write the endpoint header file until the
agent is started, but the agent cannot finish starting as the endpoint
is locked.

The fix for this is to remove the unnecessary trigger initialization on
endpoint parsing; we will always start it on first regeneration.

This catches a case missed in cilium#34059, which only fixed the new-endpoint
case.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Explicitly mark policymap entries as not valid when the policy map
entry is invalid, so that map sync will update the entry instead of just
ignoring the key; otherwise, the key would remain in the map even when
it needed to be removed.
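
Schematically, the sync behavior after this fix (the types and function
here are illustrative, not the actual implementation):

    package example

    type policyEntry struct {
        value uint64
        valid bool // explicitly tracked instead of silently skipping
    }

    // syncPolicyMap reconciles desired entries into the datapath map.
    // Invalid entries are acted upon (removed) rather than ignored, since
    // ignoring the key would leave it in the map when it had to go away.
    func syncPolicyMap(desired map[uint32]policyEntry, datapath map[uint32]uint64) {
        for k, e := range desired {
            if !e.valid {
                delete(datapath, k)
                continue
            }
            datapath[k] = e.value
        }
    }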

Fixes: cilium#35534

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Move proxy port restoration to the cell just before the trigger for
writing the proxy ports file is created. This removes the possibility
that the trigger would run before the ports are restored.

Add a configurable time limit ('--restored-proxy-ports-age-limit',
default 15 minutes) for the age of the restored proxy ports file, to
remove the chance that stale port numbers would be used, e.g., on upgrade
after a downgrade to Cilium 1.15 (which does not store the proxy ports to
a file).
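
The age check could look roughly like this (the flag name comes from the
commit; the helper is hypothetical):

    package example

    import (
        "os"
        "time"
    )

    // restoredProxyPortsAgeLimit mirrors --restored-proxy-ports-age-limit
    // (default 15 minutes).
    const restoredProxyPortsAgeLimit = 15 * time.Minute

    // shouldRestoreProxyPorts returns true only if the checkpoint file
    // exists and is younger than the age limit, so stale port numbers
    // (e.g. a file left over from before a downgrade to 1.15) are ignored.
    func shouldRestoreProxyPorts(path string) bool {
        fi, err := os.Stat(path)
        if err != nil {
            return false // nothing to restore
        }
        return time.Since(fi.ModTime()) < restoredProxyPortsAgeLimit
    }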

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
This commit wraps the proxy ports checkpoint function in a controller. This
has two effects:

  1. If the checkpoint fails due to I/O errors, the checkpoint is
     retried until it succeeds. Previously, we relied on it being
     triggered again for it to be re-run (though so far we have not
     observed any I/O failures in CI, so this likely was not an issue).
  2. By using a controller `StopFunc` and calling `RemoveControllerAndWait`,
     we ensure the checkpoint function is run one last time during
     shutdown. This is important, since `Trigger.Shutdown` by itself does
     not run any pending triggers. Note that any errors during shutdown
     are only logged and the controller is not retried.

The proxy ports trigger is closed when the proxy cell is
stopped, which happens e.g. when SIGTERM is received.
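
The two properties above can be sketched generically like so (this mimics
the DoFunc/StopFunc split of a Cilium controller, but is not the
pkg/controller API):

    package example

    import (
        "context"
        "log"
        "time"
    )

    // runCheckpointController retries the checkpoint until it succeeds and
    // runs it one final time on shutdown.
    func runCheckpointController(ctx context.Context, trigger <-chan struct{}, checkpoint func() error) {
        for {
            select {
            case <-trigger:
                // DoFunc equivalent: retry on I/O errors until success.
                for checkpoint() != nil {
                    select {
                    case <-time.After(time.Second):
                    case <-ctx.Done():
                        return
                    }
                }
            case <-ctx.Done():
                // StopFunc equivalent: one last run; errors are only logged.
                if err := checkpoint(); err != nil {
                    log.Printf("final checkpoint failed: %v", err)
                }
                return
            }
        }
    }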

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
For this to work, we also need the boot ID to look more realistic (that
is, at least to have the correct size).

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
The cluster-pool and Kubernetes IPAM modes are already tested in
IPv4-only mode in linuxPrivilegedBaseTestSuite.TestNodeChurnXFRMLeaks(),
so there's no need to test them again in the IPv4-only
TestNodeChurnXFRMLeaks() test. This commit removes that double testing.

Fixes: a0f4a32 ("datapath: Fix TestNodeChurnXFRMLeaks")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This test was only run in IPv4-only mode. This commit extends it to also
cover IPv6-only and dual stack modes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
The XFRM leak test for subnet mode doesn't work in dual stack mode. In
that mode, it fails with:

    unable to replace remote state: file exists

because we try to add XFRM OUT states for both IPv4 and IPv6 with the
same mark (expected) and destination IP (unexpected). The destination IP
addresses are indeed 0/0 in both cases. That is not expected for XFRM
states.

The issue is related to the use-cilium-internal-ip-for-ipsec agent flag.
Depending on this flag, we use either the CiliumInternalIPs or the
NodeInternalIPs as the IPsec tunnel endpoints. The flag therefore also
determines the IP addresses we use for the XFRM states (they're the same
thing). Since this flag is set to its default value (false) in tests, we
use the NodeInternalIPs. Those are however not defined in tests, so we
end up with the 0/0 address instead, and with the conflict between the
IPv4 and IPv6 states.

This commit fixes it by defining those NodeInternalIPs.

Fixes: 9207b78 ("datapath: Cover subnet encryption in XFRM leak test")
Fixes: a0f4a32 ("datapath: Fix TestNodeChurnXFRMLeaks")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
When logging is set up in such a way that logrus attempts to serialise
entries to JSON, it could fail with errors such as:

    Failed to obtain reader, failed to marshal fields to JSON, json: unsupported type: map[types.Identity]*types.Node

Replace the broken Info log with logging just the number of deleted
nodes, and prepare a serializable slice of strings with more detail for
the debug log.
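
A sketch of the resulting logging shape (types simplified):

    package example

    import "github.com/sirupsen/logrus"

    type nodeID struct{ name string }
    type node struct{ addr string }

    func logDeletedNodes(log *logrus.Logger, deleted map[nodeID]*node) {
        // Logging the map itself broke the JSON formatter: a map keyed by
        // a struct cannot be marshalled to JSON. Log only the count at
        // Info, and a serializable []string at Debug.
        log.WithField("numDeletedNodes", len(deleted)).Info("Deleted stale nodes")

        if log.IsLevelEnabled(logrus.DebugLevel) {
            names := make([]string, 0, len(deleted))
            for id := range deleted {
                names = append(names, id.name)
            }
            log.WithField("deletedNodes", names).Debug("Deleted stale nodes")
        }
    }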

Fixes: b855b25 (node/manager: synthesize node deletion events)

Reported-by: Simon Dickhoven <sdickhoven@everquote.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Don't emit a spurious warning when remote cluster configuration
retrieval gets aborted because the parent context was canceled (e.g.,
due to reconnecting to etcd): the event is already logged elsewhere,
and the warning is potentially misleading to users.

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a warning log always gets emitted when the list operation
to start a watcher fails, even though it will automatically be retried.
Given that we are starting to flag warning logs in CI, and this can
briefly occur in the clustermesh context during the initialization phase
(especially when the authorization mode is configured to cluster, until
permissions are granted), let's lower the severity for the first few
occurrences, and emit a warning only if the situation does not improve.
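
The demotion can be sketched as follows (the threshold and names are
illustrative):

    package example

    import "github.com/sirupsen/logrus"

    // listRetryWarningThreshold is how many consecutive failures are logged
    // at Debug before escalating to Warning (value chosen for illustration).
    const listRetryWarningThreshold = 3

    func logListFailure(log *logrus.Logger, attempt int, err error) {
        // The list is retried automatically, so the first few failures are
        // expected (e.g. until clustermesh permissions are granted) and do
        // not warrant a warning.
        level := logrus.DebugLevel
        if attempt > listRetryWarningThreshold {
            level = logrus.WarnLevel
        }
        log.WithError(err).WithField("attempt", attempt).
            Log(level, "List to start watcher failed, will retry")
    }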

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Explicitly configure the clustermesh-sync-timeout parameter to a higher
value to prevent emitting spurious warning logs during the initial
connection phase when the authorization mode is configured to cluster.
Indeed, Cilium agents get restarted at that point (due to the
configuration of host aliases), while the clustermesh-apiserver does
not (assuming that KVStoreMesh is disabled), potentially requiring up
to one minute to react to the configmap change and create the associated
user in etcd. Still, the warning in this case would have been benign,
because no cross-cluster connections could be present, as we are meshing
the two clusters for the first time right now.

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Relates: cilium#35370
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Add metrics to track the maximum cardinality of selectors actively used
to select peers in the cluster. This can be useful as a hint that policy
rules are overly permissive, leading to high policy map usage.

Example with some sample policies + usage:

  # cilium-dbg policy selectors
  SELECTOR              LABELS                              USERS   IDENTITIES
  &LabelSelector{...}   default/from-world-to-role-public   2       63484
  &LabelSelector{...}   default/fqdn                        1       39546
  &LabelSelector{...}   default/rule1                       1       4992
                                                                    39546
                                                                    63484
  &LabelSelector{...}   default/fqdn                        3       45238
  &LabelSelector{...}   default/fqdn                        1
  &LabelSelector{...}   default/from-world-to-role-public   1       9
                                                                    16777217
  &LabelSelector{...}   default/from-world-to-role-public   1       10
  &LabelSelector{...}   default/from-world-to-role-public   1       2
  api.github.com ...    default/fqdn                        1       16777217
  # cilium-dbg metrics list | grep selector
  cilium_fqdn_selectors                                     1.000000
  cilium_policy_selector_match_count_max   class="cluster"  0.000000
  cilium_policy_selector_match_count_max   class="fqdn"     1.000000
  cilium_policy_selector_match_count_max   class="other"    3.000000
  cilium_policy_selector_match_count_max   class="world"    2.000000

The cilium_fqdn_selectors metric counts the total number of FQDN
selectors altogether, while cilium_policy_selector_match_count_max
reports the maximum number of identities selected by any single selector
in the specified class.

For now, to avoid high cardinality, there are four classes:
- World: Equivalent to an entity selector on world
- Cluster: Equivalent to an entity selector on the cluster
- FQDN: Any selector that is based on an FQDNs statement
- Other: All other selectors.

"All" is notably omitted, because the Cilium dataplane already optimizes
such statements to minimize cardinality when encoding the identities into
policy maps.
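
With client_golang, maintaining such a gauge per class could look like
this (the metric name matches the output above; the update logic is
simplified):

    package example

    import "github.com/prometheus/client_golang/prometheus"

    var selectorMatchCountMax = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "cilium_policy_selector_match_count_max",
            Help: "Maximum number of identities selected by a single selector, per class",
        },
        []string{"class"},
    )

    func init() {
        prometheus.MustRegister(selectorMatchCountMax)
    }

    // updateSelectorCardinality recomputes the per-class maxima from a
    // snapshot mapping class -> identities-selected count per selector.
    func updateSelectorCardinality(counts map[string][]int) {
        for class, perSelector := range counts {
            max := 0
            for _, n := range perSelector {
                if n > max {
                    max = n
                }
            }
            selectorMatchCountMax.WithLabelValues(class).Set(float64(max))
        }
    }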

Signed-off-by: Joe Stringer <joe@cilium.io>
No functional changes in this commit, just moving code.

Signed-off-by: Joe Stringer <joe@cilium.io>
aanm and others added 25 commits November 20, 2024 16:51
Track which Internet Protocol modes are enabled in Cilium.

Signed-off-by: André Martins <andre@cilium.io>
Track which identity allocation mode is enabled in Cilium.

Signed-off-by: André Martins <andre@cilium.io>
Track whether CiliumEndpointSlices are enabled in Cilium.

Signed-off-by: André Martins <andre@cilium.io>
Track which device mode is enabled in Cilium.

Signed-off-by: André Martins <andre@cilium.io>
This commit adds the ability to collect, for all nodes running in the
cluster, the features enabled on the agents, and to generate reports
from those metrics.

Signed-off-by: André Martins <andre@cilium.io>
Enable feature tracking on almost all CI workflows and generate a
markdown table from the resulting report.

Signed-off-by: André Martins <andre@cilium.io>
Before this change, marshalling the policy struct caused the line
`... "enableDefaultDeny":{}, ...` to be written to the JSON-marshalled
network policy. This later caused problems when using cilium-cli with
cluster versions that do not support the `enableDefaultDeny` field.

This can be replaced with an `omitzero` modifier on the JSON struct tag
once go1.24 is released. Until then, we can work around this with a
simple pointer instead.
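
In struct-tag terms, the workaround looks like this (field shapes shown
schematically):

    package example

    // With a value type, `omitempty` never omits an empty struct, so
    // `"enableDefaultDeny":{}` was always written out. With a pointer,
    // a nil field is omitted, and older cluster versions never see the
    // unknown key. `omitzero` (go1.24) will make the pointer unnecessary.
    type DefaultDenyConfig struct {
        Ingress *bool `json:"ingress,omitempty"`
        Egress  *bool `json:"egress,omitempty"`
    }

    type Rule struct {
        // Before: EnableDefaultDeny DefaultDenyConfig `json:"enableDefaultDeny,omitempty"`
        EnableDefaultDeny *DefaultDenyConfig `json:"enableDefaultDeny,omitempty"`
    }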

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Prior to this change, using `hubble --version` and `hubble version`
would produce different results:

    hubble --version # v1.16.3
    hubble version   # 1.16.3

Ref: cilium/hubble#1613

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Alvaro Muñoz <pwntester@github.com>
The attribute tls.secretSync.secretsNamespace.enabled is not used, and
seems to be covered by tls.secretSync.enabled instead.

Fixes: 9d6cdfc

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Status reporting (status.conditions in the ClusterConfig and PeerConfig,
or status in the NodeConfig) is useful for troubleshooting. Thus, it is
recommended to enable it. However, if there's an issue like high API
server load, we may want to disable it.

This commit introduces a Helm value to control it. The default is
enabled. We'll introduce the actual agent/operator options in the
subsequent commits.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Add a new agent/operator option enable-bgp-control-plane-status-report
that enables/disables status reporting (status.conditions on the
CiliumBGPClusterConfig and CiliumBGPPeerConfig, status on the
CiliumBGPNodeConfig). It is enabled by default, but users can disable it
to reduce the API server load.

So far, we only provide a knob to enable/disable all status reporting.
In the future, we may want to provide more fine-grained knobs, such as
enabling/disabling status reporting per resource. However, as a starting
point, we provide it in the simplest way.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
When users disable status reporting on a running cluster, the
previously reported status won't be updated anymore. That means the
stale status report persists. To avoid this, clean up the status fields
when status reporting is disabled.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
This commit fixes the issue where stale backends are associated with
the LRP service after the agent restarts.

Cilium restores the service and backend cache from the BPF map
and synchronizes it with the Kubernetes API server, assuming that
UpsertService is called for each active service. During the sync period,
Cilium keeps a list of restored backends that haven't been observed
for each service to prevent temporary connectivity loss during agent
restarts. (See commit 920976a.)

After synchronization, an update is triggered for each service still
associated with stale backends, allowing them to be removed. However,
LRP services are not updated and remain associated with stale backends
because the ServiceCache cannot update LRP services. Instead, the LRP
manager is responsible for updating them.

This issue arises if a CLRP is created during an agent restart.
For example, consider a scenario where the following nodelocaldns CLRP
is applied during agent startup:

1) Cilium restores the kube-dns ClusterIP service and its backends (coredns)
   from the BPF map and synchronizes them with Kubernetes.
2) If the LRP manager calls UpsertService first, it retains coredns, adds
   node-local-dns as a backend, and updates the kube-dns service to
   an LRP-type service.
3) After synchronization, updates are triggered for all services. However,
   the LRP service is not updated, leaving stale backends associated with it.

To address this issue, this commit ensures that the LRP manager calls
EnsureService to remove stale backends.

apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "nodelocaldns"
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
**Description**
As part of the network policy modularization and the decoupling of the network policy enforcement system from the endpoint, the no-op allocator provides the possibility of disabling the identity allocation system when network policies are disabled.

**Motivation**
The scalability of the network policy feature is limited when its scalability dimensions, the number of nodes and the pod churn rate, are stretched. Cilium has many networking features that scale better, or that are unaffected by these dimensions. Currently, disabling network policies is the only feasible way to reach higher scale for those other features.

Note: The plan is to improve the scalability of network policies so that they can reach the higher scales that are required.

**Follow-up**
Subsequently, the plan is to agree on a configuration set that would determine whether the identity allocation system should be disabled, and to put the no-op allocator to use (a schematic sketch follows the proposed config below).

Proposed config:
- EnablePolicy=never
- DisableCiliumEndpointCRD=true
- EnableK8sNetworkPolicy=false
- EnableCiliumNetworkPolicy=false
- EnableCiliumClusterwideNetworkPolicy=false
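
A schematic of the no-op pattern (the interface shown is a simplified
stand-in, not Cilium's actual allocator interface):

    package example

    import "context"

    // IdentityAllocator is a simplified stand-in for the identity
    // allocation interface.
    type IdentityAllocator interface {
        AllocateIdentity(ctx context.Context, labels []string) (uint32, error)
        Release(ctx context.Context, id uint32) error
    }

    // noopAllocator satisfies the interface but never touches the kvstore
    // or CRDs, so no identity allocation work happens when network
    // policies are disabled.
    type noopAllocator struct{}

    func (noopAllocator) AllocateIdentity(context.Context, []string) (uint32, error) {
        return 0, nil // no identity is ever allocated
    }

    func (noopAllocator) Release(context.Context, uint32) error {
        return nil
    }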

Signed-off-by: Dorde Lapcevic <dordel@google.com>
We check for error logs in both the CLI's connectivity tests and the
ginkgo tests. Warning logs are however only checked by ginkgo tests
since commit 551f435 ("test: Fail on level=warning logs").

This commit extends the CLI connectivity tests to also check for warning
logs, with the same exception list as in ginkgo tests.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
The "Unable to translate all CIDR groups to CIDRs" warning was removed
in main, but it's still happening when the upgrade test runs v1.16. We
can therefore remove that exception once the upgrade test covers
v1.17<>v1.18.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
When deleting old XFRM policies (e.g., when disabling IPsec), we try to
skip policies not installed by Cilium, just in case someone else has the
good idea of using XFRM. We do so in the function isXfrmPolicyCilium with
various rules.

Unfortunately, our XFRM IN policies changed in f108c0c ("ipsec:
Simplify XFRM IN policies"), and isXfrmPolicyCilium doesn't work anymore.
We therefore need to adapt it to recognize the new XFRM IN policy.

This bug resulted in Cilium never cleaning up the XFRM IN policy, which
would cause one of our IPsec integration tests to always pass.

Fixes: f108c0c ("ipsec: Simplify XFRM IN policies")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
To allow filling in data from other tables during reflection, pass
the current transaction to the transform functions.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add ServicePort to Frontend to allow proxy redirection to match against
NodePort frontends based on the service port (ClusterIP port).

Add ports to the ProxyRedirect in Service to allow applying the redirect
to only frontends matching the port.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Implement CiliumEnvoyConfig handling and backend syncing for the
experimental LB control-plane.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Sets up new ConfigMap "cilium-dynamic-metrics-config" and new Helm options under ".Values.hubble.metrics.dynamic".

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Adds new Hubble flag "hubble-dynamic-metrics-config-path".

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Adds new logic in Hubble for dynamic metric registration.
Adds new methods to the Handler interface: HandleConfigurationUpdate, Deinit.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Pass filters to Hubble metric handlers' Init method.

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>
Fork metric setup logic into static and dynamic paths during Hubble launch.
Split handler bulk creation methods into single iterators.
Move ProcessFlow method into the Handler interface.
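
Taken together, the reshaped handler interface might look like this (only
the method names Init, ProcessFlow, HandleConfigurationUpdate and Deinit
come from these commits; the signatures and placeholder types are
assumptions):

    package example

    // Flow and Registry are placeholders for the Hubble flow and metric
    // registry types.
    type Flow struct{}
    type Registry interface{}

    type Handler interface {
        // Init receives the metric registry and the configured flow filters.
        Init(reg Registry, filters []string) error
        // ProcessFlow updates the handler's metrics for one observed flow.
        ProcessFlow(flow *Flow) error
        // HandleConfigurationUpdate reconfigures the handler when the
        // dynamic metrics config file changes.
        HandleConfigurationUpdate(filters []string) error
        // Deinit unregisters the handler's metrics when it is removed.
        Deinit(reg Registry) error
    }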

Signed-off-by: Igor Klemenski <igor.klemenski@microsoft.com>