Skip to content

Conversation

sayboras
Copy link
Member

@sayboras sayboras commented Oct 8, 2022

$ for pr in 21444 21588 21577 21497 21461 21463 21495 20520; do contrib/backporting/set-labels.py $pr done 1.12; done

aditighag and others added 12 commits October 8, 2022 20:48
[ upstream commit 22cf89a ]

The graceful termination feature relies on receiving
terminating endpoint events from Kubernetes, which are gated
by a flag. Reference the terminating endpoint metric in the
feature section so that users can track the metric to check
if the feature is working correctly, as a first step.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 18ef607 ]

The troubleshooting guide explains packet drops seen due to
conntrack map fill-up issue. Reference some of the conntrack
garbage collection related metrics so that users get visibility
into conntrack gc runs - how often it's being triggered, how many
entries were gc'd, etc.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5b41102 ]

The mal-formed instance-type looks like below in CiliumNode spec:

```yaml
spec:
  ...
  alibaba-cloud:
    instance-type: |
      <?xml version="1.0" encoding="iso-8859-1"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
       <head>
        <title>404 - Not Found</title>
       </head>
       <body>
        <h1>404 - Not Found</h1>
       </body>
      </html>
```

which prohibits cilium-operator from allocating PodIPs for this node.

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Reported-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5b40396 ]

These are the only 2 image fields that are currently not quoted. Some
users templatize image name / tag values outside of Helm. Not quoting
the image fields can cause issues if these values contain special
characters.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 8ae1562 ]

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 3fe7919 ]

This commit has no functional changes. It simply renames a few variables
in enableIPsec to make their relationships clearer.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 91fdc20 ]

This commit adds two new arguments to UpsertIPsecEndpoint to specify the
outer source and destination IP address for IPsec. It doesn't include
any functional changes.

    - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ...
    + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ...

Until now, those two outer IP addresses were carried as part of the
first two CIDR arguments, `local` and `remote`. For example, `local`
would be equal to 192.168.56.11/24 where 192.168.56.0/24 would be used
to match packets in XFRM policies and 192.168.56.11 as the outer IP
address in XFRM states.

The outer IPs are now in separate arguments and the next commit will
change the local and remote arguments to not carry the IPs.

Why this change? Because in a subsequent commit, I will need the CIDR
and IP arguments to diverge. For example, we will have
 UpsertIPsecEndpoint calls with `local=0.0.0.0/0` and
`outerLocal=192.168.56.11`.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 645da80 ]

The previous commit changed the UpsertIPsecEndpoint function as follows:

    - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ...
    + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ...

The first two CIDR arguments, `local` and `remote`, now don't need to carry
the outer IP addresses (moved to `outerLocal` and `outerRemote`). We can
therefore change calls to this function so that those two first
arguments carry only the CIDR (i.e., changed from e.g. 192.168.56.11/24
to 192.168.56.0/24). As a result, we also don't need to mask those two
arguments when we want only the CIDR part.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit b33adeb ]

gops and pprof do not use the same protocol to collect profile data.
Thus, the default port for pprof debug endpoints in `cilium-bugtool`
should not be the one used for gops, but the default one for pprof
itself.

Besides, clustermesh-apiserver does not support pprof yet, but only
gops. Thus, the help message for the pprof port option in cilium-bugtool
is fixed accordingly.

Fixes: #416319b1cd (bugtool: Default to the agent's gops port)

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5520be1 ]

submit-backport tried to create a backport PR with reviews from all
contributors whose fixes are being backported, including people who do
not have collaborator status in the repository. GitHub only allows
reviews to be assigned to collaborators, and thus rejected the review
assignments.

This commit changes submit-backport to filter the review assignments to
only include collaborators.

Fixes:  #21548
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 1019e70 ]

#21120 (comment)
uncovered that WaitUntilReady() is racy when using the DemoDaemonSet.
After adding an additional DaemonSet to this manifest, the number of
expected pods (as calculated per numExpectedPods()) no longer matches the
actually deployed pods. So WaitUntilReady() exits prematurely, and we
potentially access the pods' state before they are fully set up. This
explains the CI flakes we've observed in eg.
#21120.

Fix this by updating the internal manifest description of the demo_ds.yaml
manifest.

This allows WaitUntilReady() to wait for the correct number of pods to
become ready when using demo_ds. Also deleteInAnyNamespace() can clean up
_all_ daemon sets of this manifest.

Fixes: 74d9f56 ("egressgw: manager: add support for new egressGateway policy's property")
Fixes #21120
Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit d45999d ]

The script originally just exited and assumed the kubelet would restart
it, but this is quite disruptive, since that restarts the agent as well.

Additionally, it was discovered that some AWS installations need the
service network to be functioning before the AWS CNI file is written,
leaving clusters in a broken state.

So, change the script to retry looking for the AWS CNI file itself.

Fixes: #21243

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.12 kind/backports This PR provides functionality previously merged into master. labels Oct 8, 2022
@sayboras
Copy link
Member Author

sayboras commented Oct 8, 2022

/test-backport-1.12

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed:

Click to show.

Test Name

K8sServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) Tests NodePort inside cluster (kube-proxy) 

Failure Output

FAIL: Request from k8s1 to service tftp://[fd04::12]:32432/hello failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed:

Click to show.

Test Name

K8sFQDNTest Restart Cilium validate that FQDN is still working

Failure Output

FAIL: unable to retrieve all nodes with 'kubectl get nodes -o json | jq '.items | length'': Exitcode: -1 

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-GKE' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 61.390s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.

@sayboras sayboras marked this pull request as ready for review October 8, 2022 10:21
@sayboras sayboras requested a review from a team as a code owner October 8, 2022 10:21
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for PR 21497, thanks! 👍

@sayboras
Copy link
Member Author

sayboras commented Oct 9, 2022

/test-1.19-4.9

@sayboras
Copy link
Member Author

sayboras commented Oct 9, 2022

/test-1.22-4.9

@sayboras
Copy link
Member Author

sayboras commented Oct 9, 2022

/test-gke

Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#21463 ✅ thanks tam!

@sayboras
Copy link
Member Author

sayboras commented Oct 9, 2022

/test-1.22-4.9

@sayboras
Copy link
Member Author

sayboras commented Oct 9, 2022

/test-gke

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#21577 lgtm 💯

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PR looks good. Thanks!

@sayboras
Copy link
Member Author

/test-gke

@sayboras
Copy link
Member Author

/test-1.22-4.9

@sayboras
Copy link
Member Author

/test-1.22-4.9

@sayboras
Copy link
Member Author

/test-gke

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backport of my changes looks good. Thanks!

@sayboras
Copy link
Member Author

Reviews are in, test-gke is a known flake. Merged.

@sayboras sayboras merged commit 87eae0f into v1.12 Oct 10, 2022
@sayboras sayboras deleted the pr/v1.12-backport-2022-10-06-2 branch October 10, 2022 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants