
CI: K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications #27118

@maintainer-s-little-helper

Description

Test Name

K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications

Failure Output

FAIL: Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1 

Stacktrace

/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:527
Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1 
Err: exit status 1
Stdout:
 	 pod "coredns-7c74c644b-rxsz6" deleted
	 
Stderr:
 	 Error from server: etcdserver: request timed out
	 

/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:647
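Note that the deletion itself went through (stdout reports the pod as deleted); the failure comes from the apiserver returning a transient `etcdserver: request timed out`, which the test helper treats as fatal. Below is a minimal sketch of how that one transient error could be retried instead; the `deletePodWithRetry` helper and its use of `kubectl` via `os/exec` are illustrative assumptions, not the framework's actual API.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// deletePodWithRetry is a hypothetical helper, not the one used by
// test/ginkgo-ext: it retries the pod deletion a few times when the
// apiserver reports the transient etcd timeout seen in this flake.
func deletePodWithRetry(namespace, pod string, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		cmd := exec.Command("kubectl", "-n", namespace, "delete", "pods", pod,
			"--ignore-not-found", "--wait=false")
		var stderr bytes.Buffer
		cmd.Stderr = &stderr
		err := cmd.Run()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("attempt %d: %v: %s", i+1, err, stderr.String())
		// Only the transient apiserver/etcd timeout is worth retrying here.
		if !strings.Contains(stderr.String(), "etcdserver: request timed out") {
			return lastErr
		}
		time.Sleep(5 * time.Second)
	}
	return lastErr
}

func main() {
	if err := deletePodWithRetry("kube-system", "coredns-7c74c644b-rxsz6", 3); err != nil {
		fmt.Println("delete failed:", err)
	}
}
```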

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 2
Number of "level=error" in logs: 2
Number of "level=warning" in logs: 2
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 3 errors/warnings:
Network status error received, restarting client connections
error retrieving resource lock kube-system/cilium-operator-resource-lock: Get \
Failed to release lock: resource name may not be empty
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
⚠️  Number of "level=warning" in logs: 6
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 3 errors/warnings:
Unable to get node resource
Waiting for k8s node information
Key allocation attempt failed
Cilium pods: [cilium-gct59 cilium-tlk79]
Netpols loaded: 
CiliumNetworkPolicies loaded: 
Endpoint Policy Enforcement:
Pod                           Ingress   Egress
grafana-d69c97b9b-hl4n7       false     false
prometheus-655fb888d7-6tbb8   false     false
Cilium agent 'cilium-gct59': Status: Ok  Health: Ok Nodes "" ContainerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 26 Failed 0
Cilium agent 'cilium-tlk79': Status: Ok  Health: Ok Nodes "" ContainerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 18 Failed 0


Standard Error

21:18:04 STEP: Running BeforeAll block for EntireTestsuite
21:18:04 STEP: Starting tests: command line parameters: {Reprovision:false HoldEnvironment:false PassCLIEnvironment:true SSHConfig: ShowCommands:false TestScope: SkipLogGathering:false CiliumImage:quay.io/cilium/cilium-ci CiliumTag:1e57744c70369599108bbe789caae64648de0d0b CiliumOperatorImage:quay.io/cilium/operator CiliumOperatorTag:1e57744c70369599108bbe789caae64648de0d0b CiliumOperatorSuffix:-ci HubbleRelayImage:quay.io/cilium/hubble-relay-ci HubbleRelayTag:1e57744c70369599108bbe789caae64648de0d0b ProvisionK8s:true Timeout:2h50m0s Kubeconfig:/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/vagrant-kubeconfig KubectlPath:/tmp/kubectl RegistryCredentials: Multinode:true RunQuarantined:false Help:false} environment variables: [JENKINS_HOME=/var/jenkins_home ghprbSourceBranch=meyskens/112-ipcache-meta-delete ghprbTriggerAuthorEmail=maartje@eyskens.me VM_MEMORY=8192 MAIL=/var/mail/root SSH_CLIENT=52.25.14.27 52524 22 ghprbPullAuthorEmail=maartje@eyskens.me USER=root PROJ_PATH=src/github.com/cilium/cilium RUN_CHANGES_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/display/redirect?page=changes ghprbPullDescription=GitHub pull request #26958 of commit 1e57744c70369599108bbe789caae64648de0d0b, no merge conflicts. NETNEXT=0 ghprbActualCommit=1e57744c70369599108bbe789caae64648de0d0b SHLVL=1 CILIUM_TAG=1e57744c70369599108bbe789caae64648de0d0b NODE_LABELS=baremetal ginkgo nightly node-calm-loon vagrant HUDSON_URL=https://jenkins.cilium.io/ GIT_COMMIT=0e23f524e0f906d41ade7c0c72e1e9dc62f25edf OLDPWD=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 GINKGO_TIMEOUT=170m HOME=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 ghprbTriggerAuthorLoginMention=@meyskens BUILD_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/ ghprbPullAuthorLoginMention=@meyskens HUDSON_COOKIE=54312723-b808-45bc-8b5f-b5d16660203e JENKINS_SERVER_COOKIE=durable-58f206a8c9615f5eb8aa6d55008bb9eb ghprbGhRepository=cilium/cilium DOCKER_TAG=1e57744c70369599108bbe789caae64648de0d0b JobKernelVersion=49 DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus KERNEL=49 CONTAINER_RUNTIME=docker WORKSPACE=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 ghprbPullLongDescription=This adds a delete call to the metadata map of the ipcache. On a delete from ipcache this info was left bhind. 
In case a node gets deleted the IP would keep receiving the remote-host label from this cache incorrectly.\r\n\r\nThis issue came out of an issue where we saw EKS nodes being recycled and the IPs getting re-used and receiving an incorrect remote-host label in identity cache and ip cache.\r\n\r\n```release-note\r\nDelete IP Label metadata on delete from ipcache\r\n```\r\n\r\n- v1.12: this PR\r\n- v1.13: https://github.com/cilium/cilium/pull/27010\r\n- v1.14: not affected thanks to a refactor :) \r\n K8S_NODES=2 TESTDIR=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test LOGNAME=root NODE_NAME=node-calm-loon ghprbCredentialsId=ciliumbot _=/usr/bin/java HUBBLE_RELAY_IMAGE=quay.io/cilium/hubble-relay-ci STAGE_NAME=BDD-Test-PR GIT_BRANCH=origin/pr/26958/merge EXECUTOR_NUMBER=0 ghprbTriggerAuthorLogin=meyskens TERM=xterm XDG_SESSION_ID=4 HOST_FIREWALL=0 CILIUM_OPERATOR_TAG=1e57744c70369599108bbe789caae64648de0d0b BUILD_DISPLAY_NAME=[v1.12] Delete IP Label metadata on delete from ipcache  https://github.com/cilium/cilium/pull/26958  #113 ghprbPullAuthorLogin=meyskens HUDSON_HOME=/var/jenkins_home ghprbTriggerAuthor=Maartje Eyskens JOB_BASE_NAME=Cilium-PR-K8s-1.20-kernel-4.9 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/go/bin:/root/go/bin sha1=origin/pr/26958/merge KUBECONFIG=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/vagrant-kubeconfig FOCUS=K8s BUILD_ID=113 XDG_RUNTIME_DIR=/run/user/0 BUILD_TAG=jenkins-Cilium-PR-K8s-1.20-kernel-4.9-113 RUN_QUARANTINED=false CILIUM_IMAGE=quay.io/cilium/cilium-ci JENKINS_URL=https://jenkins.cilium.io/ LANG=C.UTF-8 ghprbCommentBody=/test-backport-1.12 JOB_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/ ghprbPullTitle=[v1.12] Delete IP Label metadata on delete from ipcache GIT_URL=https://github.com/cilium/cilium ghprbPullLink=https://github.com/cilium/cilium/pull/26958 BUILD_NUMBER=113 JENKINS_NODE_COOKIE=f8546b80-d9c6-4b1e-8938-230d2e3c2640 SHELL=/bin/bash GOPATH=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 RUN_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/display/redirect IMAGE_REGISTRY=quay.io/cilium ghprbAuthorRepoGitUrl=https://github.com/meyskens/cilium.git FAILFAST=false HUDSON_SERVER_COOKIE=693c250bfb7e85bf ghprbTargetBranch=v1.12 JOB_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/display/redirect K8S_VERSION=1.20 JOB_NAME=Cilium-PR-K8s-1.20-kernel-4.9 SSH_CONNECTION=52.25.14.27 52524 139.178.83.129 22 ghprbPullId=26958 CILIUM_OPERATOR_IMAGE=quay.io/cilium/operator HUBBLE_RELAY_TAG=1e57744c70369599108bbe789caae64648de0d0b JobK8sVersion=1.20 VM_CPUS=3 PWD=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test CILIUM_OPERATOR_SUFFIX=-ci]
21:18:04 STEP: Ensuring the namespace kube-system exists
21:18:04 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
21:18:06 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
21:18:06 STEP: Preparing cluster
21:18:07 STEP: Labelling nodes
21:18:07 STEP: Cleaning up Cilium components
21:18:07 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathConfig
21:18:07 STEP: Ensuring the namespace kube-system exists
21:18:07 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
21:18:07 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
21:18:07 STEP: Installing Cilium
21:18:08 STEP: Waiting for Cilium to become ready
21:18:47 STEP: Restarting unmanaged pods coredns-7c74c644b-rxsz6 in namespace kube-system
FAIL: Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1 
Err: exit status 1
Stdout:
 	 pod "coredns-7c74c644b-rxsz6" deleted
	 
Stderr:
 	 Error from server: etcdserver: request timed out
	 

=== Test Finished at 2023-07-26T21:18:54Z====
21:18:54 STEP: Running JustAfterEach block for EntireTestsuite K8sDatapathConfig
===================== TEST FAILED =====================
21:19:01 STEP: Running AfterFailed block for EntireTestsuite K8sDatapathConfig
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE           NAME                               READY   STATUS             RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
	 cilium-monitoring   grafana-d69c97b9b-hl4n7            1/1     Running            0          59s     10.0.1.200      k8s2   <none>           <none>
	 cilium-monitoring   prometheus-655fb888d7-6tbb8        1/1     Running            0          59s     10.0.1.228      k8s2   <none>           <none>
	 kube-system         cilium-gct59                       1/1     Running            0          58s     192.168.56.12   k8s2   <none>           <none>
	 kube-system         cilium-operator-7b77c755b5-4q9mr   1/1     Running            1          58s     192.168.56.12   k8s2   <none>           <none>
	 kube-system         cilium-operator-7b77c755b5-rsql7   1/1     Running            0          58s     192.168.56.11   k8s1   <none>           <none>
	 kube-system         cilium-tlk79                       1/1     Running            0          58s     192.168.56.11   k8s1   <none>           <none>
	 kube-system         etcd-k8s1                          1/1     Running            0          4m48s   192.168.56.11   k8s1   <none>           <none>
	 kube-system         kube-apiserver-k8s1                1/1     Running            0          4m48s   192.168.56.11   k8s1   <none>           <none>
	 kube-system         kube-controller-manager-k8s1       0/1     CrashLoopBackOff   1          4m48s   192.168.56.11   k8s1   <none>           <none>
	 kube-system         kube-proxy-j6f5t                   1/1     Running            0          4m36s   192.168.56.11   k8s1   <none>           <none>
	 kube-system         kube-proxy-lvmp5                   1/1     Running            0          99s     192.168.56.12   k8s2   <none>           <none>
	 kube-system         kube-scheduler-k8s1                0/1     Error              1          4m48s   192.168.56.11   k8s1   <none>           <none>
	 kube-system         log-gatherer-bcts5                 1/1     Running            0          62s     192.168.56.12   k8s2   <none>           <none>
	 kube-system         log-gatherer-g2n9r                 1/1     Running            0          62s     192.168.56.11   k8s1   <none>           <none>
	 kube-system         registry-adder-625b6               1/1     Running            0          96s     192.168.56.12   k8s2   <none>           <none>
	 kube-system         registry-adder-97rpn               1/1     Running            0          96s     192.168.56.11   k8s1   <none>           <none>
	 
Stderr:
 	 

Fetching command output from pods [cilium-gct59 cilium-tlk79]
cmd: kubectl exec -n kube-system cilium-gct59 -c cilium-agent -- cilium status
Exitcode: 0 
Stdout:
 	 KVStore:                 Ok   Disabled
	 Kubernetes:              Ok   1.20 (v1.20.15) [linux/amd64]
	 Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
	 KubeProxyReplacement:    Disabled   
	 Host firewall:           Disabled
	 CNI Chaining:            none
	 Cilium:                  Ok   1.12.12 (v1.12.12-1e57744)
	 NodeMonitor:             Listening for events on 3 CPUs with 64x4096 of shared memory
	 Cilium health daemon:    Ok   
	 IPAM:                    IPv4: 4/254 allocated from 10.0.1.0/24, IPv6: 4/254 allocated from fd02::100/120
	 BandwidthManager:        Disabled
	 Host Routing:            Legacy
	 Masquerading:            IPTables [IPv4: Enabled, IPv6: Enabled]
	 Controller Status:       26/26 healthy
	 Proxy Status:            OK, ip 10.0.1.247, 0 redirects active on ports 10000-20000
	 Global Identity Range:   min 256, max 65535
	 Hubble:                  Ok   Current/Max Flows: 196/65535 (0.30%), Flows/s: 3.83   Metrics: Disabled
	 Encryption:              Disabled
	 Cluster health:          0/2 reachable   (2023-07-26T21:18:28Z)
	   Name                   IP              Node        Endpoints
	   k8s2 (localhost)       192.168.56.12   reachable   unreachable
	   k8s1                   192.168.56.11   reachable   unreachable
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-gct59 -c cilium-agent -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                              IPv6        IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                    
	 212        Disabled           Disabled          4          reserved:health                                          fd02::13d   10.0.1.50    ready   
	 2080       Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s2                                                        ready   
	                                                            reserved:host                                                                             
	 3360       Disabled           Disabled          48086      k8s:app=prometheus                                       fd02::112   10.0.1.228   ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                  
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=prometheus-k8s                                    
	                                                            k8s:io.kubernetes.pod.namespace=cilium-monitoring                                         
	 4071       Disabled           Disabled          5194       k8s:app=grafana                                          fd02::154   10.0.1.200   ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                  
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                           
	                                                            k8s:io.kubernetes.pod.namespace=cilium-monitoring                                         
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-tlk79 -c cilium-agent -- cilium status
Exitcode: 0 
Stdout:
 	 KVStore:                 Ok   Disabled
	 Kubernetes:              Ok   1.20 (v1.20.15) [linux/amd64]
	 Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
	 KubeProxyReplacement:    Disabled   
	 Host firewall:           Disabled
	 CNI Chaining:            none
	 Cilium:                  Ok   1.12.12 (v1.12.12-1e57744)
	 NodeMonitor:             Listening for events on 3 CPUs with 64x4096 of shared memory
	 Cilium health daemon:    Ok   
	 IPAM:                    IPv4: 2/254 allocated from 10.0.0.0/24, IPv6: 2/254 allocated from fd02::/120
	 BandwidthManager:        Disabled
	 Host Routing:            Legacy
	 Masquerading:            IPTables [IPv4: Enabled, IPv6: Enabled]
	 Controller Status:       18/18 healthy
	 Proxy Status:            OK, ip 10.0.0.213, 0 redirects active on ports 10000-20000
	 Global Identity Range:   min 256, max 65535
	 Hubble:                  Ok   Current/Max Flows: 115/65535 (0.18%), Flows/s: 2.08   Metrics: Disabled
	 Encryption:              Disabled
	 Cluster health:          0/2 reachable   (2023-07-26T21:18:26Z)
	   Name                   IP              Node        Endpoints
	   k8s1 (localhost)       192.168.56.11   reachable   unreachable
	   k8s2                   192.168.56.12   reachable   unreachable
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-tlk79 -c cilium-agent -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                 IPv6       IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                      
	 290        Disabled           Disabled          4          reserved:health                             fd02::32   10.0.0.164   ready   
	 2446       Disabled           Disabled          1          k8s:cilium.io/ci-node=k8s1                                          ready   
	                                                            k8s:node-role.kubernetes.io/control-plane                                   
	                                                            k8s:node-role.kubernetes.io/master                                          
	                                                            reserved:host                                                               
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================
21:19:43 STEP: Running AfterEach for block EntireTestsuite K8sDatapathConfig
21:19:43 STEP: Running AfterEach for block EntireTestsuite
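For what it's worth, the `kubectl get pods` output above shows kube-controller-manager-k8s1 in CrashLoopBackOff and kube-scheduler-k8s1 in Error on k8s1, which fits the `etcdserver: request timed out` returned to the test: the control plane itself looks unhealthy at that point. A minimal sketch of a readiness check that would surface such pods is below; the jsonpath query and the standalone Go wrapper are illustrative assumptions, not part of the test suite.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// Print "<pod> <ready flags>" for every kube-system pod, then flag any
	// pod whose containers are not all ready (e.g. the control-plane pods
	// shown in the dump above).
	jsonpath := `{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].ready}{"\n"}{end}`
	out, err := exec.Command("kubectl", "get", "pods", "-n", "kube-system",
		"-o", "jsonpath="+jsonpath).Output()
	if err != nil {
		fmt.Println("kubectl failed:", err)
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if strings.Contains(line, "false") {
			fmt.Println("not ready:", line)
		}
	}
}
```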

[[ATTACHMENT|dc0d4932_K8sDatapathConfig_MonitorAggregation_Checks_that_monitor_aggregation_restricts_notifications.zip]]


ZIP Links:


https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/cilium-sysdump.zip
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/dc0d4932_K8sDatapathConfig_MonitorAggregation_Checks_that_monitor_aggregation_restricts_notifications.zip
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/test_results_Cilium-PR-K8s-1.20-kernel-4.9_113_BDD-Test-PR.zip

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/

If this is a duplicate of an existing flake, comment 'Duplicate of #<issue-number>' and close this issue.


Labels

ci/flake: This is a known failure that occurs in the tree. Please investigate me!
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
