Closed as not planned
Labels
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Description
Test Name
K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications
Failure Output
FAIL: Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1
Stacktrace
/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:527
Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1
Err: exit status 1
Stdout:
pod "coredns-7c74c644b-rxsz6" deleted
Stderr:
Error from server: etcdserver: request timed out
/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:647
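Triage note: the deletion itself went through (stdout shows the pod deleted), but the API server answered with an etcd timeout, and the AfterFailed pod listing below shows kube-controller-manager in CrashLoopBackOff and kube-scheduler in Error on k8s1, so this looks like control-plane/etcd instability on the test VM rather than a Cilium regression. A minimal sketch of commands one might run against the test cluster to confirm that (pod names are taken from the output below; --tail and --previous are standard kubectl options):
# Check whether the control-plane static pods on k8s1 are still crashing
kubectl -n kube-system get pods -o wide | grep k8s1
# Look for "request timed out" and leader-election churn in etcd and the apiserver
kubectl -n kube-system logs etcd-k8s1 --tail=100
kubectl -n kube-system logs kube-apiserver-k8s1 --tail=100
# Logs from the previously crashed controller-manager container
kubectl -n kube-system logs kube-controller-manager-k8s1 --previous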
Standard Output
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 2
Number of "level=error" in logs: 2
Number of "level=warning" in logs: 2
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 3 errors/warnings:
Network status error received, restarting client connections
error retrieving resource lock kube-system/cilium-operator-resource-lock: Get \
Failed to release lock: resource name may not be empty
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
⚠️ Number of "level=warning" in logs: 6
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 3 errors/warnings:
Unable to get node resource
Waiting for k8s node information
Key allocation attempt failed
Cilium pods: [cilium-gct59 cilium-tlk79]
Netpols loaded:
CiliumNetworkPolicies loaded:
Endpoint Policy Enforcement:
Pod Ingress Egress
grafana-d69c97b9b-hl4n7 false false
prometheus-655fb888d7-6tbb8 false false
Cilium agent 'cilium-gct59': Status: Ok Health: Ok Nodes "" ContainerRuntime: Kubernetes: Ok KVstore: Ok Controllers: Total 26 Failed 0
Cilium agent 'cilium-tlk79': Status: Ok Health: Ok Nodes "" ContainerRuntime: Kubernetes: Ok KVstore: Ok Controllers: Total 18 Failed 0
Standard Error
21:18:04 STEP: Running BeforeAll block for EntireTestsuite
21:18:04 STEP: Starting tests: command line parameters: {Reprovision:false HoldEnvironment:false PassCLIEnvironment:true SSHConfig: ShowCommands:false TestScope: SkipLogGathering:false CiliumImage:quay.io/cilium/cilium-ci CiliumTag:1e57744c70369599108bbe789caae64648de0d0b CiliumOperatorImage:quay.io/cilium/operator CiliumOperatorTag:1e57744c70369599108bbe789caae64648de0d0b CiliumOperatorSuffix:-ci HubbleRelayImage:quay.io/cilium/hubble-relay-ci HubbleRelayTag:1e57744c70369599108bbe789caae64648de0d0b ProvisionK8s:true Timeout:2h50m0s Kubeconfig:/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/vagrant-kubeconfig KubectlPath:/tmp/kubectl RegistryCredentials: Multinode:true RunQuarantined:false Help:false} environment variables: [JENKINS_HOME=/var/jenkins_home ghprbSourceBranch=meyskens/112-ipcache-meta-delete ghprbTriggerAuthorEmail=maartje@eyskens.me VM_MEMORY=8192 MAIL=/var/mail/root SSH_CLIENT=52.25.14.27 52524 22 ghprbPullAuthorEmail=maartje@eyskens.me USER=root PROJ_PATH=src/github.com/cilium/cilium RUN_CHANGES_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/display/redirect?page=changes ghprbPullDescription=GitHub pull request #26958 of commit 1e57744c70369599108bbe789caae64648de0d0b, no merge conflicts. NETNEXT=0 ghprbActualCommit=1e57744c70369599108bbe789caae64648de0d0b SHLVL=1 CILIUM_TAG=1e57744c70369599108bbe789caae64648de0d0b NODE_LABELS=baremetal ginkgo nightly node-calm-loon vagrant HUDSON_URL=https://jenkins.cilium.io/ GIT_COMMIT=0e23f524e0f906d41ade7c0c72e1e9dc62f25edf OLDPWD=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 GINKGO_TIMEOUT=170m HOME=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 ghprbTriggerAuthorLoginMention=@meyskens BUILD_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/ ghprbPullAuthorLoginMention=@meyskens HUDSON_COOKIE=54312723-b808-45bc-8b5f-b5d16660203e JENKINS_SERVER_COOKIE=durable-58f206a8c9615f5eb8aa6d55008bb9eb ghprbGhRepository=cilium/cilium DOCKER_TAG=1e57744c70369599108bbe789caae64648de0d0b JobKernelVersion=49 DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus KERNEL=49 CONTAINER_RUNTIME=docker WORKSPACE=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 ghprbPullLongDescription=This adds a delete call to the metadata map of the ipcache. On a delete from ipcache this info was left bhind. 
In case a node gets deleted the IP would keep receiving the remote-host label from this cache incorrectly.\r\n\r\nThis issue came out of an issue where we saw EKS nodes being recycled and the IPs getting re-used and receiving an incorrect remote-host label in identity cache and ip cache.\r\n\r\n```release-note\r\nDelete IP Label metadata on delete from ipcache\r\n```\r\n\r\n- v1.12: this PR\r\n- v1.13: https://github.com/cilium/cilium/pull/27010\r\n- v1.14: not affected thanks to a refactor :) \r\n K8S_NODES=2 TESTDIR=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test LOGNAME=root NODE_NAME=node-calm-loon ghprbCredentialsId=ciliumbot _=/usr/bin/java HUBBLE_RELAY_IMAGE=quay.io/cilium/hubble-relay-ci STAGE_NAME=BDD-Test-PR GIT_BRANCH=origin/pr/26958/merge EXECUTOR_NUMBER=0 ghprbTriggerAuthorLogin=meyskens TERM=xterm XDG_SESSION_ID=4 HOST_FIREWALL=0 CILIUM_OPERATOR_TAG=1e57744c70369599108bbe789caae64648de0d0b BUILD_DISPLAY_NAME=[v1.12] Delete IP Label metadata on delete from ipcache https://github.com/cilium/cilium/pull/26958 #113 ghprbPullAuthorLogin=meyskens HUDSON_HOME=/var/jenkins_home ghprbTriggerAuthor=Maartje Eyskens JOB_BASE_NAME=Cilium-PR-K8s-1.20-kernel-4.9 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/go/bin:/root/go/bin sha1=origin/pr/26958/merge KUBECONFIG=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test/vagrant-kubeconfig FOCUS=K8s BUILD_ID=113 XDG_RUNTIME_DIR=/run/user/0 BUILD_TAG=jenkins-Cilium-PR-K8s-1.20-kernel-4.9-113 RUN_QUARANTINED=false CILIUM_IMAGE=quay.io/cilium/cilium-ci JENKINS_URL=https://jenkins.cilium.io/ LANG=C.UTF-8 ghprbCommentBody=/test-backport-1.12 JOB_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/ ghprbPullTitle=[v1.12] Delete IP Label metadata on delete from ipcache GIT_URL=https://github.com/cilium/cilium ghprbPullLink=https://github.com/cilium/cilium/pull/26958 BUILD_NUMBER=113 JENKINS_NODE_COOKIE=f8546b80-d9c6-4b1e-8938-230d2e3c2640 SHELL=/bin/bash GOPATH=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9 RUN_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/display/redirect IMAGE_REGISTRY=quay.io/cilium ghprbAuthorRepoGitUrl=https://github.com/meyskens/cilium.git FAILFAST=false HUDSON_SERVER_COOKIE=693c250bfb7e85bf ghprbTargetBranch=v1.12 JOB_DISPLAY_URL=https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/display/redirect K8S_VERSION=1.20 JOB_NAME=Cilium-PR-K8s-1.20-kernel-4.9 SSH_CONNECTION=52.25.14.27 52524 139.178.83.129 22 ghprbPullId=26958 CILIUM_OPERATOR_IMAGE=quay.io/cilium/operator HUBBLE_RELAY_TAG=1e57744c70369599108bbe789caae64648de0d0b JobK8sVersion=1.20 VM_CPUS=3 PWD=/home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.9/src/github.com/cilium/cilium/test CILIUM_OPERATOR_SUFFIX=-ci]
21:18:04 STEP: Ensuring the namespace kube-system exists
21:18:04 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
21:18:06 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
21:18:06 STEP: Preparing cluster
21:18:07 STEP: Labelling nodes
21:18:07 STEP: Cleaning up Cilium components
21:18:07 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathConfig
21:18:07 STEP: Ensuring the namespace kube-system exists
21:18:07 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
21:18:07 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
21:18:07 STEP: Installing Cilium
21:18:08 STEP: Waiting for Cilium to become ready
21:18:47 STEP: Restarting unmanaged pods coredns-7c74c644b-rxsz6 in namespace kube-system
FAIL: Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-7c74c644b-rxsz6': Exitcode: 1
Err: exit status 1
Stdout:
pod "coredns-7c74c644b-rxsz6" deleted
Stderr:
Error from server: etcdserver: request timed out
=== Test Finished at 2023-07-26T21:18:54Z====
21:18:54 STEP: Running JustAfterEach block for EntireTestsuite K8sDatapathConfig
===================== TEST FAILED =====================
21:19:01 STEP: Running AfterFailed block for EntireTestsuite K8sDatapathConfig
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0
Stdout:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-monitoring grafana-d69c97b9b-hl4n7 1/1 Running 0 59s 10.0.1.200 k8s2 <none> <none>
cilium-monitoring prometheus-655fb888d7-6tbb8 1/1 Running 0 59s 10.0.1.228 k8s2 <none> <none>
kube-system cilium-gct59 1/1 Running 0 58s 192.168.56.12 k8s2 <none> <none>
kube-system cilium-operator-7b77c755b5-4q9mr 1/1 Running 1 58s 192.168.56.12 k8s2 <none> <none>
kube-system cilium-operator-7b77c755b5-rsql7 1/1 Running 0 58s 192.168.56.11 k8s1 <none> <none>
kube-system cilium-tlk79 1/1 Running 0 58s 192.168.56.11 k8s1 <none> <none>
kube-system etcd-k8s1 1/1 Running 0 4m48s 192.168.56.11 k8s1 <none> <none>
kube-system kube-apiserver-k8s1 1/1 Running 0 4m48s 192.168.56.11 k8s1 <none> <none>
kube-system kube-controller-manager-k8s1 0/1 CrashLoopBackOff 1 4m48s 192.168.56.11 k8s1 <none> <none>
kube-system kube-proxy-j6f5t 1/1 Running 0 4m36s 192.168.56.11 k8s1 <none> <none>
kube-system kube-proxy-lvmp5 1/1 Running 0 99s 192.168.56.12 k8s2 <none> <none>
kube-system kube-scheduler-k8s1 0/1 Error 1 4m48s 192.168.56.11 k8s1 <none> <none>
kube-system log-gatherer-bcts5 1/1 Running 0 62s 192.168.56.12 k8s2 <none> <none>
kube-system log-gatherer-g2n9r 1/1 Running 0 62s 192.168.56.11 k8s1 <none> <none>
kube-system registry-adder-625b6 1/1 Running 0 96s 192.168.56.12 k8s2 <none> <none>
kube-system registry-adder-97rpn 1/1 Running 0 96s 192.168.56.11 k8s1 <none> <none>
Stderr:
Fetching command output from pods [cilium-gct59 cilium-tlk79]
cmd: kubectl exec -n kube-system cilium-gct59 -c cilium-agent -- cilium status
Exitcode: 0
Stdout:
KVStore: Ok Disabled
Kubernetes: Ok 1.20 (v1.20.15) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Disabled
Host firewall: Disabled
CNI Chaining: none
Cilium: Ok 1.12.12 (v1.12.12-1e57744)
NodeMonitor: Listening for events on 3 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 4/254 allocated from 10.0.1.0/24, IPv6: 4/254 allocated from fd02::100/120
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Enabled]
Controller Status: 26/26 healthy
Proxy Status: OK, ip 10.0.1.247, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 196/65535 (0.30%), Flows/s: 3.83 Metrics: Disabled
Encryption: Disabled
Cluster health: 0/2 reachable (2023-07-26T21:18:28Z)
Name IP Node Endpoints
k8s2 (localhost) 192.168.56.12 reachable unreachable
k8s1 192.168.56.11 reachable unreachable
Stderr:
cmd: kubectl exec -n kube-system cilium-gct59 -c cilium-agent -- cilium endpoint list
Exitcode: 0
Stdout:
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
212 Disabled Disabled 4 reserved:health fd02::13d 10.0.1.50 ready
2080 Disabled Disabled 1 k8s:cilium.io/ci-node=k8s2 ready
reserved:host
3360 Disabled Disabled 48086 k8s:app=prometheus fd02::112 10.0.1.228 ready
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=prometheus-k8s
k8s:io.kubernetes.pod.namespace=cilium-monitoring
4071 Disabled Disabled 5194 k8s:app=grafana fd02::154 10.0.1.200 ready
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=default
k8s:io.kubernetes.pod.namespace=cilium-monitoring
Stderr:
cmd: kubectl exec -n kube-system cilium-tlk79 -c cilium-agent -- cilium status
Exitcode: 0
Stdout:
KVStore: Ok Disabled
Kubernetes: Ok 1.20 (v1.20.15) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Disabled
Host firewall: Disabled
CNI Chaining: none
Cilium: Ok 1.12.12 (v1.12.12-1e57744)
NodeMonitor: Listening for events on 3 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 2/254 allocated from 10.0.0.0/24, IPv6: 2/254 allocated from fd02::/120
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Enabled]
Controller Status: 18/18 healthy
Proxy Status: OK, ip 10.0.0.213, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 115/65535 (0.18%), Flows/s: 2.08 Metrics: Disabled
Encryption: Disabled
Cluster health: 0/2 reachable (2023-07-26T21:18:26Z)
Name IP Node Endpoints
k8s1 (localhost) 192.168.56.11 reachable unreachable
k8s2 192.168.56.12 reachable unreachable
Stderr:
cmd: kubectl exec -n kube-system cilium-tlk79 -c cilium-agent -- cilium endpoint list
Exitcode: 0
Stdout:
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
290 Disabled Disabled 4 reserved:health fd02::32 10.0.0.164 ready
2446 Disabled Disabled 1 k8s:cilium.io/ci-node=k8s1 ready
k8s:node-role.kubernetes.io/control-plane
k8s:node-role.kubernetes.io/master
reserved:host
Stderr:
===================== Exiting AfterFailed =====================
21:19:43 STEP: Running AfterEach for block EntireTestsuite K8sDatapathConfig
21:19:43 STEP: Running AfterEach for block EntireTestsuite
[[ATTACHMENT|dc0d4932_K8sDatapathConfig_MonitorAggregation_Checks_that_monitor_aggregation_restricts_notifications.zip]]
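Both agents also report Cluster health: 0/2 reachable right after the install, which fits a cluster whose control plane was flapping while the health probes were still converging. If someone wants to double-check this on a live reproduction, a sketch (assuming exec access to the agent pods named above) would be:
# Ask each agent's health daemon for its node/endpoint connectivity view
kubectl -n kube-system exec cilium-gct59 -c cilium-agent -- cilium-health status
kubectl -n kube-system exec cilium-tlk79 -c cilium-agent -- cilium-health status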
ZIP Links:
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/cilium-sysdump.zip
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/dc0d4932_K8sDatapathConfig_MonitorAggregation_Checks_that_monitor_aggregation_restricts_notifications.zip
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9//113/artifact/test_results_Cilium-PR-K8s-1.20-kernel-4.9_113_BDD-Test-PR.zip
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/113/
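For whoever picks this up, the attached sysdump should be enough to confirm the etcd timeout without re-running the job; a sketch, assuming the standard cilium-sysdump zip layout:
# Unpack the sysdump linked above and grep for the failure signatures
unzip cilium-sysdump.zip -d sysdump
grep -R "request timed out" sysdump | head
grep -R "CrashLoopBackOff" sysdump | head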
If this is a duplicate of an existing flake, comment 'Duplicate of #<issue-number>' and close this issue.