-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ci: update AWS VPC CNI plugin for AWS-CNI workflow in v1.11/v1.12 branches #20318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nches While updating eksctl to v0.101.0 in master/v1.11 as part of commit 396eb1b ("ci: update eksctl to v0.101.0 and AWS VPC CNI plugin to v1.11"), the AWS VPC CNI was updated to 1.11 on master, but not on the v1.11 branch. Do that now to fix the following error on that workflow: error: unable to recognize "https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.10/config/v1.7/aws-k8s-cni.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1" Since v1.12 was branched of in the meantime, also using the outdated AWS VPC CNI version bump it in the corresponding workflow as well. Fixes: 396eb1b ("ci: update eksctl to v0.101.0 and AWS VPC CNI plugin to v1.11") Reported-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Tobias Klauser <tobias@cilium.io>
72d9fc2
to
9399326
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why I thought that was on purpose 🤦
@tklauser I think we need to update v1.10 too: https://github.com/cilium/cilium/runs/7089474018?check_suite_focus=true#step:10:24 |
not sure how I missed that - again 🤦 will send a follow-up PR. EDIT: addressed in #20329 |
As the previous commit mentions, this does not occur on main, v1.16, and v1.15 due to cilium#29036. However, in v1.14 and v1.13, we need to take special care because the NameManager and SelectorCache lock can be taken while the Endpoint lock is held during Endpoint deletion. Here are the relevant stacktraces regarding the deadlock: ``` 1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*uint32(0x5), false, 43690) sync mutex.go:171 (*Mutex).lockSlow(*Mutex(cilium#1733)) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*RWMutex(0xb0)) fqdn name_manager.go:70 (*NameManager).Lock(0xffffffffffffffff) policy selectorcache.go:964 (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582}) policy l4.go:810 (*L4Filter).removeSelectors(cilium#32582, cilium#29992) policy l4.go:817 (*L4Filter).detach(cilium#719, cilium#29993) policy l4.go:988 L4PolicyMap.Detach(...) policy l4.go:1179 (*L4Policy).Detach(cilium#20318, cilium#1383) policy resolve.go:103 (*selectorPolicy).Detach(...) policy distillery.go:81 (*PolicyCache).delete(cilium#1354, cilium#19354) policy distillery.go:138 (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163) identitymanager manager.go:167 (*IdentityManager).remove(cilium#706, cilium#19354) identitymanager manager.go:147 (*IdentityManager).Remove(cilium#706, cilium#19354) identitymanager manager.go:52 Remove(...) endpoint endpoint.go:1146 (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0}) endpoint endpoint.go:2192 (*Endpoint).Delete(cilium#1883, {0x80, 0xaa}) endpointmanager manager.go:380 (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff}) endpointmanager manager.go:394 (*EndpointManager).RemoveEndpoint(...) cmd endpoint.go:684 (*Daemon).deleteEndpointQuiet(...) cmd endpoint.go:666 (*Daemon).deleteEndpoint(cilium#1155, cilium#1883) cmd endpoint.go:713 (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108}) cmd endpoint.go:770 (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}}) endpoint delete_endpoint_id.go:66 (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3}) middleware router.go:78 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59}) middleware redoc.go:72 Redoc.func1({cilium#666, cilium#2242}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45}) middleware spec.go:46 Spec.func1({cilium#666, cilium#2242}, cilium#4921) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015}) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922) http server.go:2938 serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6) http server.go:2009 (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554}) 8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*, 0x47, cilium#1154) sync mutex.go:171 (*Mutex).lockSlow(cilium#706) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*) identitymanager manager.go:99 (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154) identitymanager manager.go:123 RemoveOldAddNew(...) endpoint policy.go:852 (*Endpoint).SetIdentity(*, cilium#1154, 0) endpoint endpoint.go:1932 (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1) endpoint endpoint.go:1780 (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1) endpoint endpoint.go:1720 (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8) cmd endpoint.go:477 (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *) cmd endpoint.go:542 (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}}) endpoint put_endpoint_id.go:58 (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware router.go:78 NewRouter.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware redoc.go:72 Redoc.func1({cilium#666, *}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45) middleware spec.go:46 Spec.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *) http server.go:2938 serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6) http server.go:2009 (*conn).serve(*, {cilium#673, cilium#1554}) 5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] sync sema.go:77 runtime_SemacquireMutex(cilium#142, 0xe8, *) sync mutex.go:171 (*Mutex).lockSlow(cilium#1733) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(0x68) fqdn name_manager.go:70 (*NameManager).Lock(*) policy selectorcache.go:798 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, *}) api l4.go:284 PortRules.Iterate({*, 1, cilium#546}, *) policy rule.go:624 mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, *) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, *) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(*) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(*, *, *) endpoint bpf.go:589 (*Endpoint).regenerateBPF(*, *) endpoint policy.go:457 (*Endpoint).regenerate(*, *) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(*, *) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*, *) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(*) 1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] semaphore semaphore.go:60 (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766) lock semaphored_mutex.go:30 (*SemaphoredMutex).Lock(...) ipcache ipcache.go:140 (*IPCache).Lock(...) ipcache cidr.go:56 (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0) ipcache cidr.go:103 (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff) cmd identity.go:114 cachingIdentityAllocator.AllocateCIDRsForIPs(...) policy selectorcache.go:509 (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45) policy selectorcache.go:843 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(cilium#27811, {#42936, 0x287, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, cilium#24510}) api l4.go:284 PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741) policy rule.go:624 mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(cilium#1748) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781) endpoint bpf.go:589 (*Endpoint).regenerateBPF(cilium#1748, cilium#27542) endpoint policy.go:457 (*Endpoint).regenerate(cilium#1748, cilium#27542) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*Once(#45414), func(cilium#69)) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(cilium#6023) ``` Generated from pp tool: https://github.com/maruel/panicparse Signed-off-by: Chris Tarazi <chris@isovalent.com>
As the previous commit mentions, this does not occur on main, v1.16, and v1.15 due to cilium#29036. However, in v1.14 and v1.13, we need to take special care because the NameManager and SelectorCache lock can be taken while the Endpoint lock is held during Endpoint deletion. Here are the relevant stacktraces regarding the deadlock: ``` 1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*uint32(0x5), false, 43690) sync mutex.go:171 (*Mutex).lockSlow(*Mutex(cilium#1733)) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*RWMutex(0xb0)) fqdn name_manager.go:70 (*NameManager).Lock(0xffffffffffffffff) policy selectorcache.go:964 (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582}) policy l4.go:810 (*L4Filter).removeSelectors(cilium#32582, cilium#29992) policy l4.go:817 (*L4Filter).detach(cilium#719, cilium#29993) policy l4.go:988 L4PolicyMap.Detach(...) policy l4.go:1179 (*L4Policy).Detach(cilium#20318, cilium#1383) policy resolve.go:103 (*selectorPolicy).Detach(...) policy distillery.go:81 (*PolicyCache).delete(cilium#1354, cilium#19354) policy distillery.go:138 (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163) identitymanager manager.go:167 (*IdentityManager).remove(cilium#706, cilium#19354) identitymanager manager.go:147 (*IdentityManager).Remove(cilium#706, cilium#19354) identitymanager manager.go:52 Remove(...) endpoint endpoint.go:1146 (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0}) endpoint endpoint.go:2192 (*Endpoint).Delete(cilium#1883, {0x80, 0xaa}) endpointmanager manager.go:380 (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff}) endpointmanager manager.go:394 (*EndpointManager).RemoveEndpoint(...) cmd endpoint.go:684 (*Daemon).deleteEndpointQuiet(...) cmd endpoint.go:666 (*Daemon).deleteEndpoint(cilium#1155, cilium#1883) cmd endpoint.go:713 (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108}) cmd endpoint.go:770 (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}}) endpoint delete_endpoint_id.go:66 (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3}) middleware router.go:78 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59}) middleware redoc.go:72 Redoc.func1({cilium#666, cilium#2242}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45}) middleware spec.go:46 Spec.func1({cilium#666, cilium#2242}, cilium#4921) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015}) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922) http server.go:2938 serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6) http server.go:2009 (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554}) 8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*, 0x47, cilium#1154) sync mutex.go:171 (*Mutex).lockSlow(cilium#706) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*) identitymanager manager.go:99 (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154) identitymanager manager.go:123 RemoveOldAddNew(...) endpoint policy.go:852 (*Endpoint).SetIdentity(*, cilium#1154, 0) endpoint endpoint.go:1932 (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1) endpoint endpoint.go:1780 (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1) endpoint endpoint.go:1720 (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8) cmd endpoint.go:477 (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *) cmd endpoint.go:542 (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}}) endpoint put_endpoint_id.go:58 (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware router.go:78 NewRouter.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware redoc.go:72 Redoc.func1({cilium#666, *}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45) middleware spec.go:46 Spec.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *) http server.go:2938 serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6) http server.go:2009 (*conn).serve(*, {cilium#673, cilium#1554}) 5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] sync sema.go:77 runtime_SemacquireMutex(cilium#142, 0xe8, *) sync mutex.go:171 (*Mutex).lockSlow(cilium#1733) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(0x68) fqdn name_manager.go:70 (*NameManager).Lock(*) policy selectorcache.go:798 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, *}) api l4.go:284 PortRules.Iterate({*, 1, cilium#546}, *) policy rule.go:624 mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, *) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, *) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(*) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(*, *, *) endpoint bpf.go:589 (*Endpoint).regenerateBPF(*, *) endpoint policy.go:457 (*Endpoint).regenerate(*, *) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(*, *) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*, *) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(*) 1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] semaphore semaphore.go:60 (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766) lock semaphored_mutex.go:30 (*SemaphoredMutex).Lock(...) ipcache ipcache.go:140 (*IPCache).Lock(...) ipcache cidr.go:56 (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0) ipcache cidr.go:103 (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff) cmd identity.go:114 cachingIdentityAllocator.AllocateCIDRsForIPs(...) policy selectorcache.go:509 (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45) policy selectorcache.go:843 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(cilium#27811, {#42936, 0x287, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, cilium#24510}) api l4.go:284 PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741) policy rule.go:624 mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(cilium#1748) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781) endpoint bpf.go:589 (*Endpoint).regenerateBPF(cilium#1748, cilium#27542) endpoint policy.go:457 (*Endpoint).regenerate(cilium#1748, cilium#27542) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*Once(#45414), func(cilium#69)) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(cilium#6023) ``` Generated from pp tool: https://github.com/maruel/panicparse Signed-off-by: Chris Tarazi <chris@isovalent.com>
As the previous commit mentions, this does not occur on main, v1.16, and v1.15 due to cilium#29036. However, in v1.14 and v1.13, we need to take special care because the NameManager and SelectorCache lock can be taken while the Endpoint lock is held during Endpoint deletion. Here are the relevant stacktraces regarding the deadlock: ``` 1: sync.Mutex.Lock [75 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*uint32(0x5), false, 43690) sync mutex.go:171 (*Mutex).lockSlow(*Mutex(cilium#1733)) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*RWMutex(0xb0)) fqdn name_manager.go:70 (*NameManager).Lock(0xffffffffffffffff) policy selectorcache.go:964 (*SelectorCache).RemoveSelectors(cilium#1088, {cilium#28569, 0xb, 1}, {cilium#643, cilium#32582}) policy l4.go:810 (*L4Filter).removeSelectors(cilium#32582, cilium#29992) policy l4.go:817 (*L4Filter).detach(cilium#719, cilium#29993) policy l4.go:988 L4PolicyMap.Detach(...) policy l4.go:1179 (*L4Policy).Detach(cilium#20318, cilium#1383) policy resolve.go:103 (*selectorPolicy).Detach(...) policy distillery.go:81 (*PolicyCache).delete(cilium#1354, cilium#19354) policy distillery.go:138 (*PolicyCache).LocalEndpointIdentityRemoved(cilium#523, cilium#1163) identitymanager manager.go:167 (*IdentityManager).remove(cilium#706, cilium#19354) identitymanager manager.go:147 (*IdentityManager).Remove(cilium#706, cilium#19354) identitymanager manager.go:52 Remove(...) endpoint endpoint.go:1146 (*Endpoint).leaveLocked(cilium#1883, cilium#12221, {0x30, 0}) endpoint endpoint.go:2192 (*Endpoint).Delete(cilium#1883, {0x80, 0xaa}) endpointmanager manager.go:380 (*EndpointManager).removeEndpoint(cilium#1161, 0, {0xff, 0xff}) endpointmanager manager.go:394 (*EndpointManager).RemoveEndpoint(...) cmd endpoint.go:684 (*Daemon).deleteEndpointQuiet(...) cmd endpoint.go:666 (*Daemon).deleteEndpoint(cilium#1155, cilium#1883) cmd endpoint.go:713 (*Daemon).DeleteEndpoint(cilium#1155, {cilium#27399, cilium#8108}) cmd endpoint.go:770 (*deleteEndpointID).Handle(cilium#21193, {cilium#2452, {cilium#27399, 0x4d}}) endpoint delete_endpoint_id.go:66 (*DeleteEndpointID).ServeHTTP(cilium#5934, {cilium#666, cilium#2242}, cilium#2452) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, cilium#2242}, cilium#2452) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#130), func{cilium#2242, 0x3}) middleware router.go:78 NewRouter.func1({cilium#666, cilium#2242}, cilium#2451) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#718), func{cilium#2242, #59}) middleware redoc.go:72 Redoc.func1({cilium#666, cilium#2242}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#4920), func{cilium#2242, #45}) middleware spec.go:46 Spec.func1({cilium#666, cilium#2242}, cilium#4921) http server.go:2136 HandlerFunc.ServeHTTP(ReadCloser(cilium#10532), func{cilium#2242, cilium#23015}) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, cilium#10533}, cilium#2451) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(cilium#722, {cilium#668, cilium#10533}, cilium#4922) http server.go:2938 serverHandler.ServeHTTP(*Server(cilium#8105), cilium#668, cilium#10533, 0x6) http server.go:2009 (*conn).serve(*conn(cilium#16005), Context{cilium#673, cilium#1554}) 8: sync.Mutex.Lock [74 minutes] [Created by http.(*Server).Serve in goroutine 1699 @ server.go:3086] sync sema.go:77 runtime_SemacquireMutex(*, 0x47, cilium#1154) sync mutex.go:171 (*Mutex).lockSlow(cilium#706) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(*) identitymanager manager.go:99 (*IdentityManager).RemoveOldAddNew(cilium#706, 0, cilium#1154) identitymanager manager.go:123 RemoveOldAddNew(...) endpoint policy.go:852 (*Endpoint).SetIdentity(*, cilium#1154, 0) endpoint endpoint.go:1932 (*Endpoint).identityLabelsChanged(*, {cilium#674, *}, 1) endpoint endpoint.go:1780 (*Endpoint).runIdentityResolver(*, {cilium#674, *}, 1, 1) endpoint endpoint.go:1720 (*Endpoint).UpdateLabels(*, {cilium#674, *}, *, *, 8) cmd endpoint.go:477 (*Daemon).createEndpoint(cilium#1155, {cilium#673, *}, {cilium#683, cilium#1155}, *) cmd endpoint.go:542 (*putEndpointID).Handle(cilium#21192, {*, *, {*, 0xe}}) endpoint put_endpoint_id.go:58 (*PutEndpointID).ServeHTTP(cilium#3961, {cilium#666, *}, *) middleware operation.go:28 (*Context).RoutesHandler.NewOperationExecutor.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware router.go:78 NewRouter.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) middleware redoc.go:72 Redoc.func1({cilium#666, *}, cilium#1251) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, #45) middleware spec.go:46 Spec.func1({cilium#666, *}, *) http server.go:2136 HandlerFunc.ServeHTTP(*, {cilium#666, *}, *) metrics middleware.go:64 (*APIEventTSHelper).ServeHTTP(cilium#1459, {cilium#668, *}, *) api apipanic.go:42 (*APIPanicHandler).ServeHTTP(#49, {cilium#668, *}, *) http server.go:2938 serverHandler.ServeHTTP({cilium#653}, {cilium#668, *}, 6) http server.go:2009 (*conn).serve(*, {cilium#673, cilium#1554}) 5: sync.Mutex.Lock [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] sync sema.go:77 runtime_SemacquireMutex(cilium#142, 0xe8, *) sync mutex.go:171 (*Mutex).lockSlow(cilium#1733) sync mutex.go:90 (*Mutex).Lock(...) sync rwmutex.go:147 (*RWMutex).Lock(0x68) fqdn name_manager.go:70 (*NameManager).Lock(*) policy selectorcache.go:798 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, *}, {{*, 0x4d}, {0, 0}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(*, {*, 4, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, *}, {*, 1, 1}, 0, {cilium#660, *}, {{*, 4}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, *}, #44, {*, 0xa, 0}, *, {cilium#660, *}, {{*, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, *}) api l4.go:284 PortRules.Iterate({*, 1, cilium#546}, *) policy rule.go:624 mergeEgress({cilium#680, *}, *, {*, 1, 1}, 0, {cilium#661, *}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(*, {cilium#680, *}, *, *, *, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({*, *, *}, {cilium#680, *}, *) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, *) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, *) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(*) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(*, *, *) endpoint bpf.go:589 (*Endpoint).regenerateBPF(*, *) endpoint policy.go:457 (*Endpoint).regenerate(*, *) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(*, *) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*, *) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(*) 1: select [75 minutes] [Created by eventqueue.(*EventQueue).Run in goroutine 1482 @ eventqueue.go:229] semaphore semaphore.go:60 (*Weighted).Acquire(cilium#1092, {cilium#671, cilium#722}, cilium#766) lock semaphored_mutex.go:30 (*SemaphoredMutex).Lock(...) ipcache ipcache.go:140 (*IPCache).Lock(...) ipcache cidr.go:56 (*IPCache).AllocateCIDRs(cilium#1316, {0, 0, cilium#2037}, {0, 0, 0}, 0) ipcache cidr.go:103 (*IPCache).AllocateCIDRsForIPs(0, {0, cilium#697, 0}, 0xffffffffffffffff) cmd identity.go:114 cachingIdentityAllocator.AllocateCIDRsForIPs(...) policy selectorcache.go:509 (*SelectorCache).allocateIdentityMappings(cilium#1088, {{0, 0}, {cilium#5036, 0x2b}}, #45) policy selectorcache.go:843 (*SelectorCache).AddFQDNSelector(cilium#1088, {cilium#643, cilium#27811}, {{0, 0}, {cilium#5036, 0x2b}}) policy l4.go:628 (*L4Filter).cacheFQDNSelector(...) policy l4.go:623 (*L4Filter).cacheFQDNSelectors(cilium#27811, {#42936, 0x287, cilium#193}, cilium#536) policy l4.go:725 createL4Filter({cilium#680, cilium#22826}, {cilium#22828, 1, 1}, 0, {cilium#660, cilium#24510}, {{cilium#4487, 3}, ...}, ...) policy l4.go:879 createL4EgressFilter(...) policy rule.go:717 mergeEgressPortProto({cilium#680, cilium#22826}, #44, {cilium#22828, 0xa, #78536}, #79633, {cilium#660, cilium#24510}, {{cilium#4487, ...}, ...}, ...) policy rule.go:672 mergeEgress.func1({cilium#660, cilium#24510}) api l4.go:284 PortRules.Iterate({cilium#24510, 1, cilium#546}, cilium#11741) policy rule.go:624 mergeEgress({cilium#680, cilium#22826}, cilium#18687, {cilium#22828, 1, 1}, 0, {cilium#661, cilium#5624}, {cilium#662, ...}, ...) policy rule.go:753 (*rule).resolveEgressPolicy(cilium#24575, {cilium#680, cilium#22826}, cilium#18687, cilium#29345, cilium#4782, {0, 0, 0}, {0, ...}) policy rules.go:103 ruleSlice.resolveL4EgressPolicy({cilium#10690, 0xb, 5}, {cilium#680, cilium#22826}, cilium#18687) policy repository.go:718 (*Repository).resolvePolicyLocked(cilium#1089, cilium#18461) policy distillery.go:119 (*PolicyCache).updateSelectorPolicy(cilium#1354, cilium#18461) policy distillery.go:153 (*PolicyCache).UpdatePolicy(...) endpoint policy.go:262 (*Endpoint).regeneratePolicy(cilium#1748) endpoint bpf.go:744 (*Endpoint).runPreCompilationSteps(cilium#1748, cilium#27542, cilium#4781) endpoint bpf.go:589 (*Endpoint).regenerateBPF(cilium#1748, cilium#27542) endpoint policy.go:457 (*Endpoint).regenerate(cilium#1748, cilium#27542) endpoint events.go:53 (*EndpointRegenerationEvent).Handle(cilium#18609, cilium#703) eventqueue eventqueue.go:245 (*EventQueue).run.func1() sync once.go:74 (*Once).doSlow(*Once(#45414), func(cilium#69)) sync once.go:65 (*Once).Do(...) eventqueue eventqueue.go:233 (*EventQueue).run(cilium#6023) ``` Generated from pp tool: https://github.com/maruel/panicparse Signed-off-by: Chris Tarazi <chris@isovalent.com>
While updating eksctl to v0.101.0 in master/v1.11 as part of commit
396eb1b ("ci: update eksctl to v0.101.0 and AWS VPC CNI plugin to
v1.11"), the AWS VPC CNI was updated to 1.11 on master, but not on the
v1.11 branch. Do that now to fix the following error on that workflow:
Since v1.12 was branched of in the meantime, also using the outdated AWS
VPC CNI version bump it in the corresponding workflow as well.
Fixes: 396eb1b ("ci: update eksctl to v0.101.0 and AWS VPC CNI plugin to v1.11")
Reported-by: Martynas Pumputis m@lambda.lt