
Cilium Envoy pod crashed when experimenting with TLS interception (SDS enabled) #36259

@giorio94

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Cilium Envoy crashed with the following log:

[2024-11-29 11:16:48.630][51][error][envoy_bug] [cilium/network_policy.cc:612] envoy bug failure: !Thread::MainThread::isMainOrTestThread()
[2024-11-29 11:16:48.630][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:38] stacktrace for envoy bug
[2024-11-29 11:16:48.631][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #0 UNKNOWN [0x55c3bd9d986c]
[2024-11-29 11:16:48.631][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #1 UNKNOWN [0x55c3bd9d4639]
[2024-11-29 11:16:48.631][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #2 UNKNOWN [0x55c3bd9d454f]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #3 UNKNOWN [0x55c3bec48eac]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #4 UNKNOWN [0x55c3bf4d7e59]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #5 UNKNOWN [0x55c3bf7a2589]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #6 UNKNOWN [0x55c3bf7a1341]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #7 UNKNOWN [0x55c3bec71909]
[2024-11-29 11:16:48.632][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #8 UNKNOWN [0x55c3bf82b40e]
[2024-11-29 11:16:48.633][51][error][envoy_bug] [external/envoy/source/common/common/assert.h:45] #9 UNKNOWN [0x7f4e53674ac3]
[2024-11-29 11:16:48.634][51][critical][backtrace] [external/envoy/source/server/backtrace.h:127] Caught Aborted, suspect faulting address 0xd
[2024-11-29 11:16:48.635][51][critical][backtrace] [external/envoy/source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-11-29 11:16:48.635][51][critical][backtrace] [external/envoy/source/server/backtrace.h:112] Envoy version: f09ed995abccd4d360c769d256a781f1874c2f3b/1.31.3/Distribution/RELEASE/BoringSSL
[2024-11-29 11:16:48.635][51][critical][backtrace] [external/envoy/source/server/backtrace.h:114] Address mapping: 55c3bd920000-55c3bfde7000 /usr/bin/cilium-envoy
[2024-11-29 11:16:48.635][51][critical][backtrace] [external/envoy/source/server/backtrace.h:121] #0: [0x7f4e53622520]
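The failing check reads `!Thread::MainThread::isMainOrTestThread()`. This reading assumes that Envoy's `ENVOY_BUG` macro logs the stringified condition that was expected to hold. If so, the code at `cilium/network_policy.cc:612` expected to run off the main thread but was reached on the main (or test) thread.

The Python sketch below illustrates only that reading of the log. The names `envoy_bug`, `is_main_thread`, and `worker_only_update` are hypothetical stand-ins, not Envoy or Cilium APIs. Unlike Envoy, the sketch records the message instead of aborting.

```python
import threading


def envoy_bug(condition: bool, condition_str: str, log: list) -> None:
    """Hypothetical stand-in for ENVOY_BUG: record the condition text if it does not hold."""
    if not condition:
        log.append(f"envoy bug failure: {condition_str}")


def is_main_thread() -> bool:
    """Hypothetical analogue of Thread::MainThread::isMainOrTestThread()."""
    return threading.current_thread() is threading.main_thread()


def worker_only_update(log: list) -> None:
    """Hypothetical code path that must never run on the main thread."""
    envoy_bug(not is_main_thread(), "!Thread::MainThread::isMainOrTestThread()", log)


def demo() -> tuple:
    # Called from the main thread: the condition is False, so a failure is recorded.
    from_main: list = []
    worker_only_update(from_main)

    # Called from a separate thread: the condition holds, so nothing is recorded.
    from_worker: list = []
    worker = threading.Thread(target=worker_only_update, args=(from_worker,))
    worker.start()
    worker.join()
    return from_main, from_worker


if __name__ == "__main__":
    main_log, worker_log = demo()
    print(main_log)    # one failure message
    print(worker_log)  # empty list
```

In the sketch, the same function passes or fails depending only on which thread calls it. That is the distinction the log line encodes.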

How can we reproduce the issue?

I don't have clear reproduction steps at the moment. I configured TLS interception (the exact policies are included in the sysdump), updated the certificate a few times, and issued a few curl requests. At some point, one of the Cilium Envoy pods crashed with the log above. One possibly relevant detail: kind-worker3, the node hosting the crashing Envoy proxy, had previously hosted one client, which had already terminated by then.

Cilium Version

Recent tip of main: v1.17.0-dev-0aeeefb4431
Cilium Envoy: tip of main (f09ed995abccd4d360c769d256a781f1874c2f3b)

Sysdump

cilium-sysdump-20241129-121901.zip

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

  • area/agent: Cilium agent related.
  • area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
  • area/servicemesh: GH issues or PRs regarding servicemesh
  • kind/bug: This is a bug in the Cilium logic.
  • needs/triage: This issue requires triaging to establish severity and next steps.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
