Significantly reduce memory usage during cilium-cli sysdump collection #36987

giorio94 · 2025-01-15T11:27:09Z

Please review commit by commit, and refer to the individual commit messages for additional details.

aanm

adding @bimmlerd as a reviewer in case he wants to take a look at the IO interactions with kubelet / kube-apiserver.

cilium-cli/connectivity/tests/errors.go

tklauser

Thanks!

giorio94 · 2025-01-16T07:44:19Z

/test

bimmlerd

I dislike that we have to go to these measures to not use an absurd amount of memory :(

I think I saw a discussion around not printing list types directly, but there was compatibility questions? That, plus pagination, would seem like the natural approach to solve this problem, but I'll not block this great improvement on it.

cilium-cli/sysdump/writers.go

giorio94 · 2025-01-16T09:42:41Z

I dislike that we have to go to these measures to not use an absurd amount of memory :(

Totally agree. I honestly couldn't find a better solution though; attempting to push a modification upstream seems likely to take forever (and I'd need quite some time to gain confidence with the marshaller internals), while trying to switch to a different library would not be guaranteed to preserve the same resulting format.

I think I saw a discussion around not printing list types directly, but there was compatibility questions? That, plus pagination, would seem like the natural approach to solve this problem, but I'll not block this great improvement on it.

Yep, printing one item of the list at a time has the same benefits, does not require any special hacks and it would support practically out of the box pagination. However, there seemed to be concerns given that it would break compatibility and potentially require adaptations to tools parsing the data from sysdumps. In the end, I decided to go with this solution that, although a bit more complex and less elegant, is expected to fully preserve compatibility.

The yaml marshaler appears to consume a significant amount of memory when processing large objects (mostly lists of pods/nodes in the context of sysdumps) and does not seem to support streaming processing out of the box. In an effort to reduce the RAM requirements during sysdump collection, let's leverage a manual approach to independently marshal each list item independently, immediately writing the output into the target file. Let's also introduce a test to validate that we respect the expected format, to prevent breaking changes. Overall, processing one list item at a time reduced by approx. 5 times the memory requirements both considering the retrieval of 1K nodes (150MB/625MB RSS) and the full sysdump collection against a 1K nodes cluster (2.5GB/14GB RSS). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Directly write the streamed data to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Directly write the streamed logs to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

Directly write the marshalled output to the target file, rather than to an intermediate buffer first, to reduce overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

giorio94 · 2025-01-16T10:04:40Z

/test

dylandreimerink

🚀

giorio94 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. cilium-cli This PR contains changes related with cilium-cli labels Jan 15, 2025

giorio94 requested review from a team as code owners January 15, 2025 11:27

giorio94 requested review from nathanjsweet and dylandreimerink January 15, 2025 11:27

github-actions bot added the cilium-cli-exclusive This PR only impacts cilium-cli binary label Jan 15, 2025

aanm approved these changes Jan 15, 2025

View reviewed changes

aanm requested a review from bimmlerd January 15, 2025 11:56

tklauser reviewed Jan 15, 2025

View reviewed changes

cilium-cli/connectivity/tests/errors.go Outdated Show resolved Hide resolved

giorio94 requested a review from tklauser January 15, 2025 14:13

giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from f3e8726 to ab02c35 Compare January 15, 2025 14:13

tklauser approved these changes Jan 15, 2025

View reviewed changes

giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from ab02c35 to f42db32 Compare January 15, 2025 16:02

giorio94 requested a review from a team as a code owner January 15, 2025 16:02

giorio94 requested a review from Artyop January 15, 2025 16:02

bimmlerd approved these changes Jan 16, 2025

View reviewed changes

cilium-cli/sysdump/writers.go Outdated Show resolved Hide resolved

cilium-cli/sysdump/writers.go Outdated Show resolved Hide resolved

giorio94 added 4 commits January 16, 2025 10:43

cilium-cli sysdump: directly write kubectl exec stream to file

6819728

Directly write the streamed data to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

cilium-cli sysdump: directly write logs stream to file

2cea31f

Directly write the streamed logs to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

cilium-cli sysdump: drop a few more intermediate buffers

dc38892

Directly write the marshalled output to the target file, rather than to an intermediate buffer first, to reduce overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from f42db32 to dc38892 Compare January 16, 2025 09:43

dylandreimerink approved these changes Jan 16, 2025

View reviewed changes

Artyop approved these changes Jan 16, 2025

View reviewed changes

giorio94 removed the request for review from nathanjsweet January 16, 2025 10:40

maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 16, 2025

tklauser added this pull request to the merge queue Jan 16, 2025

Merged via the queue into cilium:main with commit ad64f5d Jan 16, 2025
62 of 63 checks passed

github-actions bot mentioned this pull request Feb 3, 2025

Bump ciliumcli from v0.16.0 to v0.18.3 kokyhm/kubespray_dependency_upgrade_test#153

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Significantly reduce memory usage during cilium-cli sysdump collection #36987

Significantly reduce memory usage during cilium-cli sysdump collection #36987

Uh oh!

giorio94 commented Jan 15, 2025

Uh oh!

aanm left a comment

Uh oh!

Uh oh!

tklauser left a comment

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

bimmlerd left a comment

Uh oh!

Uh oh!

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

dylandreimerink left a comment

Uh oh!

Uh oh!

Uh oh!

Significantly reduce memory usage during cilium-cli sysdump collection #36987

Significantly reduce memory usage during cilium-cli sysdump collection #36987

Uh oh!

Conversation

giorio94 commented Jan 15, 2025

Uh oh!

aanm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tklauser left a comment

Choose a reason for hiding this comment

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

bimmlerd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

giorio94 commented Jan 16, 2025

Uh oh!

dylandreimerink left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!