Skip to content

Conversation

giorio94
Copy link
Member

Please review commit by commit, and refer to the individual commit messages for additional details.

@giorio94 giorio94 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. cilium-cli This PR contains changes related with cilium-cli labels Jan 15, 2025
@giorio94 giorio94 requested review from a team as code owners January 15, 2025 11:27
@github-actions github-actions bot added the cilium-cli-exclusive This PR only impacts cilium-cli binary label Jan 15, 2025
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding @bimmlerd as a reviewer in case he wants to take a look at the IO interactions with kubelet / kube-apiserver.

@aanm aanm requested a review from bimmlerd January 15, 2025 11:56
@giorio94 giorio94 requested a review from tklauser January 15, 2025 14:13
@giorio94 giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from f3e8726 to ab02c35 Compare January 15, 2025 14:13
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@giorio94 giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from ab02c35 to f42db32 Compare January 15, 2025 16:02
@giorio94 giorio94 requested a review from a team as a code owner January 15, 2025 16:02
@giorio94 giorio94 requested a review from Artyop January 15, 2025 16:02
@giorio94
Copy link
Member Author

/test

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike that we have to go to these measures to not use an absurd amount of memory :(

I think I saw a discussion around not printing list types directly, but there was compatibility questions? That, plus pagination, would seem like the natural approach to solve this problem, but I'll not block this great improvement on it.

@giorio94
Copy link
Member Author

I dislike that we have to go to these measures to not use an absurd amount of memory :(

Totally agree. I honestly couldn't find a better solution though; attempting to push a modification upstream seems likely to take forever (and I'd need quite some time to gain confidence with the marshaller internals), while trying to switch to a different library would not be guaranteed to preserve the same resulting format.

I think I saw a discussion around not printing list types directly, but there was compatibility questions? That, plus pagination, would seem like the natural approach to solve this problem, but I'll not block this great improvement on it.

Yep, printing one item of the list at a time has the same benefits, does not require any special hacks and it would support practically out of the box pagination. However, there seemed to be concerns given that it would break compatibility and potentially require adaptations to tools parsing the data from sysdumps. In the end, I decided to go with this solution that, although a bit more complex and less elegant, is expected to fully preserve compatibility.

The yaml marshaler appears to consume a significant amount of memory
when processing large objects (mostly lists of pods/nodes in the context
of sysdumps) and does not seem to support streaming processing out of
the box. In an effort to reduce the RAM requirements during sysdump
collection, let's leverage a manual approach to independently marshal
each list item independently, immediately writing the output into the
target file. Let's also introduce a test to validate that we respect
the expected format, to prevent breaking changes.

Overall, processing one list item at a time reduced by approx. 5 times
the memory requirements both considering the retrieval of 1K nodes
(150MB/625MB RSS) and the full sysdump collection against a 1K
nodes cluster (2.5GB/14GB RSS).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the streamed data to file, rather than to a temporary
buffer first, to reduce the overall memory usage.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the streamed logs to file, rather than to a temporary
buffer first, to reduce the overall memory usage.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the marshalled output to the target file, rather than
to an intermediate buffer first, to reduce overall memory usage.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/cilium-cli-sysdump-reduce-memory-usage branch from f42db32 to dc38892 Compare January 16, 2025 09:43
@giorio94
Copy link
Member Author

/test

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@giorio94 giorio94 removed the request for review from nathanjsweet January 16, 2025 10:40
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 16, 2025
@tklauser tklauser added this pull request to the merge queue Jan 16, 2025
Merged via the queue into cilium:main with commit ad64f5d Jan 16, 2025
62 of 63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants