-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Significantly reduce memory usage during cilium-cli sysdump collection #36987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly reduce memory usage during cilium-cli sysdump collection #36987
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding @bimmlerd as a reviewer in case he wants to take a look at the IO interactions with kubelet / kube-apiserver.
f3e8726
to
ab02c35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
ab02c35
to
f42db32
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike that we have to go to these measures to not use an absurd amount of memory :(
I think I saw a discussion around not printing list types directly, but there was compatibility questions? That, plus pagination, would seem like the natural approach to solve this problem, but I'll not block this great improvement on it.
Totally agree. I honestly couldn't find a better solution though; attempting to push a modification upstream seems likely to take forever (and I'd need quite some time to gain confidence with the marshaller internals), while trying to switch to a different library would not be guaranteed to preserve the same resulting format.
Yep, printing one item of the list at a time has the same benefits, does not require any special hacks and it would support practically out of the box pagination. However, there seemed to be concerns given that it would break compatibility and potentially require adaptations to tools parsing the data from sysdumps. In the end, I decided to go with this solution that, although a bit more complex and less elegant, is expected to fully preserve compatibility. |
The yaml marshaler appears to consume a significant amount of memory when processing large objects (mostly lists of pods/nodes in the context of sysdumps) and does not seem to support streaming processing out of the box. In an effort to reduce the RAM requirements during sysdump collection, let's leverage a manual approach to independently marshal each list item independently, immediately writing the output into the target file. Let's also introduce a test to validate that we respect the expected format, to prevent breaking changes. Overall, processing one list item at a time reduced by approx. 5 times the memory requirements both considering the retrieval of 1K nodes (150MB/625MB RSS) and the full sysdump collection against a 1K nodes cluster (2.5GB/14GB RSS). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the streamed data to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the streamed logs to file, rather than to a temporary buffer first, to reduce the overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Directly write the marshalled output to the target file, rather than to an intermediate buffer first, to reduce overall memory usage. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
f42db32
to
dc38892
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Please review commit by commit, and refer to the individual commit messages for additional details.