-
Notifications
You must be signed in to change notification settings - Fork 576
perf: significantly improve rendering performance by reducing GC pressure #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The `renderIterator` function previously caused an extremely large amount of small strings and other objects to be created and abandoned during rendering, which caused performance issues after some time as the GC had to occasionally collect all of these objects. This was exacerbated by using streaming in models, which leads to extremely frequent updates. This commit refactors renderIterator to avoid constructing temporary strings. Instead, the function now performs two passes: 1. A first pass in which the "fragments" to render are aggregated, but the `rendered` string is not yet copied and appended/prepended to. 2. A second pass, which uses a `strings.Builder` to efficiently construct the final output string. This has *significantly* improved crush's performance for me. Whereas before `perf` would show it spending up to 70% (!) of its time in GC-related Go runtime functions, it now spends a trivial amount there. pprof's heap profiling previously showed renderIterator as a massive hotspot, whereas it now doesn't even show up in alloc `top` anymore. The updated function is slightly harder to read. I did spend some time trying different options for making it more readable, and also asking various LLMs about it (using crush!), but ultimately didn't find anything better than the two-pass solution.
038013e
to
ddbe51c
Compare
By the way, in case anybody else wants to look at stuff like this: There's a second pathological case somewhere when rendering individual lines in ultraviolet. When this occurs there's weird input lag until rendering "catches up". I suspect that something might be off with the logic around changed lines in |
Appreciate this, @tazjin! We'll prioritize this one. @aymanbagabas: heads up about UV. |
Thanks! I found a low-hanging fruit in ultraviolet and sent a PR for that, but I think there's some more tricky issue hiding somewhere in this. |
I've created a tracking ticket for performance issues to avoid spamming this PR with stuff that isn't directly related to this change: #691 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you make sense this PR. specially because 100 items, creates ~300 items that goes straight to garbage.
The `renderIterator` function previously caused an extremely large amount of small strings and other objects to be created and abandoned during rendering, which caused performance issues after some time as the GC had to occasionally collect all of these objects. This was exacerbated by using streaming in models, which leads to extremely frequent updates. This commit refactors renderIterator to avoid constructing temporary strings. Instead, the function now performs two passes: 1. A first pass in which the "fragments" to render are aggregated, but the `rendered` string is not yet copied and appended/prepended to. 2. A second pass, which uses a `strings.Builder` to efficiently construct the final output string. This has *significantly* improved crush's performance for me. Whereas before `perf` would show it spending up to 70% (!) of its time in GC-related Go runtime functions, it now spends a trivial amount there. pprof's heap profiling previously showed renderIterator as a massive hotspot, whereas it now doesn't even show up in alloc `top` anymore. The updated function is slightly harder to read. I did spend some time trying different options for making it more readable, and also asking various LLMs about it (using crush!), but ultimately didn't find anything better than the two-pass solution.
This is one of a series of changes related to #691
The
renderIterator
function previously caused an extremely large amount of small strings and other objects to be created and abandoned during rendering, which caused performance issues after some time as the GC had to occasionally collect all of these objects.This was exacerbated by using streaming in models, which leads to extremely frequent updates.
This commit refactors renderIterator to avoid constructing temporary strings. Instead, the function now performs two passes:
A first pass in which the "fragments" to render are aggregated, but the
rendered
string is not yet copied and appended/prepended to.A second pass, which uses a
strings.Builder
to efficiently construct the final output string.This has significantly improved crush's performance for me. Whereas before
perf
would show it spending up to 70% (!) of its time in GC-related Go runtime functions, it now spends a trivial amount there.pprof's heap profiling previously showed renderIterator as a massive hotspot, whereas it now doesn't even show up in alloc
top
anymore.The updated function is slightly harder to read. I did spend some time trying different options for making it more readable, and also asking various LLMs about it (using crush!), but ultimately didn't find anything better than the two-pass solution.
Describe your changes
Related issue/discussion:
Checklist before requesting a review
CONTRIBUTING.md