Skip to content

Conversation

tazjin
Copy link
Contributor

@tazjin tazjin commented Aug 9, 2025

This is one of a series of changes related to #691


The renderIterator function previously caused an extremely large amount of small strings and other objects to be created and abandoned during rendering, which caused performance issues after some time as the GC had to occasionally collect all of these objects.

This was exacerbated by using streaming in models, which leads to extremely frequent updates.

This commit refactors renderIterator to avoid constructing temporary strings. Instead, the function now performs two passes:

  1. A first pass in which the "fragments" to render are aggregated, but the rendered string is not yet copied and appended/prepended to.

  2. A second pass, which uses a strings.Builder to efficiently construct the final output string.

This has significantly improved crush's performance for me. Whereas before perf would show it spending up to 70% (!) of its time in GC-related Go runtime functions, it now spends a trivial amount there.

pprof's heap profiling previously showed renderIterator as a massive hotspot, whereas it now doesn't even show up in alloc top anymore.

The updated function is slightly harder to read. I did spend some time trying different options for making it more readable, and also asking various LLMs about it (using crush!), but ultimately didn't find anything better than the two-pass solution.

Describe your changes

Related issue/discussion:

Checklist before requesting a review

@tazjin tazjin requested a review from kujtimiihoxha as a code owner August 9, 2025 16:41
The `renderIterator` function previously caused an extremely large
amount of small strings and other objects to be created and abandoned
during rendering, which caused performance issues after some time as the
GC had to occasionally collect all of these objects.

This was exacerbated by using streaming in models, which leads to
extremely frequent updates.

This commit refactors renderIterator to avoid constructing temporary
strings. Instead, the function now performs two passes:

1. A first pass in which the "fragments" to render are aggregated, but
   the `rendered` string is not yet copied and appended/prepended to.

2. A second pass, which uses a `strings.Builder` to efficiently
   construct the final output string.

This has *significantly* improved crush's performance for me. Whereas
before `perf` would show it spending up to 70% (!) of its time in
GC-related Go runtime functions, it now spends a trivial amount there.

pprof's heap profiling previously showed renderIterator as a massive
hotspot, whereas it now doesn't even show up in alloc `top` anymore.

The updated function is slightly harder to read. I did spend some time
trying different options for making it more readable, and also asking
various LLMs about it (using crush!), but ultimately didn't find
anything better than the two-pass solution.
@tazjin tazjin force-pushed the perf-gc-pressure branch from 038013e to ddbe51c Compare August 9, 2025 16:55
@tazjin
Copy link
Contributor Author

tazjin commented Aug 9, 2025

By the way, in case anybody else wants to look at stuff like this: There's a second pathological case somewhere when rendering individual lines in ultraviolet. When this occurs there's weird input lag until rendering "catches up".

I suspect that something might be off with the logic around changed lines in terminal_renderer.go, or that there's some debouncing missing, but I'm not gonna debug that today.

@meowgorithm
Copy link
Member

Appreciate this, @tazjin! We'll prioritize this one.

@aymanbagabas: heads up about UV.

@tazjin
Copy link
Contributor Author

tazjin commented Aug 9, 2025

Thanks! I found a low-hanging fruit in ultraviolet and sent a PR for that, but I think there's some more tricky issue hiding somewhere in this.

@tazjin
Copy link
Contributor Author

tazjin commented Aug 9, 2025

I've created a tracking ticket for performance issues to avoid spamming this PR with stuff that isn't directly related to this change: #691

@caarlos0 caarlos0 changed the title Significantly improve rendering performance by reducing GC pressure per: significantly improve rendering performance by reducing GC pressure Aug 11, 2025
@caarlos0 caarlos0 changed the title per: significantly improve rendering performance by reducing GC pressure perf: significantly improve rendering performance by reducing GC pressure Aug 11, 2025
Copy link
Member

@raphamorim raphamorim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you make sense this PR. specially because 100 items, creates ~300 items that goes straight to garbage.

@aymanbagabas aymanbagabas merged commit bdb0a4c into charmbracelet:main Aug 11, 2025
10 checks passed
@tazjin tazjin deleted the perf-gc-pressure branch August 11, 2025 14:28
scalabl3 pushed a commit to metaphori-ai/crush that referenced this pull request Aug 19, 2025
The `renderIterator` function previously caused an extremely large
amount of small strings and other objects to be created and abandoned
during rendering, which caused performance issues after some time as the
GC had to occasionally collect all of these objects.

This was exacerbated by using streaming in models, which leads to
extremely frequent updates.

This commit refactors renderIterator to avoid constructing temporary
strings. Instead, the function now performs two passes:

1. A first pass in which the "fragments" to render are aggregated, but
   the `rendered` string is not yet copied and appended/prepended to.

2. A second pass, which uses a `strings.Builder` to efficiently
   construct the final output string.

This has *significantly* improved crush's performance for me. Whereas
before `perf` would show it spending up to 70% (!) of its time in
GC-related Go runtime functions, it now spends a trivial amount there.

pprof's heap profiling previously showed renderIterator as a massive
hotspot, whereas it now doesn't even show up in alloc `top` anymore.

The updated function is slightly harder to read. I did spend some time
trying different options for making it more readable, and also asking
various LLMs about it (using crush!), but ultimately didn't find
anything better than the two-pass solution.
scalabl3 pushed a commit to metaphori-ai/crush that referenced this pull request Aug 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants