Description
It looks like we could modify stb_textedit.h to let the user keep a UTF-8 underlying representation while avoiding random-access lookups. If we treat all indices and num_chars values as byte indices (rather than character indices), we'd only need two extra functions:
- STB_TEXTEDIT_GETPREVIOUSCHARINDEX(obj, int idx)
Defaults to idx-1
For UTF-8 the user would need to backtrack in the stream, skipping continuation bytes (0x80 to 0xBF) until reaching a lead byte (either < 0x80 or >= 0xC0). That would work and be efficient enough. However, it means the user's own handling of malformed UTF-8 (for which there is no standard convention, AFAIK) would have to perform the reverse operation to stay compatible with rewinding. Editing malformed UTF-8 is a super edge case that is reasonable to avoid or catch earlier, and it wouldn't affect people not using UTF-8.
- STB_TEXTEDIT_GETNEXTCHARINDEX(obj, int idx)
Defaults to idx+1
The name would be a nice symmetry with the previous function. It could also be turned into STB_TEXTEDIT_GETNUMINDICESFORCHAR() / STB_TEXTEDIT_GETBYTECOUNTFORCHAR(), defaulting to 1.
- A common pattern in stb_textedit.h would be to call STB_TEXTEDIT_GETWIDTH() or STB_TEXTEDIT_GETCHAR() together with one of those functions, so we could possibly offer a way for the user to do both at once, but we don't have to. Offering that may just add unnecessary complexity.
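As a rough illustration of what a user might plug into the two proposed macros for a UTF-8 buffer, here is a minimal sketch. The helper names (`utf8_is_continuation`, `utf8_prev_char_index`, `utf8_next_char_index`) and the explicit `len` parameter are assumptions made for this example, not part of stb_textedit.h. It assumes mostly well-formed input and simply treats any run of continuation bytes as belonging to the preceding lead byte, so stepping forward and backward stay consistent with each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: these names are NOT part of stb_textedit.h. */

/* Nonzero for UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Byte index where the character preceding byte index idx starts.
   Returns 0 when idx is already 0. */
static int utf8_prev_char_index(const char *buf, int len, int idx)
{
    assert(idx >= 0 && idx <= len);
    if (idx == 0)
        return 0;
    --idx;
    /* Walk back over continuation bytes until a lead byte is reached. */
    while (idx > 0 && utf8_is_continuation((unsigned char)buf[idx]))
        --idx;
    return idx;
}

/* Byte index where the character following the one at idx starts.
   Returns len when idx is at or past the last character. */
static int utf8_next_char_index(const char *buf, int len, int idx)
{
    assert(idx >= 0 && idx <= len);
    if (idx >= len)
        return len;
    ++idx;
    /* Skip the continuation bytes of the current character. */
    while (idx < len && utf8_is_continuation((unsigned char)buf[idx]))
        ++idx;
    return idx;
}
```

A non-UTF-8 user would not need any of this: the defaults of idx-1 and idx+1 would apply.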
With this scheme a typical loop such as
for (i=0; first+i < n; ++i)
find->x += STB_TEXTEDIT_GETWIDTH(str, first, i);
would become
for (i=0; first+i < n; i = STB_TEXTEDIT_GETNEXTCHARINDEX(str, first+i) - first)
find->x += STB_TEXTEDIT_GETWIDTH(str, first, i);
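To make the byte-index loop concrete, here is a small self-contained sketch. `next_char_index`, `char_width`, and `row_width` are hypothetical stand-ins for STB_TEXTEDIT_GETNEXTCHARINDEX, STB_TEXTEDIT_GETWIDTH, and the surrounding stb_textedit code; the fixed width of 1 per character is an assumption chosen only so the result is easy to check.

```c
#include <stddef.h>

/* Hypothetical stand-in for STB_TEXTEDIT_GETNEXTCHARINDEX on a UTF-8 buffer:
   returns the byte index of the character after the one starting at idx. */
static int next_char_index(const char *buf, int len, int idx)
{
    if (idx >= len)
        return len;
    ++idx;
    /* Skip continuation bytes (10xxxxxx) of the current character. */
    while (idx < len && ((unsigned char)buf[idx] & 0xC0) == 0x80)
        ++idx;
    return idx;
}

/* Hypothetical stand-in for STB_TEXTEDIT_GETWIDTH: every character is
   assumed to be 1 unit wide, regardless of how many bytes it occupies. */
static float char_width(const char *buf, int idx)
{
    (void)buf;
    (void)idx;
    return 1.0f;
}

/* Sum the widths of the characters stored in bytes [first, n).
   i is a byte offset relative to first, as in the loop above. */
static float row_width(const char *buf, int len, int first, int n)
{
    float x = 0.0f;
    int i;
    for (i = 0; first + i < n; i = next_char_index(buf, len, first + i) - first)
        x += char_width(buf, first + i);
    return x;
}
```

The loop body runs once per character rather than once per byte, which is the point of advancing through the "next index" function instead of `++i`.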
There are probably a few other things to solve and clarify, but that's the gist of it.
Do you think you would take such a patch?
(
This is merely me dumping some ideas, as I'm not yet sure I want to undertake this modification. It has been niggling me that my text input widget has to do back-and-forth UTF-8 / wchar conversions. As I'm trying to handle large amounts of text reasonably in an imgui context without doing multiple passes over the data, the code is already quite complex and would benefit from dealing with only a single UTF-8 buffer. However, for interactive performance with large amounts of text, I may just as well have to rewrite something anyway, because stb_textedit is not designed for large text.
So I can either:
a) Add this UTF-8 support to stb_textedit. It would simplify my code a lot (primary focus), make it a little faster, and may generally be useful to have in stb_textedit. The downside is that the code in stb_textedit.h will look a little heavier.
b) Write something custom and more stateful to handle interaction with large text. More effort. I don't absolutely need the perf but it'd be nice. Nobody else would benefit from the improvement. I'd prefer to avoid this path.
)