-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
feat(Text): Add support for Intl.segmenter #10584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Review or Edit in CodeSandboxOpen the branch in Web Editor • VS Code • Insiders |
We don't really support splitting internally outside the whole char function. |
Build Stats
|
I guess yeah its better what you recommended in issue. I edited the PR with following cnhages: added https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter as main method of splitting string into graphemes if its availabe, current code as fallback. Imo Its just weird that this library by default does not support all emojis. |
I know is weird, but emoji is a living standard that updates once at year at minimum, there are unicode tables that grows in size to support those. If you look at a grapheme splitter that supports up to unicode 15 Unicode 16 adds a bunch of new chars and is not supported yet by that library. |
This looks good to me, thanks for following up. |
Well apparently i didn't fix prettier. |
Ok so for this test here:
The normal js simple loop takes 1ms to split, with the grapheme splitter it takes 8ms, that is surprising to me ( those are the values on my machine a 5y old m1 pro ) If we factor out the segmenter creation like this:
it goes back to 1ms or less ( depending on the garbage collector, my there is no perceived degradation of performance ). We will need to think something around the env or config object and see what is best. We are nearly there. |
Hmm weird i tried to test in chrome on my machine it was not making this big difference. (linux chrome ryzen). It encoded 1k of these string in 250-300 ms. But yeah would make sense to initialize it once. |
For now let's merge this, i will try to make it fit in the cache or config class in some way. This is a good improvement, thank you. |
Description
Fixes #10583
In Action
Added unit test case that verifies that the flag emojis are split correctly.
Name of the test case
correctly splits strings including flag emojis into graphmes
.