@lemire lemire commented May 20, 2025

The convention in Node.js is that lenient processing stops at the first '=' character. Thus we have

 test("YQ ==", "a");
 test("YQ == junk", "a");

The current fix is suboptimal: I look for the first occurrence of '=' using std::find. We should roll our own fast 'find' function.

I have coded a fast find function for some kernels.
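To illustrate the convention described above, here is a minimal sketch of the std::find-based approach. It is not the code from this pull request: `before_first_padding` is a hypothetical helper name, and the sketch only shows how an input could be cut at the first '=' before lenient decoding proceeds.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of simdutf): returns the part of `input`
// that precedes the first '=' character, or all of `input` if there is none.
inline std::string_view before_first_padding(std::string_view input) {
  const char *first = input.data();
  const char *last = first + input.size();
  // std::find returns `last` when no '=' is present.
  const char *pad = std::find(first, last, '=');
  return input.substr(0, static_cast<std::size_t>(pad - first));
}
```

Under this sketch, both "YQ ==" and "YQ == junk" reduce to the same prefix "YQ ", which is consistent with the two tests above decoding to the same result.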

Fixes #792

Fixes #795

@lemire lemire requested review from pauldreik and WojciechMula May 21, 2025 03:15

simdutf_really_inline const char *find(const char *start, const char *end,
                                       char character) noexcept {
  for (; std::distance(start, end) >= 64; start += 64) {
Member

FYI: std::distance has O(N) complexity; avoiding the call will improve performance.

Member Author

I believe that, in this instance, it is constant time because a pointer is a "LegacyRandomAccessIterator".

https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator

(Screenshot, 2025-05-20 at 23:35:50)

I believe that it even applies to cases like this:

std::vector<X> v =....
return std::distance(v.begin(), v.end());

It is possible that an alternative coding could improve performance in some cases. I am not sure.
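As a rough illustration of the point about iterator categories, the sketch below checks at compile time that a raw pointer is a random-access iterator, and shows one possible alternative coding of the loop condition using plain pointer subtraction. `find_blockwise` is a hypothetical scalar stand-in, not the kernel code from this pull request; a real kernel would presumably compare each 64-byte block with SIMD instructions.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// A raw pointer is a random-access iterator, so std::distance(start, end)
// on pointers reduces to the constant-time subtraction end - start.
static_assert(
    std::is_same<std::iterator_traits<const char *>::iterator_category,
                 std::random_access_iterator_tag>::value,
    "pointers are random-access iterators");

// Hypothetical scalar stand-in for a kernel-specific find (assumption:
// same contract as std::find, returning `end` when nothing matches).
inline const char *find_blockwise(const char *start, const char *end,
                                  char character) noexcept {
  // Process full 64-byte blocks; the condition uses pointer subtraction
  // instead of std::distance.
  for (; end - start >= 64; start += 64) {
    for (std::size_t i = 0; i < 64; i++) {
      if (start[i] == character) {
        return start + i;
      }
    }
  }
  // Handle the remaining tail of fewer than 64 bytes.
  for (; start != end; ++start) {
    if (*start == character) {
      return start;
    }
  }
  return end;
}
```

Whether the pointer-subtraction form is measurably faster than std::distance is left open here; with optimizations enabled, compilers typically generate the same code for both on pointers.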

lemire commented May 24, 2025

I am going to merge. Thanks @anonrig and @pauldreik for the review.

@lemire lemire merged commit c6bf96e into master May 24, 2025
61 of 75 checks passed
Successfully merging this pull request may close these issues.

code a fast 'find character' for the POWER kernel
Code a fast 'find character' function for at least some kernels (x64 and arm)