Problem: search_stat not reset when pattern differs in case #17314

chrisbra · 2025-05-14T08:05:58Z

Problem: search_stat not reset when pattern differs in case
Solution: use STRNCMP instead of MB_STRNICMP macro

There was a long standing todo comment, that using MB_STRNICMP is wrong. So let's change it to STRNCMP() instead. Even if it not handle multi-byte characters correctly, then Vim will rather recompute the search stat, instead of re-using the old (and possibly wrong) value.

fixes: #17312

Problem: search_stat not reset when pattern differs in case Solution: use STRNCMP instead of MB_STRNICMP macro There was a long standing todo comment, that using MB_STRNICMP is wrong. So let's change it to STRNCMP() instead. Even if it not handle multi-byte characters correctly, then Vim will rather recompute the search stat, instead of re-using the old (and possibly wrong) value. fixes: vim#17312 Signed-off-by: Christian Brabandt <cb@256bit.org>

girishji · 2025-05-14T09:49:20Z

@chrisbra, will this work for mb_strncmp()?

/*
 * Compare two multibyte strings up to "n" characters.
 * Case-sensitive version of mb_strnicmp().
 */
int mb_strncmp(char_u *s1, char_u *s2, size_t n)
{
    int c1, c2;
    size_t i = 0;

    while (i < n)
    {
        c1 = mb_ptr2char_adv(&s1);
        c2 = mb_ptr2char_adv(&s2);

        if (c1 != c2)
            return c1 - c2;
        if (c1 == NUL)  // end of string
            break;
        ++i;
    }
    return 0;
}

It should work for utf8 and DBCS. But, the comment says:

 * Needed for Big5, Shift-JIS and UTF-8 encoding.  Other DBCS encodings can
 * probably use strnicmp(), because there are no ASCII characters in the...

So, the above version may not work for Big5, etc. An alternative approach is to implement a utf_strncmp() function (below), and use it in mb_strncmp() modeled after mb_strnicmp(), but without the MB_TOLOWER() step.

/*
 * Compare two UTF-8 strings up to "n" Unicode characters.
 * Case-sensitive.
 */
int utf_strncmp(char_u *s1, char_u *s2, size_t n)
{
    int c1, c2;
    size_t i = 0;

    while (i < n)
    {
        if (*s1 == NUL && *s2 == NUL)
            return 0;

        c1 = utf_ptr2char((char_u *)s1);
        c2 = utf_ptr2char((char_u *)s2);

        if (c1 != c2)
            return c1 - c2;

        s1 += utf_ptr2len(s1);
        s2 += utf_ptr2len(s2);
        ++i;
    }

    return 0;
}

chrisbra · 2025-05-14T11:37:31Z

Makes sense. I'll include your version then

zeertzjq · 2025-05-14T23:07:19Z

I'm confused. How does STRNCMP() not handle multi-byte correctly and why is utf_strncmp() needed? And using this utf_strncmp() here is wrong, as lastpatlen is the number of bytes, not the number of characters.

girishji · 2025-05-15T06:04:57Z

I'm confused. How does STRNCMP() not handle multi-byte correctly and why is utf_strncmp() needed? And using this utf_strncmp() here is wrong, as lastpatlen is the number of bytes, not the number of characters.

I think I was not clear. I meant using utf_strncmp() inside mb_strncmp:

    int
mb_strncmp(char_u *s1, char_u *s2, size_t nn)
{
...
    if (enc_utf8)
    {
	return utf_strncmp(...);
    }
    else
    {
	return STRNCMP(...)
   }
    return 0;
}

I assumed non-utf8 multi-code chars are different from utf8 multi-code (from reading the comments) and that strncmp() will not work for them. I assumed that is why strncmp() was not used earlier. Let me verify this with some tests.

girishji · 2025-05-15T07:18:48Z

@zeertzjq

STRNCMP() and STRCMP() do not appear to handle multi-byte characters correctly. Here's an example:

STRNCMP("é", "é", 2)

This returns 0, indicating the strings are equal—but they are not. The first string is a single character "é" (U+00E9), while the second is composed of two code points: "e" (U+0065) followed by a combining acute accent (U+0301).

In digraph notation:

The first string is entered as Ctrl-K ' e
The second string as e Ctrl-V u 0301

Since STRNCMP() compares raw bytes, it happens that the first two bytes match—even though the second string consists of three bytes in total. STRCMP() also gives an incorrect result for similar reasons.

EDIT: strcmp()/strncmp() also does not handle DBCS (Shift-JIS, Big5) correctly

zeertzjq · 2025-05-15T07:47:40Z

STRNCMP() and STRCMP() do not appear to handle multi-byte characters correctly. Here's an example:
STRNCMP("é", "é", 2)

This case is not relevant here, since lastpatlen is the full length of lastpat, so it'll not compare only a part of a multibyte character.

STRCMP() also gives an incorrect result for similar reasons.

Example?

girishji · 2025-05-15T08:23:50Z

STRCMP() does handle multi-byte characters correctly—there was a bug in my test. The actual issue lies with STRNCMP(), which compares a fixed number of bytes and can misinterpret multi-byte sequences.

The correct solution seems to be to use STRCMP() or STRNCMP() (since lastpatlen will not split multibyte char).

Apologies for the earlier confusion.

chrisbra · 2025-05-15T16:29:50Z

That's why I initially used STRNCMP. Because even if it does not handle multi-byte chars correctly, it would cause the search indicator to be updated additionally. Okay, I'll revert it then. Thanks both

girishji · 2025-05-16T10:56:25Z

Thanks.

Problem: search_stat not reset when pattern differs in case (tahzibijafar) Solution: use STRNCMP instead of MB_STRNICMP macro There was a long standing todo comment, that using MB_STRNICMP is wrong. So let's change it to STRNCMP() instead. Even if it not handle multi-byte characters correctly, then Vim will rather recompute the search stat, instead of re-using the old (and possibly wrong) value. fixes: vim/vim#17312 closes: vim/vim#17314 vim/vim@670d0c1 Co-authored-by: Christian Brabandt <cb@256bit.org>

#34058) Problem: search_stat not reset when pattern differs in case (tahzibijafar) Solution: use STRNCMP instead of MB_STRNICMP macro There was a long standing todo comment, that using MB_STRNICMP is wrong. So let's change it to STRNCMP() instead. Even if it not handle multi-byte characters correctly, then Vim will rather recompute the search stat, instead of re-using the old (and possibly wrong) value. fixes: vim/vim#17312 closes: vim/vim#17314 vim/vim@670d0c1 Co-authored-by: Christian Brabandt <cb@256bit.org>

#34058) Problem: search_stat not reset when pattern differs in case (tahzibijafar) Solution: use STRNCMP instead of MB_STRNICMP macro There was a long standing todo comment, that using MB_STRNICMP is wrong. So let's change it to STRNCMP() instead. Even if it not handle multi-byte characters correctly, then Vim will rather recompute the search stat, instead of re-using the old (and possibly wrong) value. fixes: vim/vim#17312 closes: vim/vim#17314 vim/vim@670d0c1 Co-authored-by: Christian Brabandt <cb@256bit.org> (cherry picked from commit ec5f054)

chrisbra force-pushed the gh-17312 branch from 00a3779 to 1a54533 Compare May 15, 2025 18:12

chrisbra closed this in 670d0c1 May 16, 2025

zeertzjq mentioned this pull request May 16, 2025

vim-patch:9.1.1395: search_stat not reset when pattern differs in case neovim/neovim#34058

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Problem: search_stat not reset when pattern differs in case #17314

Problem: search_stat not reset when pattern differs in case #17314

chrisbra commented May 14, 2025

Uh oh!

girishji commented May 14, 2025

Uh oh!

chrisbra commented May 14, 2025

Uh oh!

zeertzjq commented May 14, 2025 •

edited

Loading

Uh oh!

girishji commented May 15, 2025

Uh oh!

girishji commented May 15, 2025 •

edited

Loading

Uh oh!

zeertzjq commented May 15, 2025

Uh oh!

girishji commented May 15, 2025 •

edited

Loading

Uh oh!

chrisbra commented May 15, 2025

Uh oh!

girishji commented May 16, 2025

Uh oh!

Uh oh!

Uh oh!

Problem: search_stat not reset when pattern differs in case #17314

Problem: search_stat not reset when pattern differs in case #17314

Conversation

chrisbra commented May 14, 2025

Uh oh!

girishji commented May 14, 2025

Uh oh!

chrisbra commented May 14, 2025

Uh oh!

zeertzjq commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

girishji commented May 15, 2025

Uh oh!

girishji commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zeertzjq commented May 15, 2025

Uh oh!

girishji commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrisbra commented May 15, 2025

Uh oh!

girishji commented May 16, 2025

Uh oh!

Uh oh!

zeertzjq commented May 14, 2025 •

edited

Loading

girishji commented May 15, 2025 •

edited

Loading

girishji commented May 15, 2025 •

edited

Loading