
Incomplete matches when using the --word-regexp flag #2574

@ilia-cy


What version of ripgrep are you using?

ripgrep 13.0.0
-SIMD -AVX (compiled)

How did you install ripgrep?

wget https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgrep-13.0.0-x86_64-unknown-linux-musl.tar.gz

What operating system are you using ripgrep on?

Mac and Linux

Describe your bug.

According to the manual:

-w, --word-regexp
            Only show matches surrounded by word boundaries. This is roughly equivalent to
            putting \b before and after all of the search patterns.
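As a rough illustration of what the manual describes, the Python sketch below surrounds a pattern with \b. Python's re module stands in for ripgrep's regex engine here, and `word_wrap` is a hypothetical helper, not part of ripgrep. Wrapping the pattern in a non-capturing group is also an assumption of the sketch: without it, \b would attach only to the first and last branches of an alternation.

```python
import re

def word_wrap(pattern: str) -> str:
    # Hypothetical helper: put a word-boundary assertion on each side of the
    # whole pattern. The (?:...) group keeps alternations like "foo|bar"
    # scoped so both boundaries apply to every branch.
    return r"\b(?:" + pattern + r")\b"

text = "foobar bar"

# Grouped: only the standalone "bar" is a whole word.
print(re.findall(word_wrap("foo|bar"), text))   # ['bar']

# Ungrouped: \b binds only to "foo" on the left and "bar" on the right,
# so the "foo" and "bar" inside "foobar" also match.
print(re.findall(r"\bfoo|bar\b", text))         # ['foo', 'bar', 'bar']
```

The second call shows why "roughly equivalent" matters: naively concatenating \b onto an ungrouped alternation changes which matches are reported.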

I'm using this text as a sample file:

some.domain.com
some.domain.com/x
some.domain.com

And here is a very naive regex that searches for "domains" (not really, but it is enough to show the problem):
([\w]+[.])*domain[.](\w)+

Running this regex with the -w flag (rg -w "([\w]+[.])*domain[.](\w)+") matches the first and third lines properly (some.domain.com), but on the second line the match starts at the second character, so it matches ome.domain.com instead of some.domain.com.
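To make the expected contract concrete: under \b semantics, a match edge is valid only where a word character sits on exactly one side of it. The hypothetical Python helpers below (not ripgrep code; \w is approximated as letters, digits and underscore) check a reported span against that rule. They show that ome.domain.com at offsets 1..15 of the second line cannot be a -w match, while some.domain.com at offsets 0..15 can.

```python
def is_word_char(c: str) -> bool:
    # Approximates regex \w: letters, digits, and underscore.
    return c.isalnum() or c == "_"

def at_word_boundary(line: str, i: int) -> bool:
    # \b holds at position i when exactly one of the characters on either
    # side of i is a word character; positions outside the line count as
    # non-word characters.
    before = i > 0 and is_word_char(line[i - 1])
    after = i < len(line) and is_word_char(line[i])
    return before != after

def is_whole_word_span(line: str, start: int, end: int) -> bool:
    # A span [start, end) satisfies "\b before and after" when both of its
    # edges are word boundaries.
    return at_word_boundary(line, start) and at_word_boundary(line, end)

line = "some.domain.com/x"
print(is_whole_word_span(line, 0, 15))  # True:  "some.domain.com"
print(is_whole_word_span(line, 1, 15))  # False: "ome.domain.com" starts mid-word
```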

If I instead drop the -w flag and wrap the regex in \b (rg "\b([\w]+[.])*domain[.](\w)+\b"), all three lines are matched properly (some.domain.com is matched on each).
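For comparison, the \b-wrapped pattern can be run through Python's re module as a stand-in for ripgrep's regex engine. This is an assumption: the two engines differ in details, but they should agree on \w and \b for this ASCII input. The sketch only shows what the \b form reports on the sample lines; it says nothing about how ripgrep implements -w internally.

```python
import re

# The reporter's pattern, wrapped in \b by hand as in the second command.
PATTERN = re.compile(r"\b([\w]+[.])*domain[.](\w)+\b")

# The three lines of the sample file.
lines = ["some.domain.com", "some.domain.com/x", "some.domain.com"]

for line in lines:
    m = PATTERN.search(line)
    # Each line reports the full "some.domain.com", starting at offset 0.
    print(repr(line), "->", m.group(0) if m else None, "at", m.start() if m else None)
```

On the second line the match ends at the "/" because \b holds between "m" and "/", and it starts at offset 0 because \b holds at the start of the line.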

What are the steps to reproduce the behavior?

Explained above.

What is the actual behavior?

https://gist.github.com/ilia-cy/396f43f57057e42723d4a3dc87d4e994
The matches aren't visible in the gist because they are shown only as terminal highlighting, so I attached a screenshot of the terminal output.

What is the expected behavior?

I would expect the -w flag to behave like wrapping the pattern in \b (as the manual indicates), so that all three some.domain.com instances are matched in full.


Labels: bug
