feat: remove unnecessary character class from 933151 #4135

TimDiam0nd · 2025-05-16T20:54:17Z

In rule 933151, the regex in the first chained rule ends with a single-character class:
\b([^\s]+)\s*[(]
However, replacing the character class with an escaped literal is better for performance (and arguably more readable):
\b([^\s]+)\s*\(
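
As a quick sanity check that the two forms match the same text, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: the rule itself is evaluated by the WAF's regex engine, not by Python, and every sample string except `array_diff (` is hypothetical.

```python
import re

# The two patterns from this PR description.
OLD = re.compile(r"\b([^\s]+)\s*[(]")
NEW = re.compile(r"\b([^\s]+)\s*\(")

# Hypothetical sample inputs (only "array_diff (" appears in this thread).
samples = ["array_diff (", "foo(", "no call here", "a  (b) c("]

for s in samples:
    # Compare match positions and captured groups for every match in the string.
    old = [(m.span(), m.groups()) for m in OLD.finditer(s)]
    new = [(m.span(), m.groups()) for m in NEW.finditer(s)]
    assert old == new, s
    print(repr(s), old)
```

Identical spans and groups on these inputs are consistent with the change being purely cosmetic at the matching level; they say nothing about speed.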

github-actions · 2025-05-16T20:54:48Z

📊 Quantitative test results for language: eng, year: 2023, size: 10K, paranoia level: 1:
🚀 Quantitative testing did not detect new false positives

theseion · 2025-05-17T05:59:36Z

I'm curious why you think this would have an impact on performance. Did you measure any difference?

airween · 2025-05-17T07:48:49Z

> I'm curious why you think this would have an impact on performance. Did you measure any difference?

I'm not sure it's measurable; the difference is not significant at all. I tried it with msc_retest as follows:

$ cat r933151_old.txt 
\b([^\s]+)\s*[(]

$ cat r933151_new.txt 
\b([^\s]+)\s*\(

Please note that there is no EOL at the end of the pattern in either file!

Then here are the commands:

echo -n "array_diff (" | src/pcre4msc2 -j -n 100000 r933151_old.txt 
echo -n "array_diff (" | src/pcre4msc2 -j -n 100000 r933151_new.txt 
echo -n "array_diff (" | src/pcre4msc3 -n 100000 r933151_old.txt 
echo -n "array_diff (" | src/pcre4msc3 -n 100000 r933151_new.txt

These runs exercise the v2 and v3 regex engines with the old and new patterns against the first regression test of the rule. In the v2 runs the regex engine (PCRE2 is the default) uses JIT; v3's regex code uses JIT by default.
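
For readers without msc_retest at hand, the same kind of comparison can be roughly sketched with Python's standard library. This is only an approximation under stated assumptions: Python's `re` is a different engine from the ones measured here, so absolute numbers are not comparable, and `time_pattern` is a hypothetical helper name.

```python
import re
import statistics
import time

SUBJECT = "array_diff ("  # the input used in the commands above
PATTERNS = {
    "old": r"\b([^\s]+)\s*[(]",
    "new": r"\b([^\s]+)\s*\(",
}

def time_pattern(pattern, subject, runs=100_000):
    """Return one elapsed time in nanoseconds per search call."""
    rx = re.compile(pattern)  # compile once, outside the timed region
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        rx.search(subject)
        samples.append(time.perf_counter_ns() - start)
    return samples

for name, pattern in PATTERNS.items():
    samples = time_pattern(pattern, SUBJECT)
    print(f"{name}: mean={statistics.mean(samples):.0f}ns "
          f"median={statistics.median(samples):.0f}ns "
          f"min={min(samples)}ns max={max(samples)}ns "
          f"stdev={statistics.pstdev(samples):.0f}ns")
```

The per-call timer overhead is of the same order as the match itself, so only the relative difference between the two patterns is meaningful here.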

The results are below:

v2/old pattern

Num of values: 100000
         Mean: 000.000000352
       Median: 000.000000324
          Min: 000.000000171
          Max: 000.000132326
        Range: 000.000132155
Std deviation: 000.000000519

v2/new pattern

Num of values: 100000
         Mean: 000.000000435
       Median: 000.000000347
          Min: 000.000000169
          Max: 000.000058875
        Range: 000.000058706
Std deviation: 000.000000507

v3/old pattern

Num of values: 100000
         Mean: 000.000000230
       Median: 000.000000218
          Min: 000.000000111
          Max: 000.000051658
        Range: 000.000051547
Std deviation: 000.000000281

v3/new pattern

Num of values: 100000
         Mean: 000.000000227
       Median: 000.000000213
          Min: 000.000000112
          Max: 000.000026295
        Range: 000.000026183
Std deviation: 000.000000194

I'm leaving this comment here for anyone who wants to compare the performance of regex patterns: use msc_retest.
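
To put the reported figures side by side, the relative change between old and new pattern can be computed directly. This is a small sketch, assuming the values are seconds (so `000.000000352` is 352 ns); the numbers are copied from the results above and `relative_change` is a hypothetical helper.

```python
# Means and medians in nanoseconds, copied from the results above
# (assumption: the tool reports seconds with nine decimal places).
results = {
    "v2": {"old": {"mean": 352, "median": 324}, "new": {"mean": 435, "median": 347}},
    "v3": {"old": {"mean": 230, "median": 218}, "new": {"mean": 227, "median": 213}},
}

def relative_change(old, new):
    """Percentage change from old to new; positive means new is slower."""
    return (new - old) / old * 100.0

for engine, r in results.items():
    for stat in ("mean", "median"):
        change = relative_change(r["old"][stat], r["new"][stat])
        print(f"{engine} {stat}: {change:+.1f}%")
```

The absolute differences in the means (83 ns for v2, 3 ns for v3) are smaller than the reported standard deviations, which fits the observation that the difference is not significant.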

TimDiam0nd · 2025-05-17T12:17:08Z

Hey, apologies, I have a couple more PRs coming for the same regex (and 933151). Together they improve our performance by a factor of 48 to 112. I don't think the difference will be as pronounced for you, especially with PCRE; in the Rust regex crate, however, just matching instead of extracting from capture groups increases performance massively.
This change on its own improved our regex compile times very slightly, though not by a noticeable amount.
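
The difference between extracting a capture group and only testing for a match can be sketched as two usage shapes. This is illustrative only: it uses Python's `re` rather than the Rust crate, where the corresponding calls would be `Regex::captures` and `Regex::is_match`, and the thread does not say what form the follow-up PRs take. The group-free pattern below is a hypothetical variant, not a proposed rule change.

```python
import re

SUBJECT = "array_diff ("

# Capturing form: the caller asks for the text of group 1.
capturing = re.compile(r"\b([^\s]+)\s*\(")
m = capturing.search(SUBJECT)
name = m.group(1) if m else None

# Match-only form (hypothetical variant): no capture group,
# the caller only needs a yes/no answer.
match_only = re.compile(r"\b[^\s]+\s*\(")
matched = match_only.search(SUBJECT) is not None

print(name, matched)
```

Python's `re` does not expose a separate match-only fast path, so this shows the shape of the two calls rather than the speedup itself.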

theseion · 2025-05-17T12:56:44Z

Interesting, thanks for the explanation. Although I would have expected the engine to automatically optimise [(] to \( ;)

remove unnecessary character class for performance

7f8f633

EsadCetiner previously approved these changes May 16, 2025

TimDiam0nd dismissed EsadCetiner’s stale review via 14a9b94 May 17, 2025 12:05

TimDiam0nd force-pushed the main branch from 14a9b94 to 7f8f633 Compare May 17, 2025 12:05

theseion approved these changes May 17, 2025

theseion added the release:refactor label May 17, 2025

theseion added this pull request to the merge queue May 17, 2025

Merged via the queue into coreruleset:main with commit 46f1aae May 17, 2025
12 checks passed

fzipi mentioned this pull request Jun 2, 2025

Monthly Chat Agenda June 2025-06-02 #4153

Closed
