-
-
Notifications
You must be signed in to change notification settings - Fork 415
feat: remove unnecessary character class from 933151 #4135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📊 Quantitative test results for language: |
I'm curious why you think this would have an impact on performance. Did you measure any difference? |
I'm not sure it's measurable, I mean the difference is not significant at all. I tried that with msc_retest on these ways:
Please note that there is no Then here are the commands:
These runs test with v2's and v3's regex engine with old and new patterns against the mentioned rule's 1st regression test. In case of v2 the regex engine (PCRE2 is the default) uses JIT. v3's regex code uses JIT by default. The results are below: v2/old pattern
v2/new pattern
v3/old pattern
v3/new pattern
I just leave this comment here to help anyone who wants to compare the regex patterns' performance, use msc_retest. |
Hey, apologies i have a couple more prs coming in relation to the same regex (and 933151), which overall improves our performance between 48 to 112 times (i dont think the difference will be as pronounced for you guys, especially with pcre, however in rust regex not extracting from capture groups and just matching increases performance massively). |
Interesting, thanks for the explanation. Although I would have expected the engine to automatically optimise |
For the regex used in 933151, the regex in the first chained rule contains a character class at the end:
\b([^\s]+)\s*[(]
However, removing the character class is better for performance (and arguably more readable):
\b([^\s]+)\s*\(