
Conversation

GalacticHypernova
Contributor

@GalacticHypernova GalacticHypernova commented Oct 20, 2024

🔗 Linked issue

📚 Description

Extracting regex patterns avoids the need to recompile them on each use, especially in loops or hot code paths.

This, in turn, provides runtime performance improvements (faster-executing code) and memory efficiency improvements (less object-creation overhead and GC pressure than inline usage).

Benchmark (taken from Pooya's example; each part was run separately to prevent any sort of inline caching from altering the performance results):

```js
const version = '1.0.0-1.0a'

console.time('inline')
for (let i = 0; i < 100000; i++) {
  version.replace(/-\d+\.[0-9a-f]+/, '')
}
console.timeEnd('inline')

const SEMANTIC_VERSION_RE = /-\d+\.[0-9a-f]+/

console.time('external')
for (let i = 0; i < 100000; i++) {
  version.replace(SEMANTIC_VERSION_RE, '')
}
console.timeEnd('external')
```

Results:

Browser:

[screenshots: browser benchmark results]

Node:

[screenshots: Node benchmark results]

It also shows a 50% decrease in memory usage in this simple benchmark alone, verified with a memory profiler:

[screenshots: memory profiler results]


@GalacticHypernova GalacticHypernova marked this pull request as ready for review October 20, 2024 19:24
@GalacticHypernova
Contributor Author

GalacticHypernova commented Oct 20, 2024

I also followed up on it in the warning itself, but I believe the CodeQL warning is inaccurate. The pattern is exactly the same; the only difference is its placement.

And given that marking this as ready for review inherently notifies you, I decided not to mention you additionally.

I assume you would like ui-templates to be left unchanged? I didn't make any changes there yet, because I remember from my other perf PR that you decided it is not a concern.

@GalacticHypernova GalacticHypernova changed the title perf: cache regex patterns perf: extract regex patterns Oct 20, 2024
@pi0
Member

pi0 commented Oct 21, 2024

```js
import { run, bench } from 'mitata'

// Version 1: regex inline
function regexInline(version) {
  return version.replace(/-\d+\.[0-9a-f]+/, '')
}

// Version 2: regex outside
const SEMANTIC_VERSION_RE = /-\d+\.[0-9a-f]+/
function regexOutside(version) {
  return version.replace(SEMANTIC_VERSION_RE, '')
}

// Test
if (regexInline('1.0.0-1.0a') !== '1.0.0') throw new Error('1 failed')
if (regexOutside('1.0.0-1.0a') !== '1.0.0') throw new Error('2 failed')

bench('regex inline', () => {
  regexInline('1.0.0-1.0a')
})

bench('regex outside', () => {
  regexOutside('1.0.0-1.0a')
})

run()
```
[screenshot: mitata benchmark results]

@GalacticHypernova
Contributor Author

GalacticHypernova commented Oct 21, 2024

Hey, Pooya!

The benchmark you made wouldn't show any noticeable improvement because it exercises a single call at a time, and that wasn't the point of this PR. You can see that even in this PR I left many inline patterns untouched, because they only compile once, and therefore performance there won't be increased.

However, this PR is for loops/hot code paths (where most of these patterns live), as that's where the performance will really improve. It is also logical: each iteration encounters the same reference to the pattern, as opposed to a brand-new pattern that has to be recompiled.

I will be able to provide a more thorough benchmark soon.

Edit: I have provided a reproduction based on your example. It's also expected that, since this is a fairly simple pattern, its relative performance improvement is smaller than that of more complicated patterns, like some of those I covered in the PR, especially when a loop uses multiple regex patterns (like the route key, which applies 3 patterns per iteration).
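To make the hot-path case concrete, here is a minimal sketch of a loop that applies several hoisted patterns per iteration. The pattern names and the `normalizeKeys` helper are hypothetical illustrations, not the actual Nuxt route-key code:

```javascript
// Hypothetical sketch: several patterns applied per iteration, hoisted once.
// These names and regexes are illustrative, not Nuxt's actual code.
const EXT_RE = /\.[a-z0-9]+$/
const INDEX_SUFFIX_RE = /\/index$/
const TRAILING_SLASH_RE = /\/$/

function normalizeKeys (keys) {
  // Every iteration reuses the same three compiled RegExp instances,
  // instead of evaluating three regex literals per key.
  return keys.map(key => key
    .replace(EXT_RE, '')
    .replace(INDEX_SUFFIX_RE, '')
    .replace(TRAILING_SLASH_RE, ''))
}

console.log(normalizeKeys(['pages/index.vue', 'pages/about.vue']))
// → [ 'pages', 'pages/about' ]
```

With the literals inline inside the `.map()` callback, three regex literal evaluations would happen per key instead of zero.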

@danielroe
Member

@GalacticHypernova On the benchmark with 100,000 iterations, the performance improvement is 1.1-1.5ms. That seems negligible, and possibly within the margin of error of any test.

It's so tricky - with tiny improvements like this on the one hand there's no real downside, on the other I wouldn't be surprised if the engine could optimise this kind of thing itself - and will we rewrite things back in three months if/when it does? Also, are there any readability costs here?

@GalacticHypernova
Contributor Author

GalacticHypernova commented Oct 21, 2024

@danielroe Allow me to begin by saying that the margin of error is the 0.4 ms difference. The benchmark was run multiple times and the improvement was consistent.

This specific benchmark indeed shows a relatively small improvement, but it's important to remember that it's a simple benchmark with a simple pattern. In real-world cases, a lot more is happening in parallel, with much more complex patterns, which will slow everything down.

Consider, for example, the build process. Nuxt builds do far more than just run regexes, and use far more complicated patterns with capturing groups, lookaheads/lookbehinds, and more.

Readability is also not overly impacted, because the change is small: it's just pre-compiling the regex, assigning it to a variable, and then using the variable instead of the inline pattern.

Now, can the engine optimize it on its own? Unfortunately, it can't, and that's why this PR is essential for resource-limited environments:
When the JS engine evaluates a regex literal, it creates (and compiles) a new RegExp object. Because a RegExp is an object, hoisting it into a variable stores a reference, not a value. Replacing an inline pattern in a loop with a variable therefore reuses the same instance of the already-compiled regex. That way, the engine doesn't waste time recompiling it, and doesn't waste resources on short-lived objects and GC cycles, providing better performance and memory efficiency. An engine could maybe cache compiled literals internally, but that wouldn't be as reliable as a variable holding the reference, and memory efficiency is as important as runtime performance in environments with constrained resources.

I decided to test the actual memory differences:

[screenshots: memory profiler comparison]

This was measured multiple times (as always) with the memory profiler in the DevTools of Microsoft Edge, so the results are reliable given that they are consistent. This includes objects discarded by both major and minor GC, which can be selected in the DevTools Memory section.

You might be wondering why I didn't use the performance/process memory APIs to benchmark memory usage, and the answer is simple:
Unlike runtime performance, memory benchmarking is considerably less straightforward, because using process.memoryUsage().heapUsed has 2 major limitations that make it wrong to benchmark memory in code:

  1. Before + after the loop, with subtraction - this does not account for possible GC runs during the loop that could free space allocated by the inline patterns. It essentially checks whether the final memory footprint is reasonable, but not the memory efficiency of the loop itself.
  2. Inside the loop, summing total memory usage - the sole act of accessing heap stats on each iteration can itself introduce transient memory effects, which would incorrectly be added to the totals even though they are unrelated to the loop. As a result, the reported memory usage would be considerably higher than it would be without the measurement code.
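As a sketch of limitation 1 (Node-only; process.memoryUsage is a standard Node API): if a GC pass happens to run mid-loop, the before/after delta shrinks or can even go negative, so it says little about what the loop actually allocated:

```javascript
// Naive before/after heap measurement, illustrating limitation 1.
const before = process.memoryUsage().heapUsed

for (let i = 0; i < 100000; i++) {
  '1.0.0-1.0a'.replace(/-\d+\.[0-9a-f]+/, '')
}

const after = process.memoryUsage().heapUsed

// The delta reflects whatever GC did (or didn't do) during the loop,
// not the total number of bytes the loop allocated.
console.log(`heapUsed delta: ${after - before} bytes`)
```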

The results of the memory profiling indicate a more than 50% decrease in memory usage with the variable approach, from 5MB to 2MB, and that's only in this simple benchmark. On weaker devices or outdated browsers with less powerful engines this is especially significant, but it would very likely be noticeable on stronger devices too.

Is there anything you're particularly concerned about that you would like me to address?

P.S. I hope this wasn't too much text. I wanted to explain it in as much detail as I could to highlight the importance of such changes, which don't end at runtime performance: they also have a large impact on memory consumption, which is more often than not the main bottleneck in resource-constrained environments and a hugely neglected aspect of performance. In addition, these seemingly small changes can quickly accumulate in real-world use cases at scale (for example, in performance-critical apps), providing more benefit than a simple, targeted benchmark could ever demonstrate.

@danielroe danielroe merged commit 5dad6e6 into nuxt:main Oct 22, 2024
51 checks passed
@danielroe
Member

let's give it a go!

@github-actions github-actions bot mentioned this pull request Oct 22, 2024
@GalacticHypernova
Contributor Author

Thanks! I'm sure this will have a great impact
