Improve isIdentifierName performance #13211
charCodeAt is faster for BMP characters and most identifier names are ASCII.
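For illustration, here is a minimal, ASCII-only sketch of the idea, not the actual `@babel/helper-validator-identifier` implementation; the function name and the restriction to ASCII character classes are simplifications (the real helper also accepts Unicode ID_Start/ID_Continue characters via surrogate-pair decoding):

```javascript
// Hedged sketch: a simplified, ASCII-only identifier check driven by
// charCodeAt. Each charCodeAt call reads a single UTF-16 code unit,
// which is all that is needed for ASCII input.
function isAsciiIdentifierName(name) {
  if (name.length === 0) return false;
  for (let i = 0; i < name.length; i++) {
    const cp = name.charCodeAt(i);
    const isLetter =
      (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a); // A-Z a-z
    const isDigit = cp >= 0x30 && cp <= 0x39; // 0-9
    const isStart = isLetter || cp === 0x24 || cp === 0x5f; // also $ and _
    // The first character must be a start character; later characters
    // may additionally be digits.
    if (i === 0 ? !isStart : !(isStart || isDigit)) return false;
  }
  return true;
}

console.log(isAsciiIdentifierName("$foo_1")); // true
console.log(isAsciiIdentifierName("1foo")); // false
```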
This pull request is automatically built and testable in CodeSandbox. Latest deployment of this branch, based on commit c2c9660:

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/45633/
```js
let cp = name.charCodeAt(i);
if ((cp & 0xfc00) === 0xd800 && i + 1 < name.length) {
  const trail = name.charCodeAt(++i);
  cp = 0x10000 + ((cp & 0x3ff) << 10) + (trail & 0x3ff);
}
```
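The manual surrogate-pair decoding above can be checked against the native `codePointAt`; here is a self-contained sketch (the function name is mine, not Babel's, and the index is passed explicitly rather than mutated):

```javascript
// Manual UTF-16 surrogate-pair decoding, mirroring the diff above.
// If the unit at index i is a lead surrogate (0xD800-0xDBFF) and a
// trailing unit exists, combine the two units into one code point.
function codePointAtManual(name, i) {
  let cp = name.charCodeAt(i);
  if ((cp & 0xfc00) === 0xd800 && i + 1 < name.length) {
    const trail = name.charCodeAt(i + 1);
    cp = 0x10000 + ((cp & 0x3ff) << 10) + (trail & 0x3ff);
  }
  return cp;
}

// Agrees with the native codePointAt for BMP and astral characters:
console.log(codePointAtManual("abc", 0) === "abc".codePointAt(0)); // true
console.log(codePointAtManual("\u{1d49c}x", 0) === 0x1d49c); // true
```

For well-formed strings the two agree; the manual version skips validating the trail unit, which the native method does handle for lone surrogates.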
Is manually reimplementing codePointAt(i) faster than using the native one? 🤔 Or is the speedup done just by not using for-of + Array.from?
I think both removing Array.from and using charCodeAt play a role here. I didn't test for-of, since once we get rid of Array.from we have to use a plain for loop anyway.
According to this benchmark, both codePointAt and this polyfill run equally fast on non-ASCII words. But when it comes to ASCII words, the charCodeAt one is significantly faster.
```sh
# Fork two processes so they can get optimized independently
# (`isIdentifierName2` is the `codePointAt` version)
(node --predictable ./benchmark/isIdentifierName1.bench.mjs )&
(node --predictable ./benchmark/isIdentifierName2.bench.mjs )&
```
```text
current#isIdentifierName  on 2 short ASCII words: 11496362 ops/sec ±2.63% (0ms)
current#isIdentifierName2 on 2 short ASCII words:  6652795 ops/sec ±1.6%  (0ms)
current#isIdentifierName2 on 1 long ASCII word:    3236276 ops/sec ±0.64% (0ms)
current#isIdentifierName  on 1 long ASCII word:    4462584 ops/sec ±0.54% (0ms)
current#isIdentifierName  on 3 non-ASCII words:     280274 ops/sec ±0.36% (0.004ms)
current#isIdentifierName2 on 3 non-ASCII words:     287687 ops/sec ±0.35% (0.003ms)
```
It surprised me too. I think I should dive into the machine code and see what's happening for ASCII words.
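The shape of such a micro-benchmark can be sketched as follows. This is a minimal, self-contained sketch, not the PR's actual benchmark harness; the function names, workload, and iteration count are illustrative assumptions:

```javascript
// Micro-benchmark sketch: scan a string summing its code units with
// charCodeAt vs. its code points with codePointAt. For ASCII input the
// two loops compute the same sum, so only the access method differs.
function scanCharCodeAt(s) {
  let sum = 0;
  for (let i = 0; i < s.length; i++) sum += s.charCodeAt(i);
  return sum;
}

function scanCodePointAt(s) {
  let sum = 0;
  for (let i = 0; i < s.length; i++) sum += s.codePointAt(i);
  return sum;
}

// Wall-clock timing in milliseconds using Node's high-resolution timer.
function timeMs(fn, input, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(input);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const word = "abcdefghijklmnopqrstuvwxyz".repeat(10); // ASCII-only workload
console.log("charCodeAt :", timeMs(scanCharCodeAt, word, 1e4).toFixed(1), "ms");
console.log("codePointAt:", timeMs(scanCodePointAt, word, 1e4).toFixed(1), "ms");
```

As the comment above notes, running each variant in a separate process (e.g. with `--predictable`) avoids the two candidates contaminating each other's optimization state in V8.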
Although V8 uses charCodeAt for codePointAt (code), codePointAt is not inlined when compiled by TurboFan, as we can see from the call instruction to StringCodePointAt:
```
-- B5 start (in loop 4) --
0x38d6008c40de    9e  48897dd8       REX.W movq [rbp-0x28],rdi
0x38d6008c40e2    a2  33f6           xorl rsi,rsi
0x38d6008c40e4    a4  488bdf         REX.W movq rbx,rdi
[ Inlined Trampoline to StringCodePointAt
0x38d6008c40e7    a7  e894237aff     call 0x38d600066480  (StringCodePointAt)  ;; runtime entry
]
0x38d6008c40ec    ac  837dd000       cmpl [rbp-0x30],0x0
0x38d6008c40f0    b0  0f854d000000   jnz 0x38d6008c4143  <+0x103>
```
A single call instruction does not necessarily slow down the whole routine, but to issue a call we have to set up the function prologue and epilogue, which is the runtime overhead of calling a function. When we use charCodeAt, it is inlined, so that overhead is stripped away. I don't know why TurboFan does not inline StringCodePointAt; maybe because charCodeAt is implemented in CSA as a primitive used by StringCodePointAt: https://source.chromium.org/chromium/chromium/src/+/master:v8/src/codegen/code-stub-assembler.cc;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;l=6822

I guess the performance difference here is due to codePointAt not being inlined, but I could be wrong, as I am not very familiar with this topic.
/cc @bmeurer
Oh, awesome investigation! (btw, according to their bio, bmeurer doesn't work on V8 anymore)
🏃‍♂️
I'm pretty sure the major performance difference is due to not using …

@JLHwung did you open a V8 issue about …

For reference, I opened a V8 issue: https://bugs.chromium.org/p/v8/issues/detail?id=11743

Thanks!
Results from my local machine
Initially meant to test the per-package benchmark setup, this PR sees up to a 3x improvement for ASCII words. The function is not used by @babel/parser internally; instead it is exported via @babel/types and used in the systemjs and JSX transforms. This PR also adds more test cases to isIdentifier*.