
[WTF] Integrate simdutf #9990

Constellation (Member) commented Feb 12, 2023

68eced0

[WTF] Integrate simdutf
https://bugs.webkit.org/show_bug.cgi?id=250112
rdar://104145576

Reviewed by Michael Saboff.

Add simdutf for faster base64 processing, used by the new Base64 features in ECMAScript.
Once simdutf implements replacement-character handling, we can also consider using it
for UTF-8 / UTF-16 conversion. Apple OSS Approval: OSS-13396

* Source/WTF/WTF.xcodeproj/project.pbxproj:
* Source/WTF/wtf/CMakeLists.txt:
* Source/WTF/wtf/simdutf/LICENSE-simdutf.txt: Added.
* Source/WTF/wtf/simdutf/simdutf.cpp: Added.
(_mm512_set_epi8):
(simdutf::implementation::supported_by_runtime_system const):
(simdutf::get_available_implementations):
(simdutf::get_active_implementation):
(simdutf::match_system):
(simdutf::to_string):
(simdutf::BOM::check_bom):
(simdutf::BOM::bom_byte_size):
(simdutf::result::result):
* Source/WTF/wtf/simdutf/simdutf.h: Added.
(simdutf::internal::detect_supported_architectures):
(simdutf::internal::cpuid):
(simdutf::implementation::name const):
(simdutf::implementation::description const):
(simdutf::implementation::required_instruction_sets const):
(simdutf::implementation::implementation):
(simdutf::internal::available_implementation_list::available_implementation_list):
(simdutf::internal::atomic_ptr::atomic_ptr):
(simdutf::internal::atomic_ptr::operator const T* const):
(simdutf::internal::atomic_ptr::operator* const):
(simdutf::internal::atomic_ptr::operator-> const):
(simdutf::internal::atomic_ptr::operator T*):
(simdutf::internal::atomic_ptr::operator*):
(simdutf::internal::atomic_ptr::operator->):
(simdutf::internal::atomic_ptr::operator=):

Canonical link: https://commits.webkit.org/281011@main
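The commit message names base64 processing as the motivation. As a point of reference for what simdutf accelerates, here is a minimal scalar base64 encoder using the standard RFC 4648 alphabet with `=` padding. `base64Encode` is a hypothetical helper for illustration only, not simdutf's or WTF's API; simdutf's SIMD routines process many input bytes per step rather than one 3-byte group per loop iteration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical scalar base64 encoder (RFC 4648 standard alphabet, '=' padding).
// Not simdutf's API; it only shows the per-group bit shuffling involved.
std::string base64Encode(std::string_view input)
{
    static constexpr char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);

    // Read byte i of the input as an unsigned 32-bit value.
    auto byteAt = [&](std::size_t i) { return std::uint32_t(std::uint8_t(input[i])); };

    std::size_t i = 0;
    // Each full 3-byte group (24 bits) becomes four 6-bit alphabet indices.
    for (; i + 3 <= input.size(); i += 3) {
        std::uint32_t group = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out.push_back(alphabet[(group >> 18) & 0x3F]);
        out.push_back(alphabet[(group >> 12) & 0x3F]);
        out.push_back(alphabet[(group >> 6) & 0x3F]);
        out.push_back(alphabet[group & 0x3F]);
    }

    // A 1- or 2-byte tail yields 2 or 3 characters, padded with '=' to 4.
    std::size_t remaining = input.size() - i;
    if (remaining) {
        std::uint32_t group = byteAt(i) << 16;
        if (remaining == 2)
            group |= byteAt(i + 1) << 8;
        out.push_back(alphabet[(group >> 18) & 0x3F]);
        out.push_back(alphabet[(group >> 12) & 0x3F]);
        out.push_back(remaining == 2 ? alphabet[(group >> 6) & 0x3F] : '=');
        out.push_back('=');
    }
    return out;
}
```

With the RFC 4648 test vectors, `base64Encode("foo")` yields `"Zm9v"` and `base64Encode("f")` yields `"Zg=="`.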

Constellation self-assigned this Feb 12, 2023
Constellation added the JavaScriptCore label Feb 12, 2023
Constellation changed the title from "Possible 8x perf improvement to UTF8 -> UTF16 text encoding" to "[WTF] Integrate simdutf" Feb 12, 2023
litherum (Contributor) commented Feb 12, 2023

The transformation from 8-bit to 16-bit strings isn't very complicated; I'm surprised we couldn't just write the routine in vector assembly ourselves. Is this really worth a 3rd party dependency? Maybe I'm misunderstanding the motivation?
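If the 8-bit strings are Latin-1 (as WebKit's 8-bit `LChar` strings are), the widening really is simple: every byte value 0x00-0xFF is the same Unicode code point, so each byte is zero-extended into one UTF-16 code unit. A minimal scalar sketch, with `widenLatin1` as a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: widen an 8-bit Latin-1 string to UTF-16.
// Latin-1 bytes 0x00-0xFF map one-to-one onto code points U+0000-U+00FF.
std::u16string widenLatin1(std::string_view latin1)
{
    std::u16string out(latin1.size(), u'\0');
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        // Go through unsigned char so a byte like 0xE9 becomes U+00E9,
        // not a sign-extended 0xFFE9 on platforms where char is signed.
        out[i] = static_cast<char16_t>(static_cast<unsigned char>(latin1[i]));
    }
    return out;
}
```

UTF-8 input is a different story: a code point spans one to four bytes and malformed sequences must be detected, and that is where most of the difficulty in a fast vectorized routine lies.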

Constellation (Member, Author) commented Feb 12, 2023

Making it the fastest is challenging (see https://arxiv.org/pdf/2109.10433.pdf), and so far simdutf has succeeded in making it very fast on all the architectures we care about.

lemire commented Apr 4, 2023

@Constellation I recommend bumping the version. See https://github.com/simdutf/simdutf/releases

> The transformation from 8-bit to 16-bit strings isn't very complicated; I'm surprised we couldn't just write the routine in vector assembly ourselves. Is this really worth a 3rd party dependency? Maybe I'm misunderstanding the motivation?

You definitely can do that (write the routine in vector assembly yourselves), and some systems did just that, like Oracle GraalVM. However, the simdutf library has been running in production for some time (it is part of Node.js and Bun). We have thorough testing across a wide range of platforms, compilers and so forth, and it is actively developed. I think @Jarred-Sumner can testify that adopting simdutf was beneficial for Bun.

Our research on the topic:

We shall have more in the future.

lemire commented May 16, 2023

The adoption of the simdutf library by the popular Node.js JavaScript runtime led to a significant performance gain:

Decoding and Encoding becomes considerably faster than in Node.js 18. With the addition of simdutf for UTF-8 parsing the observed benchmark, results improved by 364% (an extremely impressive leap) when decoding in comparison to Node.js 16. (State of Node.js Performance 2023)
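"Decoding" here means turning UTF-8 bytes into UTF-16 text while handling malformed input, which is also the "replacement character handling" the commit message says simdutf still lacked. A minimal scalar sketch of that job, following the UTF-8 decoder algorithm in the WHATWG Encoding Standard (each maximal invalid subsequence becomes one U+FFFD). `decodeUTF8WithReplacement` is a hypothetical name, not simdutf's, ICU's, or WTF's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Append one code point, splitting supplementary-plane values into a surrogate pair.
static void appendUTF16(std::u16string& out, char32_t cp)
{
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
        return;
    }
    cp -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
}

// Hypothetical scalar decoder modeled on the WHATWG "UTF-8 decoder" steps.
std::u16string decodeUTF8WithReplacement(std::string_view input)
{
    constexpr char32_t replacement = 0xFFFD;
    std::u16string out;
    out.reserve(input.size());
    char32_t codePoint = 0;
    int bytesNeeded = 0, bytesSeen = 0;
    std::uint8_t lower = 0x80, upper = 0xBF; // allowed range for the next continuation byte

    for (std::size_t i = 0; i < input.size();) {
        std::uint8_t byte = static_cast<std::uint8_t>(input[i]);
        if (!bytesNeeded) {
            ++i;
            if (byte <= 0x7F)
                appendUTF16(out, byte);
            else if (byte >= 0xC2 && byte <= 0xDF) {
                bytesNeeded = 1;
                codePoint = byte & 0x1F;
            } else if (byte >= 0xE0 && byte <= 0xEF) {
                if (byte == 0xE0) lower = 0xA0; // reject overlong 3-byte forms
                if (byte == 0xED) upper = 0x9F; // reject encoded surrogates
                bytesNeeded = 2;
                codePoint = byte & 0x0F;
            } else if (byte >= 0xF0 && byte <= 0xF4) {
                if (byte == 0xF0) lower = 0x90; // reject overlong 4-byte forms
                if (byte == 0xF4) upper = 0x8F; // reject values above U+10FFFF
                bytesNeeded = 3;
                codePoint = byte & 0x07;
            } else
                appendUTF16(out, replacement); // invalid lead byte
            continue;
        }
        if (byte < lower || byte > upper) {
            // Truncated sequence: emit U+FFFD and reprocess this byte (i is not advanced).
            codePoint = 0; bytesNeeded = 0; bytesSeen = 0; lower = 0x80; upper = 0xBF;
            appendUTF16(out, replacement);
            continue;
        }
        ++i;
        lower = 0x80; upper = 0xBF;
        codePoint = (codePoint << 6) | (byte & 0x3F);
        if (++bytesSeen != bytesNeeded)
            continue;
        appendUTF16(out, codePoint);
        codePoint = 0; bytesNeeded = 0; bytesSeen = 0;
    }
    if (bytesNeeded) // input ended in the middle of a sequence
        appendUTF16(out, replacement);
    return out;
}
```

Under this algorithm, the bytes `E2 82 AC` decode to U+20AC, a lone `C3` decodes to one U+FFFD, and an encoded surrogate such as `ED A0 80` produces three U+FFFD characters. The branchy, byte-at-a-time structure is what makes a scalar decoder slow and what SIMD approaches try to avoid.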

Constellation marked this pull request as ready for review July 16, 2024 18:25
Constellation requested a review from a team July 16, 2024 18:25
Constellation (Member, Author) commented:

> Is there a reason that ICU's conversion routines must be 364% slower than necessary?

It largely depends on ICU's implementation.

msaboff (Contributor) left a comment:

rs=me

Constellation added the unsafe-merge-queue label (sends a pull request to the merge queue but skips building and testing) Jul 16, 2024
webkit-commit-queue (Collaborator) commented:

Committed 281011@main (68eced0): https://commits.webkit.org/281011@main

Reviewed commits have been landed. Closing PR #9990 and removing active labels.

webkit-commit-queue merged commit 68eced0 into WebKit:main Jul 16, 2024
webkit-commit-queue removed the unsafe-merge-queue label Jul 16, 2024
Constellation deleted the eng/Possible-8x-perf-improvement-to-UTF8---UTF16-text-encoding branch July 17, 2024 15:45
7 participants