Support using V8 as the Wasm JIT interpreter #6429
This is a partial revert of halide#5097. It brings back a bunch of the code in WasmExecutor to set up and use V8 to run Wasm code. All of the code is copy-pasted. There are some small cleanups to move common code (like BDMalloc, structs, asserts) to a common area guarded by `if WITH_WABT || WITH_V8`.

Enabling V8 requires setting 2 CMake options:

- `V8_INCLUDE_PATH`
- `V8_LIB_PATH`

The first is the path to the v8 include folder, to find headers; the second is the monolithic v8 library. This is because it's pretty difficult to build v8, and there are various flags you can set. Comments around those options provide some instructions for building v8.

By default, we still use wabt for running Wasm code, but we can use V8 by setting `WITH_WABT=OFF WITH_V8=ON`. Maybe in the future, with more testing, we can flip this.

Right now this requires a locally patched build of V8 due to https://crbug.com/v8/10461, but once that is resolved, any version of V8 that includes the fix will be fine.

Also enable a single test, block_transpose, to run on V8, with these results:

```
$ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \
  ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s.
Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s.
Success!
```

For comparison, when targeting host:

```
$ ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s.
Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s.
Success!
```

For comparison, running with wabt:

```
Dummy Func version: Scalar transpose bandwidth 828715 byte/s.
Wrapper version: Scalar transpose bandwidth 826204 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s.
Success!
```
@steven-johnson fyi, this is probably not ready to merge, given that there is no version of V8 that can work yet (until https://crbug.com/v8/10461 is resolved), but I wanted to upload it for you to take a look first.
This looks generally promising. Agreed that it's not ready to merge until the bug above is fixed. A few general comments:
Whoops, forgot about this. Added some documentation to README_webassembly; it also contains pointers to v8.dev, which should be the most accurate source.
Added a note about OS support too.
You might be thinking of https://github.com/WebAssembly/wasm-c-api; there hasn't been much progress there. V8 does support that API, but it doesn't work well if you need to work with the rest of the V8 C++ API.
Thanks, done!
Per https://bugs.chromium.org/p/v8/issues/detail?id=10461#c10, V8 9.8.76 has the compile API we need.
Cool! Let me know when this is ready for a re-review.
This is ready, ptal.
OK, I will get to it sometime this week.
Hey, I still haven't gotten to this (various crises + required long meetings, etc.), but we also need to touch base on issues of integrating this inside Google; in the past, there have been versioning issues (the version(s) of V8 we needed weren't available on a timely basis in google3). Let's discuss this via Chat (rather than here on GitHub) at your convenience, to be sure this is going to be tractable there as well before moving forward.
Aside from the documentation nits below, this generally looks good. There are some new failures in the tests that need addressing before we can land this, however:
```
$ ctest -L correctness -j72
... much output ...
The following tests FAILED:
         53 - correctness_bounds_of_split (ILLEGAL)
        107 - correctness_exception (Child aborted)
        110 - correctness_extern_consumer (Child aborted)
        112 - correctness_extern_error (Child aborted)
        366 - correctness_vector_reductions (ILLEGAL)
```
These all pass under WABT -- that said, it may well be that these are things that got 'fixed' after V8 was deprecated and WABT added, not sure.
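For iterating on failures like these, a couple of standard CTest invocations may help (a sketch; it assumes you are running from the CMake build directory of this repo, which is why no test is attached):

```shell
# Re-run only the tests that failed in the previous ctest invocation,
# printing their output so the failures are easier to diagnose.
ctest -L correctness --rerun-failed --output-on-failure

# Or target a single failing test by name (regex match).
ctest -R correctness_bounds_of_split --output-on-failure
```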
@@ -80,6 +80,44 @@ environment and are disabled).
In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
the Halide library itself.

## Using V8 as the interpreter

There is experimental support for using V8 as the interpreter; it links in V8,
Nit: Awkward wording; how about "There is experimental support for using V8 as the interpreter in JIT mode, rather than WABT."
[v8.dev](https://v8.dev/docs/build), and [there are examples for embedding
v8](https://v8.dev/docs/embed). The process is summarized below.

Since V8 is complex to build and will be built externally, we should probably pick a specific version we know to be stable and to support all the ops Halide needs, and add that information here (e.g., what version to use, and what commands to use to ensure that's the version built locally).
Per our discussion elsewhere, at least 9.8.147 is required for our purposes, so let's document that as the minimum for now?
- Install
  [`depot_tools`](https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html#_setting_up)
- Fetch v8 source code (and install required dependencies)
Let's add specific instructions for how to do this for the specific version we want. Pointing them at the general instructions above is good, but repeating the exact steps needed here is also good, should be pretty trivial, and downstream users will thank us. Here's what I did; let me know if this is correct:

```
$ gclient
$ mkdir ~/v8 && cd ~/v8
$ fetch v8
$ cd ~/v8/v8
$ git checkout origin/9.8.147
$ tools/dev/v8gen.py x64.release.sample
$ echo 'v8_enable_pointer_compression = false' >> out.gn/x64.release.sample/args.gn
$ autoninja -C out.gn/x64.release.sample v8_monolith
```
With V8 built, we can pass the CMake options:

To match the examples above, this should be ~/v8/v8/include
- `V8_INCLUDE_PATH`, path to V8 includes, e.g. `src/v8/v8/include`
- `V8_LIB_PATH`, path to V8 static library, e.g.
  `src/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a`
To match the examples above, this should be ~/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a
```
cmake -G Ninja -DWITH_V8=ON -DWITH_WABT=OFF \
  -DV8_INCLUDE_PATH=/path/to/v8/v8/include \
  -DV8_LIB_PATH=/path/to/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a
```
Pretty sure you are missing a build instruction here -- either `make`, or `ninja`, or even `cmake --build`.
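A hedged sketch of the complete sequence that comment suggests, combining the configure command with an explicit build step. The `-S`/`-B` directory layout is my assumption, and the V8 paths are placeholders from the examples above, so this fragment is not runnable as-is and carries no test.

```shell
# Configure with V8 enabled (paths are placeholders), then build.
# `cmake --build` works whether the generator is Ninja or Make.
cmake -G Ninja -S . -B build -DWITH_V8=ON -DWITH_WABT=OFF \
  -DV8_INCLUDE_PATH=/path/to/v8/v8/include \
  -DV8_LIB_PATH=/path/to/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a
cmake --build build
```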
See #6478 for a finished PR -- this one can be closed.