
Support using V8 as the Wasm JIT interpreter #6429

Closed

Conversation

@ngzhian (Contributor) commented Nov 19, 2021

This is a partial revert of #5097.
It brings back the code in WasmExecutor that sets up and uses V8 to run
Wasm code; all of that code is copy-pasted from the original. There are
some small cleanups to move common code (such as BDMalloc, structs, and
asserts) into a shared area guarded by `if WITH_WABT || WITH_V8`.

Enabling V8 requires setting two CMake options:

- `V8_INCLUDE_PATH`
- `V8_LIB_PATH`

The first is the path to V8's include folder (used to find headers); the
second is the path to the monolithic V8 library. We take these as options
because V8 is fairly difficult to build and there are various flags you can
set. Comments around those options provide some instructions for building V8.

By default, we still use wabt to run Wasm code, but we can use V8 by
setting `WITH_WABT=OFF WITH_V8=ON`. Maybe in the future, with more
testing, we can flip this default.
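
For concreteness, here is a sketch of a configure command with these options; the generator, build type, and exact paths are placeholder assumptions (they mirror the monolithic-build layout discussed later in this thread), not part of the change itself:

```
# Run from the Halide source tree; the V8 paths below are placeholders for
# your own V8 checkout and monolithic build output.
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
      -DWITH_WABT=OFF -DWITH_V8=ON \
      -DV8_INCLUDE_PATH=$HOME/v8/v8/include \
      -DV8_LIB_PATH=$HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a \
      -S . -B build
```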

Right now this requires a locally patched build of V8 due to
https://crbug.com/v8/10461, but once that is resolved, any version of V8
that includes the fix will work.

This also enables a single test, block_transpose, to run on V8, with these
results:

```
$ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \
    ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s.
Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s.
Success!
```

For comparison, when targeting host:

```
$ ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s.
Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s.
Success!
```

For comparison, running with wabt:

```
Dummy Func version: Scalar transpose bandwidth 828715 byte/s.
Wrapper version: Scalar transpose bandwidth 826204 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06 byte/s.
Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s.
Success!
```

@ngzhian (Contributor, Author) commented Nov 19, 2021

@steven-johnson FYI, this is probably not ready to merge, given that there is no version of V8 that works yet (until https://crbug.com/v8/10461 is resolved), but I wanted to upload it so you can take a look first.

@steven-johnson (Contributor) commented:

This looks generally promising. Agreed that it's not ready to merge until the bug above is fixed. A few general comments:

- Need to (eventually) add documentation about how to configure and build V8 for this purpose (probably in README_webassembly); my recollection is that there are a number of ways to configure V8 builds, and the necessary settings aren't that obvious (and vary by platform).
- Speaking of platforms: it's OK to target just x86-64-Linux (though OSX support would be nice as well). I wouldn't spend any time getting Windows support to work unless it's really easy.
- It's regrettable that there is so much code duplication between WABT and V8 glue code; there's probably no easy fix here. That said, it would be nice to make a second pass at some point and verify that we are doing what is reasonable to minimize the duplication. (I gather that there is/was a proposal for a Common Embedding API for wasm runtimes but the last time I checked, ~2 years ago, it wasn't well-supported enough by either WABT or V8 to entertain using it.)
- Nit: you need to run the `run-clang-format.sh` script to format the code correctly (a sketch follows this list).
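
For the formatting nit, a minimal sketch, assuming the script can be invoked with no arguments from the repository root and that a suitable clang-format is on PATH:

```
# Format the modified sources before pushing.
cd /path/to/Halide
./run-clang-format.sh
```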

@ngzhian (Contributor, Author) commented Nov 19, 2021

> Need to (eventually) add documentation about how to configure and build V8 for this purpose (probably in README_webassembly); my recollection is that there are a number of ways to configure V8 builds, and the necessary settings aren't that obvious (and vary by platform).

Whoops, forgot about this. Added some documentation to README_webassembly; it also contains pointers to v8.dev, which should be the most accurate source.

> Speaking of platforms: it's OK to target just x86-64-Linux (though OSX support would be nice as well). I wouldn't spend any time getting Windows support to work unless it's really easy.

Added a note about OS support too.

> It's regrettable that there is so much code duplication between WABT and V8 glue code; there's probably no easy fix here. That said, it would be nice to make a second pass at some point and verify that we are doing what is reasonable to minimize the duplication. (I gather that there is/was a proposal for a Common Embedding API for wasm runtimes but the last time I checked, ~2 years ago, it wasn't well-supported enough by either WABT or V8 to entertain using it.)

You might be thinking of https://github.com/WebAssembly/wasm-c-api; there hasn't been much progress there. V8 does support that API, but it doesn't work well if you need to interoperate with the rest of the V8 C++ API.

> Nit: you need to run the `run-clang-format.sh` script to format the code correctly.

Thanks, done!

@ngzhian (Contributor, Author) commented Nov 22, 2021

https://bugs.chromium.org/p/v8/issues/detail?id=10461#c10 V8 9.8.76 has the compile API we need.

@steven-johnson (Contributor) commented:

> https://bugs.chromium.org/p/v8/issues/detail?id=10461#c10 V8 9.8.76 has the compile API we need.

Cool! Let me know when this is ready for a re-review.

@ngzhian (Contributor, Author) commented Nov 23, 2021

> > https://bugs.chromium.org/p/v8/issues/detail?id=10461#c10 V8 9.8.76 has the compile API we need.
>
> Cool! Let me know when this is ready for a re-review.

This is ready, PTAL.

@steven-johnson (Contributor) commented:

> This is ready, PTAL.

OK, I will get to it sometime this week.

@steven-johnson (Contributor) commented:

Hey, I still haven't gotten to this (various crises, required long meetings, etc.), but we also need to touch base on integrating this inside Google; in the past, there have been versioning issues (the version(s) of V8 we needed weren't available on a timely basis in google3). Let's discuss this via Chat (rather than here on GitHub) at your convenience, to be sure this is going to be tractable there as well before moving forward.

@steven-johnson (Contributor) left a comment:

Aside from the documentation nits below, this generally looks good. However, there are some new test failures that need addressing before we can land this:

```
$ ctest -L correctness -j72
... much output ...
The following tests FAILED:
         53 - correctness_bounds_of_split (ILLEGAL)
        107 - correctness_exception (Child aborted)
        110 - correctness_extern_consumer (Child aborted)
        112 - correctness_extern_error (Child aborted)
        366 - correctness_vector_reductions (ILLEGAL)
```

These all pass under WABT; that said, it may well be that these are things that got 'fixed' after V8 was deprecated and WABT was added, so I'm not sure.
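
To iterate on just these failures, one option (a sketch: it assumes the build is configured with `WITH_V8=ON` and that the JIT tests pick up `HL_JIT_TARGET` from the environment, as in the performance run above) is to select the failing tests by name:

```
# Re-run only the failing correctness tests, printing output on failure.
HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \
  ctest -R 'correctness_(bounds_of_split|exception|extern_consumer|extern_error|vector_reductions)$' \
        --output-on-failure
```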

@@ -80,6 +80,44 @@ environment and are disabled).
In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
the Halide library itself.

## Using V8 as the interpreter

There is experimental support for using V8 as the interpreter, it links in V8,

@steven-johnson (Contributor) commented:

Nit: awkward wording. How about "There is experimental support for using V8 as the interpreter in JIT mode, rather than WABT."

[v8.dev](https://v8.dev/docs/build), and [there are examples for embedding
v8](https://v8.dev/docs/embed). The process is summarized below.

- Install

@steven-johnson (Contributor) commented:

Since V8 is complex to build and will be built externally, we should probably pick a specific version we know to be stable and to support all the ops Halide needs, and add that information here (e.g., what version to use, and what commands to use to ensure that's the version built locally).

Per our discussion elsewhere, at least 9.8.147 is required for our purposes, so let's document that as the minimum for now?


- Install
[`depot_tools`](https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html#_setting_up)
- Fetch v8 source code (and install required dependencies)

@steven-johnson (Contributor) commented:

Let's add specific instructions for how to do this for the specific version we want. Pointing readers at the general instructions above is good, but repeating the exact steps here is also good, should be pretty trivial, and downstream users will thank us. Here's what I did; let me know if this is correct:

```
$ gclient
$ mkdir ~/v8 && cd ~/v8
$ fetch v8
$ cd ~/v8/v8
$ git checkout origin/9.8.147
$ tools/dev/v8gen.py x64.release.sample
$ echo 'v8_enable_pointer_compression = false' >> out.gn/x64.release.sample/args.gn
$ autoninja -C out.gn/x64.release.sample v8_monolith
```


With V8 built, we can pass the CMake options:

- `V8_INCLUDE_PATH`, path to V8 includes, e.g. `src/v8/v8/include`

@steven-johnson (Contributor) commented:

To match the examples above, this should be `~/v8/v8/include`


- `V8_INCLUDE_PATH`, path to V8 includes, e.g. `src/v8/v8/include`
- `V8_LIB_PATH`, path to V8 static library, e.g.
`src/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a`

@steven-johnson (Contributor) commented:

To match the examples above, this should be `~/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a`

```
cmake -G Ninja -DWITH_V8=ON -DWITH_WABT=OFF \
-DV8_INCLUDE_PATH=/path/to/v8/v8/include \
-DV8_LIB_PATH=/path/to/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a
```

@steven-johnson (Contributor) commented:

Pretty sure you are missing a build instruction here: either `make` or `ninja`, or even `cmake --build`.
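
A minimal sketch of that missing step, assuming the configure command above was run in (or pointed at) a build directory named `build`:

```
# After the configure step above, actually compile Halide:
cmake --build build
# Or, if the Ninja generator was used:
ninja -C build
```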

@steven-johnson (Contributor) commented:

See #6478 for a finished PR -- this one can be closed.
