Skip to content

WebAssembly Updates #5097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 29, 2020
Merged

WebAssembly Updates #5097

merged 12 commits into from
Jul 29, 2020

Conversation

steven-johnson
Copy link
Contributor

@steven-johnson steven-johnson commented Jul 9, 2020

Revamp our WebAssembly support to bring it more up-to-date with current spec:

  • Remove the use of V8 for our "jit" support and use the WABT interpreter instead. (It's vastly slower but is correct and is far easier to integrate into Halide.)
  • Remove all explicit support for wasm building/testing from the Makefile; move it into CMake instead.
  • Update all tests to reflect current status of build/test accuracy.
  • Especially update simd_op_check to verify final-spec simd ops.
  • Update README.

Fixes #4245

steven-johnson added a commit that referenced this pull request Jul 10, 2020
This has been split off from #5097 to make reviewing simpler; this modifies a bunch of tests for better use with WASM:
- everything in auto_schedule is skipped, since the Mullapudi2016 autoscheduler assert-fails for most non-real-CPU backends
- everything in performance is skipped, because we'll be doing wasm "jit" testing with an interpreter, so performance results would be meaningless
 - Added/updated the skip message for tests that exercise things that wasm doesn't support yet (eg atomics, threads)
- For a handful of tests that are unreasonably slow under the wasm "jit" interpreter, dialed down some of the test parameters to avoid test timeouts
- Removed some wasm-related skips that won't be necessary any longer
@steven-johnson steven-johnson force-pushed the srj-wabt branch 2 times, most recently from 2ffa61c to 6785c98 Compare July 14, 2020 17:16
steven-johnson added a commit that referenced this pull request Jul 14, 2020
Cherry-picking some CMake-related stuff from #5097 to make it easier to review:

- Add an optional COMMAND argument to add_halide_test(), this allows us to customize the command for execution (e.g. to run generated .wasm with a shell tool)

- Add some missing DEPENDS in a few places

- Add an optional EXTRA_LIBS argument to halide_define_aot_test(); this allows us to pass extra dependencies rather than requiring separate calls to target_link_libraries().

That last one is a little odd, so let me expand: the intent here is that (when the wasm changes land), some of the tests that aren't usable under wasm (e.g. matlab), and halide_define_aot_test() will handle these by just skipping those targets entirely. This means that we can effectively centralize the blocklist in one place, and then the callers of halide_define_aot_test() can just do something like

        halide_define_aot_test(matlab)
        if (TARGET generator_aot_matlab)
          set_target_properties(generator_aot_matlab PROPERTIES ENABLE_EXPORTS True)
        endif ()

... i.e., just assume that if the target may not be defined if it's blacklisted for some reason.

(Open for better suggestions on this last one, but I felt it was better than spraying lots of checks for "if wasm blah blah" thru this entire file)
steven-johnson added a commit that referenced this pull request Jul 14, 2020
Cherry-picking a change from #5097 so that reviewing that PR will be simpler; this just adds another feature option for wasm that tooling wasn't ready for before, but is now (saturating float-to-int conversion).
@alexreinking
Copy link
Member

re: the GIT_SUBMODULES CMake bug. It was fixed:

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4729

@steven-johnson
Copy link
Contributor Author

re: the GIT_SUBMODULES CMake bug. It was fixed:

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4729

...but presumably in 3.18+, so we can't rely on it yet?

@alexreinking
Copy link
Member

alexreinking commented Jul 15, 2020

...but presumably in 3.18+, so we can't rely on it yet?

Ah, I misunderstood your comment in the code. I thought that the bug was introduced in 3.18. Regardless, leaving GIT_SUBMODULES "" in the FetchContent_Declare is fine. In the bugged versions, it's a no-op. In the corrected versions, it leaves out the submodules like we want.

@alexreinking alexreinking self-requested a review July 16, 2020 00:25
@steven-johnson steven-johnson force-pushed the srj-wabt branch 2 times, most recently from 7f10ff7 to 95b3ed5 Compare July 20, 2020 22:35
@steven-johnson steven-johnson changed the title WebAssembly fixes (draft) WebAssembly Updates Jul 23, 2020
@steven-johnson steven-johnson marked this pull request as ready for review July 23, 2020 21:31
@steven-johnson steven-johnson requested a review from abadams July 23, 2020 21:31
@steven-johnson
Copy link
Contributor Author

There's more wasm-related work to do in the future (notably, getting benchmarks/performance tests working), but I think this is a good enough stopping point that we should look at reviewing and landing it.

(Note that wasm testing isn't enabled on the buildbots yet -- it never has been! -- but should be possible now thanks to these changes. After this lands, I'll update the buildbots accordingly.)

`HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128`) and run normally. The test suites
which we have vetted to work include correctness, performance, error, and
warning. (Some of the others could likely be made to work with modest effort.)
- To run the JIT tests, set `HL_JIT_TARGET=wasm-32-wasmrt` (possibly adding `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run CMake/CTest normally. Note that wasm testing is only support under CMake (not via Make).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line wrapping in this file seems necessary. I think @alexreinking used a markdown auto-formatter at some point that might do it?

Copy link
Member

@alexreinking alexreinking Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's called prettier.io. You can install it from NPM. I like it quite a bit. I make sure to always pass --prose-wrap always at the CLI so that it reflows paragraphs nicely.

{Target::WasmSimd128, true, UInt(16, 8), 0, "llvm.wasm.avgr.unsigned.v8i16", u16(((wild_u32x_ + wild_u32x_) + 1) >> 1)},

// TODO: LLVM should support this directly, but doesn't yet.
// To make this work, we need to be able to call the intrinsics with two vecs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I've had to do this in the past is with force-inlined implementations that accept the wider vec, e.g. see packsswbx16 in src/runtime/x86.ll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the wasm/LLVM folks has been working on improving these codegen issues in their backend, so I haven't expended much energy on trying to improve it here yet. I'll add this as a note in case circumstances make us need to improve sooner.

String::Utf8Value str(isolate, key);
wdebug(0) << "\t" << i + 1 << ". " << *str << "\n";
}
// lld will temporarily hijack the signal handlers to ensure that temp files get cleaned up,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounded painful to debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wasn't the worst thing I've ever had to do, but it wasn't much fun, either

inline void ExtractAndStoreScalar<int64_t>::operator()(const Local<Context> &context, const Local<Value> &val, void *slot) {
internal_error << "TODO: 64-bit slots aren't yet supported";
std::string to_string(const wabt::MemoryStream &m) {
// TODO: ugh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate. Are you talking about the weird requirement to const cast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that comment doesn't belong there. Removing.

@steven-johnson
Copy link
Contributor Author

Can I get final review/approval for this? (Only failure is unrelated Windows GPU failure)

@alexreinking
Copy link
Member

alexreinking commented Jul 27, 2020

I'd like to check that the commands in the README still work even with an LLVM that doesn't have WebAssembly/WABT/whatever, since the Ubuntu-bundled one doesn't.

@steven-johnson
Copy link
Contributor Author

I'd like to check that the commands in the README still work even with an LLVM that doesn't have WebAssembly/WABT/whatever, since the Ubuntu-bundled one doesn't.

cool, will wait

Copy link
Member

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be some weirdness if you try to use this with WITH_WASM_SHELL disabled, but we'll cross that bridge when we come to it.

steven-johnson added a commit to halide/build_bot that referenced this pull request Jul 28, 2020
will be necessary once halide/Halide#5097 lands.
@steven-johnson steven-johnson merged commit 83b452f into master Jul 29, 2020
@steven-johnson steven-johnson deleted the srj-wabt branch July 29, 2020 17:24
ngzhian added a commit to ngzhian/Halide that referenced this pull request Nov 19, 2021
This is a partial revert of halide#5097.
It brings back a bunch of the code in WasmExecutor to set up and use V8
to run Wasm code. All of the code is copy-pasted. There are some small
cleanups to move common code (like BDMalloc, structs, asserts) to a
common area guarded by `if WITH_WABT || WITH_V8`.

Enabling V8 requires setting 2 CMake options:
- V8_INCLUDE_PATH
- V8_LIB_PATH

The first is a path to v8 include folder, to find headers, the second is
the monolithic v8 library. This is because it's pretty difficult to
build v8, and there are various flags you can set. Comments around those
options provide some instructions for building v8.

By default, we still use the wabt for running Wasm code, but we can use
V8 by setting WITH_WABT=OFF WITH_V8=ON. Maybe in the future, with more
testing, we can flip this.

Right now this requires a locally patched build of V8 due to
https://crbug.com/v8/10461, but once that is resolved, the version of V8
that includes the fix will be fine.

Also enable a single test, block_transpose, to run on V8, with these
results:

$ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \
  ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s.
Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08
byte/s.
Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s.
Success!

For comparison, when targeting host:

$ ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s.
Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09
byte/s.
Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s.
Success!

For comparison, running with wabt:

Dummy Func version: Scalar transpose bandwidth 828715 byte/s.
Wrapper version: Scalar transpose bandwidth 826204 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s.
Success!
steven-johnson added a commit that referenced this pull request Dec 14, 2021
* Support using V8 as the Wasm JIT interpreter

This is a partial revert of #5097.
It brings back a bunch of the code in WasmExecutor to set up and use V8
to run Wasm code. All of the code is copy-pasted. There are some small
cleanups to move common code (like BDMalloc, structs, asserts) to a
common area guarded by `if WITH_WABT || WITH_V8`.

Enabling V8 requires setting 2 CMake options:
- V8_INCLUDE_PATH
- V8_LIB_PATH

The first is a path to v8 include folder, to find headers, the second is
the monolithic v8 library. This is because it's pretty difficult to
build v8, and there are various flags you can set. Comments around those
options provide some instructions for building v8.

By default, we still use the wabt for running Wasm code, but we can use
V8 by setting WITH_WABT=OFF WITH_V8=ON. Maybe in the future, with more
testing, we can flip this.

Right now this requires a locally patched build of V8 due to
https://crbug.com/v8/10461, but once that is resolved, the version of V8
that includes the fix will be fine.

Also enable a single test, block_transpose, to run on V8, with these
results:

$ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 \
  ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 3.45061e+08 byte/s.
Wrapper version: Scalar transpose bandwidth 3.38931e+08 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 6.74143e+08
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 3.54331e+08 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 3.50053e+08
byte/s.
Wrapper version: Transpose vectorized in x bandwidth 6.73421e+08 byte/s.
Success!

For comparison, when targeting host:

$ ./test/performance/performance_block_transpose
Dummy Func version: Scalar transpose bandwidth 1.33689e+09 byte/s.
Wrapper version: Scalar transpose bandwidth 1.33583e+09 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 2.20278e+09
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 1.45959e+09 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 1.45921e+09
byte/s.
Wrapper version: Transpose vectorized in x bandwidth 2.21746e+09 byte/s.
Success!

For comparison, running with wabt:

Dummy Func version: Scalar transpose bandwidth 828715 byte/s.
Wrapper version: Scalar transpose bandwidth 826204 byte/s.
Dummy Func version: Transpose vectorized in y bandwidth 1.12008e+06
byte/s.
Wrapper version: Transpose vectorized in y bandwidth 874958 byte/s.
Dummy Func version: Transpose vectorized in x bandwidth 879031 byte/s.
Wrapper version: Transpose vectorized in x bandwidth 1.10525e+06 byte/s.
Success!

* Add instructions to build V8

* Formatting

* More documentation

* Update README_webassembly.md

* Update README_webassembly.md

* Update WasmExecutor.cpp

* Update WasmExecutor.cpp

* Skip tests

* Update WasmExecutor.cpp

* Skip performance tests

* Update WasmExecutor.cpp

* Address review comments

* 9.8.147 -> 9.8.177

Co-authored-by: Ng Zhi An <zhin@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Buildbots need to build/test WASM code
3 participants