Mark a handful of functions on the calling-into-Wasm path as `#[inline]` #10643

fitzgen · 2025-04-22T19:53:45Z

This provides an improvement across the board for our sync/no-hook benchmarks:

Benchmark Results

$ cargo bench --profile profiling --bench call '\bsync/no-hook' -- --baseline main
    Finished `profiling` profile [optimized + debuginfo] target(s) in 0.28s
     Running benches/call.rs (target/profiling/deps/call-b0a2bedd3336ad76)
sync/no-hook/core - host-to-wasm - typed - nop
                        time:   [27.334 ns 27.499 ns 27.668 ns]
                        change: [-16.388% -14.870% -13.479%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
sync/no-hook/core - host-to-wasm - untyped - nop
                        time:   [44.141 ns 44.429 ns 44.757 ns]
                        change: [-18.380% -17.041% -15.670%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
sync/no-hook/core - host-to-wasm - unchecked - nop
                        time:   [29.731 ns 29.983 ns 30.262 ns]
                        change: [-25.104% -22.176% -19.159%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
sync/no-hook/core - host-to-wasm - typed - nop-params-and-results
                        time:   [28.990 ns 29.143 ns 29.303 ns]
                        change: [-25.804% -24.562% -23.372%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
sync/no-hook/core - host-to-wasm - untyped - nop-params-and-results
                        time:   [110.00 ns 110.65 ns 111.46 ns]
                        change: [-11.967% -9.0070% -6.1347%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
sync/no-hook/core - host-to-wasm - unchecked - nop-params-and-results
                        time:   [58.828 ns 59.089 ns 59.418 ns]
                        change: [-15.596% -13.573% -11.484%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

sync/no-hook/core - wasm-to-host - typed - nop
                        time:   [6.6209 ns 6.6615 ns 6.7077 ns]
                        change: [-53.555% -52.878% -52.116%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
sync/no-hook/core - wasm-to-host - typed - nop-params-and-results
                        time:   [7.9783 ns 8.0173 ns 8.0611 ns]
                        change: [-54.341% -53.947% -53.505%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
sync/no-hook/core - wasm-to-host - untyped - nop
                        time:   [18.306 ns 18.393 ns 18.491 ns]
                        change: [-29.104% -28.127% -27.171%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
sync/no-hook/core - wasm-to-host - untyped - nop-params-and-results
                        time:   [67.741 ns 68.120 ns 68.601 ns]
                        change: [-26.453% -25.061% -23.663%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
sync/no-hook/core - wasm-to-host - unchecked - nop
                        time:   [6.8379 ns 6.8915 ns 6.9566 ns]
                        change: [-55.623% -55.062% -54.481%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
sync/no-hook/core - wasm-to-host - unchecked - nop-params-and-results
                        time:   [27.856 ns 28.024 ns 28.214 ns]
                        change: [-17.320% -16.103% -15.038%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

sync/no-hook/component - host-to-wasm - typed - nop
                        time:   [55.126 ns 55.506 ns 55.932 ns]
                        change: [-19.458% -18.098% -16.736%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
sync/no-hook/component - host-to-wasm - untyped - nop
                        time:   [101.42 ns 102.06 ns 102.82 ns]
                        change: [-15.679% -14.108% -12.523%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
sync/no-hook/component - host-to-wasm - typed - nop-params-and-results
                        time:   [61.482 ns 62.017 ns 62.591 ns]
                        change: [-16.576% -15.100% -13.595%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe
sync/no-hook/component - host-to-wasm - untyped - nop-params-and-results
                        time:   [223.50 ns 224.72 ns 226.05 ns]
                        change: [-21.732% -20.178% -18.679%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sync/no-hook/component - wasm-to-host - typed - nop
                        time:   [39.115 ns 39.295 ns 39.500 ns]
                        change: [-15.139% -13.886% -12.721%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
sync/no-hook/component - wasm-to-host - typed - nop-params-and-results
                        time:   [47.234 ns 47.458 ns 47.745 ns]
                        change: [-13.833% -11.951% -9.8784%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
sync/no-hook/component - wasm-to-host - untyped - nop
                        time:   [52.311 ns 52.556 ns 52.817 ns]
                        change: [-12.736% -11.712% -10.693%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
sync/no-hook/component - wasm-to-host - untyped - nop-params-and-results
                        time:   [239.71 ns 241.59 ns 244.11 ns]
                        change: [-29.804% -28.173% -26.415%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Depends on #10626

alexcrichton

I'm a bit surprised about the #[inline] on dynamically typed things since there are so many other inefficiences related to that, but otherwise at a high level looks good

alexcrichton · 2025-04-22T19:56:56Z

crates/wasmtime/src/runtime/func.rs

@@ -1178,6 +1179,7 @@ impl Func {
    /// This must be called just before `call_impl_do_call`.
    ///
    /// Returns whether we need to GC before calling `call_impl_do_call`.
+    #[inline]


Since this is dealing with a dynamic Val does #[inline] really help much here?

This one was indeed showing up in profiles.

alexcrichton · 2025-04-22T19:58:33Z

crates/wasmtime/src/runtime/func.rs

@@ -943,6 +943,7 @@ impl Func {
    /// `StoreOpaque` while the `FuncType` is also being used (from the
    /// perspective of the borrow-checker) because otherwise the signature would
    /// consider `StoreOpaque` borrowed mutable while `FuncType` is in use.
+    #[inline]


This seems related to the dynamic calls case which we don't tend to try to put too much effort into optimizing?

It was hot in a benchmark that was regressing compared to main; easy to fix; don't see why we wouldn't do this.

alexcrichton · 2025-04-22T19:59:40Z

crates/wasmtime/src/runtime/func/typed.rs

@@ -179,6 +180,7 @@ where
    ///
    /// If `Self::need_gc_before_call_raw`, then the caller must have done a GC
    /// just before calling this method.
+    #[inline]


This is a pretty big function, does inlining this really help that much?

This one was also showing up in profiles.

crates/wasmtime/src/runtime/vm/interpreter.rs

fitzgen · 2025-04-22T21:07:46Z

I'm a bit surprised about the #[inline] on dynamically typed things since there are so many other inefficiences related to that, but otherwise at a high level looks good

I didn't measure every #[inline] independently, I just looked at which functions were at the top of the perf report and marked them #[inline] (and some of their callees) if it seemed like it made sense, and then I measured the benchmarks with all the changes together. So it is possible that some of these are not strictly necessary, but I think that is fine, since testing all combinations of #[inline]s here would take forever.

This provides an improvement across the board for our `sync/no-hook` benchmarks: <details> <summary>Benchmark Results</summary> ``` $ cargo bench --profile profiling --bench call '\bsync/no-hook' -- --baseline main Finished `profiling` profile [optimized + debuginfo] target(s) in 0.28s Running benches/call.rs (target/profiling/deps/call-b0a2bedd3336ad76) sync/no-hook/core - host-to-wasm - typed - nop time: [27.334 ns 27.499 ns 27.668 ns] change: [-16.388% -14.870% -13.479%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop time: [44.141 ns 44.429 ns 44.757 ns] change: [-18.380% -17.041% -15.670%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop time: [29.731 ns 29.983 ns 30.262 ns] change: [-25.104% -22.176% -19.159%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - typed - nop-params-and-results time: [28.990 ns 29.143 ns 29.303 ns] change: [-25.804% -24.562% -23.372%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop-params-and-results time: [110.00 ns 110.65 ns 111.46 ns] change: [-11.967% -9.0070% -6.1347%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop-params-and-results time: [58.828 ns 59.089 ns 59.418 ns] change: [-15.596% -13.573% -11.484%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop time: [6.6209 ns 6.6615 ns 6.7077 ns] change: [-53.555% -52.878% -52.116%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop-params-and-results time: [7.9783 ns 8.0173 ns 8.0611 ns] change: [-54.341% -53.947% -53.505%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop time: [18.306 ns 18.393 ns 18.491 ns] change: [-29.104% -28.127% -27.171%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop-params-and-results time: [67.741 ns 68.120 ns 68.601 ns] change: [-26.453% -25.061% -23.663%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop time: [6.8379 ns 6.8915 ns 6.9566 ns] change: [-55.623% -55.062% -54.481%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop-params-and-results time: [27.856 ns 28.024 ns 28.214 ns] change: [-17.320% -16.103% -15.038%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop time: [55.126 ns 55.506 ns 55.932 ns] change: [-19.458% -18.098% -16.736%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop time: [101.42 ns 102.06 ns 102.82 ns] change: [-15.679% -14.108% -12.523%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [61.482 ns 62.017 ns 62.591 ns] change: [-16.576% -15.100% -13.595%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) high mild 1 (1.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [223.50 ns 224.72 ns 226.05 ns] change: [-21.732% -20.178% -18.679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop time: [39.115 ns 39.295 ns 39.500 ns] change: [-15.139% -13.886% -12.721%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [47.234 ns 47.458 ns 47.745 ns] change: [-13.833% -11.951% -9.8784%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop time: [52.311 ns 52.556 ns 52.817 ns] change: [-12.736% -11.712% -10.693%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [239.71 ns 241.59 ns 244.11 ns] change: [-29.804% -28.173% -26.415%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe ``` </details>

alexcrichton

I know that I'm far more inline-averse than most others, but at the same time I don't think that I'm un-founded when I push back against #[inline] either. For example I don't doubt that dynamic Val-style call paths can benefit from some inlining, but that's also not the purpose of those code paths since they're explicitly de-optimized in many other ways as well. Personally I'd prefer to avoid manual #[inline] annotations there and let the compiler figure things out. If the compiler isn't the best at figuring things out, which it isn't for these benchmarks, then personally I'd prefer to just shrug and move on rather than add annotations to optimize an explicitly non-optimal path.

Another reason I'm personally pretty inline-averse is that it's such a brittle metric. For example TypedFunc::call_raw is a pretty big function here to be inlining. I don't doubt that it's in the profile but I'm also unsure whether inlining shows benefit, e.g. if it basically just shifted all its cost to the caller without further benefitting from inlining. This sort of code golfing is IMO brittle in the long-term where just because it improves the micro-benchmark now doesn't mean it'll continue to improve further into the future.

Overall I'm personally fine with #[inline] on "obvious small functions" like one-liners or similar. Bigger functions I'm less sure about and honestly IMO are basically just not worth annotation. In theory we should be measuring the attribute on big functions to see if it's worth it and most of the time I suspect that the effort to measure far outweighs the effort gained from a possible inlining.

At the end of the day I would relatively strongly prefer that #[inline] were removed from anything related to Val. I would then slightly prefer to remove #[inline] on anything bigger than one line basically. For this latter point I'll leave it up to you though since I realize the cost is not really worth debating over as well. Although that's also not quite true since if we #[inline] everything it's obviously quite bad for compile times and binary sizes, so there's some line somewhere I just tend to be far more conservative than others I think.

fitzgen · 2025-04-24T15:48:30Z

I've removed #[inline] from the larger methods.

fitzgen requested review from a team as code owners April 22, 2025 19:53

fitzgen requested review from alexcrichton and removed request for a team April 22, 2025 19:53

fitzgen mentioned this pull request Apr 22, 2025

Use EntryStoreContext to manage state when entering and exiting Wasm #10626

Merged

alexcrichton reviewed Apr 22, 2025

View reviewed changes

fitzgen added 2 commits April 22, 2025 14:42

remove inline(never) on interpreter method

f6763ee

fitzgen force-pushed the speed-up-call-benches branch from 29bd960 to f6763ee Compare April 22, 2025 21:42

fitzgen requested a review from alexcrichton April 22, 2025 21:43

alexcrichton approved these changes Apr 22, 2025

View reviewed changes

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Apr 22, 2025

Remove #[inline] from larger methods

0299be6

fitzgen enabled auto-merge April 24, 2025 15:48

fitzgen added this pull request to the merge queue Apr 24, 2025

Merged via the queue into bytecodealliance:main with commit cb484dd Apr 24, 2025
41 checks passed

fitzgen deleted the speed-up-call-benches branch April 24, 2025 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mark a handful of functions on the calling-into-Wasm path as `#[inline]` #10643

Mark a handful of functions on the calling-into-Wasm path as `#[inline]` #10643

Uh oh!

fitzgen commented Apr 22, 2025

Uh oh!

alexcrichton left a comment

Uh oh!

alexcrichton Apr 22, 2025

Uh oh!

fitzgen Apr 22, 2025

Uh oh!

alexcrichton Apr 22, 2025

Uh oh!

fitzgen Apr 22, 2025

Uh oh!

alexcrichton Apr 22, 2025

Uh oh!

fitzgen Apr 22, 2025

Uh oh!

Uh oh!

fitzgen commented Apr 22, 2025

Uh oh!

alexcrichton left a comment

Uh oh!

fitzgen commented Apr 24, 2025

Uh oh!

Uh oh!

Uh oh!

Mark a handful of functions on the calling-into-Wasm path as #[inline] #10643

Mark a handful of functions on the calling-into-Wasm path as #[inline] #10643

Uh oh!

Conversation

fitzgen commented Apr 22, 2025

Uh oh!

alexcrichton left a comment

Choose a reason for hiding this comment

Uh oh!

alexcrichton Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

fitzgen Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

alexcrichton Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

fitzgen Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

alexcrichton Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

fitzgen Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fitzgen commented Apr 22, 2025

Uh oh!

alexcrichton left a comment

Choose a reason for hiding this comment

Uh oh!

fitzgen commented Apr 24, 2025

Uh oh!

Uh oh!

Uh oh!

Mark a handful of functions on the calling-into-Wasm path as `#[inline]` #10643

Mark a handful of functions on the calling-into-Wasm path as `#[inline]` #10643