-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Mark a handful of functions on the calling-into-Wasm path as #[inline]
#10643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised about the #[inline]
on dynamically typed things since there are so many other inefficiences related to that, but otherwise at a high level looks good
crates/wasmtime/src/runtime/func.rs
Outdated
@@ -1178,6 +1179,7 @@ impl Func { | |||
/// This must be called just before `call_impl_do_call`. | |||
/// | |||
/// Returns whether we need to GC before calling `call_impl_do_call`. | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is dealing with a dynamic Val
does #[inline]
really help much here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was indeed showing up in profiles.
@@ -943,6 +943,7 @@ impl Func { | |||
/// `StoreOpaque` while the `FuncType` is also being used (from the | |||
/// perspective of the borrow-checker) because otherwise the signature would | |||
/// consider `StoreOpaque` borrowed mutable while `FuncType` is in use. | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems related to the dynamic calls case which we don't tend to try to put too much effort into optimizing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was hot in a benchmark that was regressing compared to main
; easy to fix; don't see why we wouldn't do this.
@@ -179,6 +180,7 @@ where | |||
/// | |||
/// If `Self::need_gc_before_call_raw`, then the caller must have done a GC | |||
/// just before calling this method. | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big function, does inlining this really help that much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was also showing up in profiles.
I didn't measure every |
This provides an improvement across the board for our `sync/no-hook` benchmarks: <details> <summary>Benchmark Results</summary> ``` $ cargo bench --profile profiling --bench call '\bsync/no-hook' -- --baseline main Finished `profiling` profile [optimized + debuginfo] target(s) in 0.28s Running benches/call.rs (target/profiling/deps/call-b0a2bedd3336ad76) sync/no-hook/core - host-to-wasm - typed - nop time: [27.334 ns 27.499 ns 27.668 ns] change: [-16.388% -14.870% -13.479%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop time: [44.141 ns 44.429 ns 44.757 ns] change: [-18.380% -17.041% -15.670%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop time: [29.731 ns 29.983 ns 30.262 ns] change: [-25.104% -22.176% -19.159%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - typed - nop-params-and-results time: [28.990 ns 29.143 ns 29.303 ns] change: [-25.804% -24.562% -23.372%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop-params-and-results time: [110.00 ns 110.65 ns 111.46 ns] change: [-11.967% -9.0070% -6.1347%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop-params-and-results time: [58.828 ns 59.089 ns 59.418 ns] change: [-15.596% -13.573% -11.484%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop time: [6.6209 ns 6.6615 ns 6.7077 ns] change: [-53.555% -52.878% -52.116%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop-params-and-results time: [7.9783 ns 8.0173 ns 8.0611 ns] change: [-54.341% -53.947% -53.505%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop time: [18.306 ns 18.393 ns 18.491 ns] change: [-29.104% -28.127% -27.171%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop-params-and-results time: [67.741 ns 68.120 ns 68.601 ns] change: [-26.453% -25.061% -23.663%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop time: [6.8379 ns 6.8915 ns 6.9566 ns] change: [-55.623% -55.062% -54.481%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop-params-and-results time: [27.856 ns 28.024 ns 28.214 ns] change: [-17.320% -16.103% -15.038%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop time: [55.126 ns 55.506 ns 55.932 ns] change: [-19.458% -18.098% -16.736%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop time: [101.42 ns 102.06 ns 102.82 ns] change: [-15.679% -14.108% -12.523%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [61.482 ns 62.017 ns 62.591 ns] change: [-16.576% -15.100% -13.595%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) high mild 1 (1.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [223.50 ns 224.72 ns 226.05 ns] change: [-21.732% -20.178% -18.679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop time: [39.115 ns 39.295 ns 39.500 ns] change: [-15.139% -13.886% -12.721%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [47.234 ns 47.458 ns 47.745 ns] change: [-13.833% -11.951% -9.8784%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop time: [52.311 ns 52.556 ns 52.817 ns] change: [-12.736% -11.712% -10.693%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [239.71 ns 241.59 ns 244.11 ns] change: [-29.804% -28.173% -26.415%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe ``` </details>
29bd960
to
f6763ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that I'm far more inline-averse than most others, but at the same time I don't think that I'm un-founded when I push back against #[inline]
either. For example I don't doubt that dynamic Val
-style call paths can benefit from some inlining, but that's also not the purpose of those code paths since they're explicitly de-optimized in many other ways as well. Personally I'd prefer to avoid manual #[inline]
annotations there and let the compiler figure things out. If the compiler isn't the best at figuring things out, which it isn't for these benchmarks, then personally I'd prefer to just shrug and move on rather than add annotations to optimize an explicitly non-optimal path.
Another reason I'm personally pretty inline-averse is that it's such a brittle metric. For example TypedFunc::call_raw
is a pretty big function here to be inlining. I don't doubt that it's in the profile but I'm also unsure whether inlining shows benefit, e.g. if it basically just shifted all its cost to the caller without further benefitting from inlining. This sort of code golfing is IMO brittle in the long-term where just because it improves the micro-benchmark now doesn't mean it'll continue to improve further into the future.
Overall I'm personally fine with #[inline]
on "obvious small functions" like one-liners or similar. Bigger functions I'm less sure about and honestly IMO are basically just not worth annotation. In theory we should be measuring the attribute on big functions to see if it's worth it and most of the time I suspect that the effort to measure far outweighs the effort gained from a possible inlining.
At the end of the day I would relatively strongly prefer that #[inline]
were removed from anything related to Val
. I would then slightly prefer to remove #[inline]
on anything bigger than one line basically. For this latter point I'll leave it up to you though since I realize the cost is not really worth debating over as well. Although that's also not quite true since if we #[inline]
everything it's obviously quite bad for compile times and binary sizes, so there's some line somewhere I just tend to be far more conservative than others I think.
I've removed |
This provides an improvement across the board for our
sync/no-hook
benchmarks:Benchmark Results
Depends on #10626