Crates.io | winliner |
lib.rs | winliner |
version | 1.0.1 |
source | src |
created_at | 2023-10-17 17:06:10.290922 |
updated_at | 2023-10-18 20:46:43.807063 |
description | The WebAssembly Indirect Call Inliner |
homepage | https://github.com/fitzgen/winliner |
repository | https://github.com/fitzgen/winliner |
id | 1005945 |
size | 250,497 |
Winliner speculatively inlines indirect calls in WebAssembly, based on observed information from a previous profiling phase. This is a form of profile-guided optimization that we affectionately call winlining.
First, Winliner inserts instrumentation to observe the actual target callee of every indirect call site in your Wasm program. Next, you run the instrumented program for a while, building up a profile. Finally, you invoke Winliner again, this time providing it with the recorded profile, and it optimizes your Wasm program based on the behavior observed in that profile.
For example, if profiling shows that an indirect call always (or nearly always) goes to the 42nd entry in the funcref table, then Winliner will perform the following semantically transparent transformation:
;; Before:
call_indirect
;; After:
;; If the callee index is 42, execute the inlined body of
;; the associated function.
local.tee $temp
i32.const 42
i32.eq
if
  <inlined body of table[42] here>
else
  local.get $temp
  call_indirect
end
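To make the shape of this rewrite concrete, here is a toy model in Rust. It is only a sketch of the idea, not Winliner's implementation: the function table, the callees `add_one` and `double`, and the constant `HOT_INDEX` (standing in for index 42) are all hypothetical names chosen for illustration.

```rust
/// Stand-in for the type of an entry in the Wasm funcref table.
type Callee = fn(i32) -> i32;

fn add_one(x: i32) -> i32 {
    x + 1
}

fn double(x: i32) -> i32 {
    x * 2
}

/// Hypothetical table index that profiling found to be the hot callee.
const HOT_INDEX: usize = 1;

/// Hypothetical two-entry function table.
fn table() -> [Callee; 2] {
    [add_one, double]
}

/// Before: every call is dispatched through the table.
fn call_before(table: &[Callee], index: usize, arg: i32) -> i32 {
    table[index](arg)
}

/// After: guard on the speculated index and run the inlined body on a hit;
/// otherwise fall back to the ordinary indirect call.
fn call_after(table: &[Callee], index: usize, arg: i32) -> i32 {
    if index == HOT_INDEX {
        arg * 2 // inlined body of `double`
    } else {
        table[index](arg)
    }
}

fn main() {
    let t = table();
    // The two versions agree for every index and argument we try.
    for index in 0..t.len() {
        for arg in [-3, 0, 7] {
            assert_eq!(call_before(&t, index, arg), call_after(&t, index, arg));
        }
    }
}
```

The fallback branch is what keeps the rewrite transparent: a wrong guess costs one comparison and then behaves exactly like the original call.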
The speculative inlining by itself is generally not a huge performance win, since CPU indirect branch prediction is very powerful these days. (Although, depending on the Wasm engine, entering a new function may incur some cost and inlining does avoid that.) The primary benefit is that it allows the Wasm compiler to "see through" the indirect call and perform subsequent optimizations (like GVN and LICM) on the inlined callee's body, which can result in significant performance benefits.
This technique is similar to devirtualization but doesn't require that the compiler is able to statically determine the callee, nor that the callee is always a single, particular function 100% of the time. Unlike devirtualization, Winlining can still optimize indirect calls that go a certain way 99% of the time and a different way 1% of the time because it can always fall back to an unoptimized indirect call.
You can install via cargo:
$ cargo install winliner --all-features
First, instrument your Wasm program:
$ winliner instrument my-program.wasm > my-program.instrumented.wasm
Next, run the instrumented program to build a profile. You can do this either in your Wasm environment of choice (e.g. the Web), with a little glue code to extract and shepherd out the profile, or within Winliner itself, using the Wasmtime-based WASI environment that comes with it:
$ winliner profile my-program.instrumented.wasm > profile.json
Finally, tell Winliner to optimize the original program based on the call_indirect behavior observed in the given profile:
$ winliner optimize --profile profile.json my-program.wasm > my-program.winlined.wasm
Winliner is not safe in the face of mutations to the funcref table, which are possible via the table.set instruction (and others) introduced as part of the reference-types proposal. You must either disable this proposal or manually uphold the invariant that the funcref table is never mutated. Breaking this invariant will likely lead to diverging behavior from the original program and very wonky bugs! Any exported funcref tables must additionally not be mutated by the host.
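The hazard can be illustrated with a small Rust model. This is a sketch under stated assumptions, not Winliner's code: the one-entry table, the callees `double` and `negate`, and `HOT_INDEX` are hypothetical, and the assignment to `table[HOT_INDEX]` plays the role of a `table.set`.

```rust
/// Stand-in for the type of an entry in the Wasm funcref table.
type Callee = fn(i32) -> i32;

fn double(x: i32) -> i32 {
    x * 2
}

fn negate(x: i32) -> i32 {
    -x
}

/// Hypothetical index that was speculated on when the program was optimized.
const HOT_INDEX: usize = 0;

/// The original program: always dispatches through the live table.
fn original(table: &[Callee], index: usize, arg: i32) -> i32 {
    table[index](arg)
}

/// The optimized program: the body of `double` was inlined at optimization
/// time, so it no longer consults the table when the guard matches.
fn winlined(table: &[Callee], index: usize, arg: i32) -> i32 {
    if index == HOT_INDEX {
        arg * 2 // stale copy of whatever table[HOT_INDEX] held when optimizing
    } else {
        table[index](arg)
    }
}

fn main() {
    let mut table: [Callee; 1] = [double];

    // While the table is untouched, both programs agree.
    assert_eq!(original(&table, HOT_INDEX, 5), winlined(&table, HOT_INDEX, 5));

    // Model of a `table.set`: the hot slot now points at a different function.
    table[HOT_INDEX] = negate;

    // The original follows the mutation; the optimized version does not.
    assert_eq!(original(&table, HOT_INDEX, 5), -5);
    assert_eq!(winlined(&table, HOT_INDEX, 5), 10);
}
```

The guard compares only the index, so it cannot notice that the slot behind that index has changed.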
Winliner only optimizes call_indirect instructions; it cannot optimize call_ref instructions because WebAssembly function references are not comparable, so we can't insert the if actual_callee == speculative_callee check.
Winliner assumes support for the (widely implemented) multi-value proposal in its generated code.
First, add a dependency on Winliner to your Cargo.toml:
[dependencies]
winliner = "1"
Then, use the library like so:
use winliner::{InstrumentationStrategy, Instrumenter, Optimizer, Profile, Result};

fn main() -> Result<()> {
    let original_wasm = std::fs::read("path/to/my.wasm")?;

    // Configure instrumentation.
    let mut instrumenter = Instrumenter::new();
    instrumenter.strategy(InstrumentationStrategy::ThreeGlobals);

    // Instrument our wasm.
    let instrumented_wasm = instrumenter.instrument(&original_wasm)?;

    // Get a profile for our Wasm program from somewhere. Read it from disk,
    // record it now in this process, etc...
    //
    // See the API docs for `Profile` for more details.
    let profile = Profile::default();

    // Configure optimization and thresholds for inlining.
    let mut optimizer = Optimizer::new();
    optimizer
        .min_total_calls(100)
        .min_ratio(0.8)?
        .max_inline_depth(3);

    // Run the optimizer with the given profile!
    let optimized_wasm = optimizer.optimize(&profile, &original_wasm)?;
    std::fs::write("path/to/optimized.wasm", optimized_wasm)?;

    Ok(())
}
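One plausible reading of the `min_total_calls` and `min_ratio` thresholds configured above is sketched below. This is an assumption about their meaning, not Winliner's actual decision logic: the `SiteCounts` type and the `pick_speculative_callee` function are hypothetical, and the real `Profile` representation may look quite different.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical per-call-site data: callee table index -> observed call count.
type SiteCounts = HashMap<u32, u64>;

/// Returns the callee worth speculating on at one call site, if any.
///
/// Assumed semantics: the site must have been called at least
/// `min_total_calls` times, and its most frequent callee must account
/// for at least `min_ratio` of those calls.
fn pick_speculative_callee(
    counts: &SiteCounts,
    min_total_calls: u64,
    min_ratio: f64,
) -> Option<u32> {
    let total: u64 = counts.values().sum();
    if total == 0 || total < min_total_calls {
        return None;
    }

    // Most frequent callee; ties are broken toward the lower index so the
    // result does not depend on hash map iteration order.
    let (callee, hits) = counts
        .iter()
        .map(|(callee, hits)| (*callee, *hits))
        .max_by_key(|&(callee, hits)| (hits, Reverse(callee)))?;

    let ratio = hits as f64 / total as f64;
    (ratio >= min_ratio).then_some(callee)
}

fn main() {
    // 99 of 100 calls go to entry 42: a good candidate.
    let hot = SiteCounts::from([(42, 99), (7, 1)]);
    assert_eq!(pick_speculative_callee(&hot, 100, 0.8), Some(42));

    // A 60/40 split misses the 0.8 ratio threshold.
    let mixed = SiteCounts::from([(42, 60), (7, 40)]);
    assert_eq!(pick_speculative_callee(&mixed, 100, 0.8), None);

    // Too few observations to trust the profile at this site.
    let cold = SiteCounts::from([(42, 9)]);
    assert_eq!(pick_speculative_callee(&cold, 100, 0.8), None);
}
```

Under this reading, raising either threshold trades fewer inlined sites for higher confidence that each guard will usually hit.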
The inspiration for this tool -- along with the low-overhead but imprecise "three globals" instrumentation strategy -- sprang from conversations with Chris Fallin and Luke Wagner.