-
Notifications
You must be signed in to change notification settings - Fork 295
wasm-mutate: Add a mutator to modify data segments #447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with one thing below addressed
// If reduction is configured then we never add data, otherwise | ||
// arbitrarily choose to add or remove data. | ||
if config.reduce || config.rng().gen() { | ||
if data.len() > 0 { | ||
// Remove a random subslice of data, if there's actually some | ||
// data. | ||
let start = config.rng().gen_range(0, data.len()); | ||
let end = config.rng().gen_range(start, data.len()); | ||
data[start..end].to_vec() | ||
} else { | ||
Vec::new() | ||
} | ||
} else { | ||
// Insert some random bytes. | ||
let start = config.rng().gen_range(0, data.len()); | ||
let len = config.rng().gen_range(0, 100); | ||
data[0..start] | ||
.iter() | ||
.copied() | ||
.chain( | ||
config | ||
.rng() | ||
.sample_iter(rand::distributions::Standard) | ||
.take(len), | ||
) | ||
.chain(data[start..].iter().copied()) | ||
.collect() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should (either in a third logic branch or maybe replacing both branches but configuring the max sizes appropriately) add a mode where we call the configured mutate-this-raw-data hook which we can make call libfuzzer
's internal mutators directly when we are fuzzing.
wasm-tools/crates/wasm-mutate/src/lib.rs
Lines 233 to 246 in 5d4f007
/// Set a custom raw mutation function. | |
/// | |
/// This is used when we need some underlying raw bytes, for example when | |
/// mutating the contents of a data segment. | |
/// | |
/// You can override this to use `libFuzzer`'s `LLVMFuzzerMutate` function | |
/// to get raw bytes from `libFuzzer`, for example. | |
pub fn raw_mutate_func( | |
&mut self, | |
raw_mutate_func: Option<Arc<dyn Fn(&mut Vec<u8>) -> Result<()>>>, | |
) -> &mut Self { | |
self.raw_mutate_func = raw_mutate_func; | |
self | |
} |
https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand, you're imagining this saw raw_mutate_func
-style hook being configured in WasmMutate
? Something like an optional closure where, if present, is called here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think ideally we would write our own, default raw_mutate_func
implementation and then just always call raw_mutate_func
here, instead of maintaining N different code paths that do shrinking vs not shrinking etc. As long as raw_mutate_func
abides by the size constraints passed to it, we shouldn't need separate branches for adding vs deleting vs editing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would that work though with the seed? We wouldn't pass in the seed or the Unstructured
so what would the consumer used as the basis for mutating the input data?
Also if you're imagining that this is always an option should the CLI be the only one that implements this? By default would this mutator be disabled unless the callback were configured?
(sorry I'm not really sure how you're envisioning this working)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah we should probably pass a seed into this callback.
We generally aren't using Unstructured
here, not sure what you're getting at here.
In practice, we will hook raw_mutate_func
up to https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html
If the raw_mutate_func
is not configured, then we should use a default, internal implementation of the raw_mutate_func
callback, similar to what you've done here but not hard-coded to be only for data segments.
impl WasmMutate {
fn get_raw_mutate_func(&self) -> &dyn Fn(&mut Vec<u8>, ...) {
self.raw_mutate_func.as_ref().unwrap_or(&default_raw_mutate_func)
}
}
fn default_raw_mutate_func(
data: &mut Vec<u8>,
// probably need to enrichen the `raw_mutate_func` signature with
// a seed and info about whether we are only shrinking...
) {
// ... the stuff you are doing for data segments right now ...
}
// ...
// Inside the data segment mutator.
(config.raw_mutate_func())(&mut my_data_segment);
Does that make sense?
Also if you're imagining that this is always an option should the CLI be the only one that implements this? By default would this mutator be disabled unless the callback were configured?
I would say the other way around: the CLI shouldn't expose this functionality because what different kinds of implementations can it have? We're probably not going to pipe out to another program to mutate these little bits of data, so there isn't really much to configure other than maybe the source of entropy. Not really useful here.
However, when using wasm-mutate
as a library inside a fuzz target, we definitely want to hook this up to https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html so that we can let libfuzzer use its dictionary and all of its builtin mutation schemes when mutating these data segments or export names or what have you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I was mixing up sources of randomness and got confused with Unstructured
vs just a plain old Rng
here.
Also thanks for explaining! I've pushed up something which I think roughly does what you're thinking, but I think it'd be good to double-check as well.
In an effort to minimize modules this commit adds a mutator which will modify data segments to either add or remove bytes (although only removal is used when reducing). The main meat of this commit is refactoring out the translation from `wasmparser` types to `wasm-encoder` types into a shared module for both the `RemoveItem` mutator and this new mutator to use.
4329cee
to
06058e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with max_size
parameter
thanks for your patience on these back and forths!
crates/wasm-mutate/src/lib.rs
Outdated
fn raw_mutate(&mut self, data: &mut Vec<u8>) -> Result<()> { | ||
// If a raw mutation function is configured then that's prioritized. | ||
if let Some(mutate) = &self.raw_mutate_func { | ||
return mutate(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to pass a max_size
parameter through to the raw mutation function. (a) libfuzzer_sys::fuzzer_mutate
requires such a parameter, and (b) it lets us set max_size == data.len() - 1
when we are configured to only do size-reducing mutations. When we aren't shrinking, then we can either pass a max_size
of data.capacity()
or data.len() * 2
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I threaded this through from the configuration of the pass itself, since the renaming of exports was already being configured with a maximum size.
This fixes an issue from bytecodealliance#447 where the renaming export pass was mistakenly generating a duplicate export name because I forgot to add a check. The check is added here, however!
* wasm-mutate: Fix generating duplicate export names This fixes an issue from #447 where the renaming export pass was mistakenly generating a duplicate export name because I forgot to add a check. The check is added here, however! * Prevent an infinite loop * Consume fuel when generating random names. * Also allow generating the same name as before, notably the empty name.
This encoding change is necessary due to recent additions to the component model; see [bytecodealliance#447]. [bytecodealliance#447]: WebAssembly/component-model#447
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this change propagates that here. [bytecodealliance#447]: WebAssembly/component-model#447
This encoding change is necessary due to recent additions to the component model; see [bytecodealliance#447]. [bytecodealliance#447]: WebAssembly/component-model#447
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this change propagates that here. [bytecodealliance#447]: WebAssembly/component-model#447
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this change propagates that here. [bytecodealliance#447]: WebAssembly/component-model#447
This encoding change is necessary due to recent additions to the component model; see [bytecodealliance#447]. [bytecodealliance#447]: WebAssembly/component-model#447
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this change propagates that here. [bytecodealliance#447]: WebAssembly/component-model#447
This encoding change is necessary due to recent additions to the component model; see [bytecodealliance#447]. [bytecodealliance#447]: WebAssembly/component-model#447
* threads: add `thread.spawn_indirect` As discussed in [#89], this adds support for a new intrinsic, `thread.spawn_indirect`. This new operation would allow spawning a shared function stored in a table via a table index. This leaves some future work undone: - `thread.spawn` could/should be renamed to `thread.spawn_ref` - `thread.spawn_indirect` could/should take the encoding byte from `thread.hw_concurrency`--swap `0x07` for `0x06` - importantly, `thread.spawn_indirect` should gain a field indicating which type to expect in the indirect table; due to current limitations in `wasm-tools`, the locations to check once this is possible are marked with `TODO: spawn indirect types`. [#89]: WebAssembly/shared-everything-threads#89 * threads: rename `thread.spawn` to `thread.spawn_ref` [#447] tries to make the built-in naming a bit more consistent; this change propagates that here. [#447]: WebAssembly/component-model#447 * threads: move `thread.*` canonical opcodes to `0x40+` This encoding change is necessary due to recent additions to the component model; see [#447]. [#447]: WebAssembly/component-model#447 * threads: add function types to `thread.spawn_indirect` Initially I implemented `thread.spawn_indirect` without the ability to check the type of the function to spawn out of an abundance of caution (see the implementation issues described in [#89]). In the process of writing out the specification, we convinced ourselves that these problems should not apply to `thread.spawn_indirect`. This change adds the function type index necessary for doing some extra validation of `thread.spawn_indirect` and adds some tests related to this. One unimplemented TODO is what to do about shared tables: technically, the table used by a `thread.spawn_indirect` should be shared but we have so far prevented this in the component model; this can be resolved later, though. [#89]: WebAssembly/shared-everything-threads#89 * review: simplify WAST parsing
In an effort to minimize modules this commit adds a mutator which will
modify data segments to either add or remove bytes (although only
removal is used when reducing). The main meat of this commit is
refactoring out the translation from
wasmparser
types towasm-encoder
types into a shared module for both the
RemoveItem
mutator and thisnew mutator to use.