Skip to content

wasm-mutate: Add a mutator to modify data segments #447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 21, 2022

Conversation

alexcrichton
Copy link
Member

In an effort to minimize modules this commit adds a mutator which will
modify data segments to either add or remove bytes (although only
removal is used when reducing). The main meat of this commit is
refactoring out the translation from wasmparser types to wasm-encoder
types into a shared module for both the RemoveItem mutator and this
new mutator to use.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with one thing below addressed

Comment on lines 70 to 98
// If reduction is configured then we never add data, otherwise
// arbitrarily choose to add or remove data.
if config.reduce || config.rng().gen() {
if data.len() > 0 {
// Remove a random subslice of data, if there's actually some
// data.
let start = config.rng().gen_range(0, data.len());
let end = config.rng().gen_range(start, data.len());
data[start..end].to_vec()
} else {
Vec::new()
}
} else {
// Insert some random bytes.
let start = config.rng().gen_range(0, data.len());
let len = config.rng().gen_range(0, 100);
data[0..start]
.iter()
.copied()
.chain(
config
.rng()
.sample_iter(rand::distributions::Standard)
.take(len),
)
.chain(data[start..].iter().copied())
.collect()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should (either in a third logic branch or maybe replacing both branches but configuring the max sizes appropriately) add a mode where we call the configured mutate-this-raw-data hook which we can make call libfuzzer's internal mutators directly when we are fuzzing.

/// Set a custom raw mutation function.
///
/// This is used when we need some underlying raw bytes, for example when
/// mutating the contents of a data segment.
///
/// You can override this to use `libFuzzer`'s `LLVMFuzzerMutate` function
/// to get raw bytes from `libFuzzer`, for example.
pub fn raw_mutate_func(
&mut self,
raw_mutate_func: Option<Arc<dyn Fn(&mut Vec<u8>) -> Result<()>>>,
) -> &mut Self {
self.raw_mutate_func = raw_mutate_func;
self
}

https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, you're imagining this saw raw_mutate_func-style hook being configured in WasmMutate? Something like an optional closure where, if present, is called here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think ideally we would write our own, default raw_mutate_func implementation and then just always call raw_mutate_func here, instead of maintaining N different code paths that do shrinking vs not shrinking etc. As long as raw_mutate_func abides by the size constraints passed to it, we shouldn't need separate branches for adding vs deleting vs editing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that work though with the seed? We wouldn't pass in the seed or the Unstructured so what would the consumer used as the basis for mutating the input data?

Also if you're imagining that this is always an option should the CLI be the only one that implements this? By default would this mutator be disabled unless the callback were configured?

(sorry I'm not really sure how you're envisioning this working)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah we should probably pass a seed into this callback.

We generally aren't using Unstructured here, not sure what you're getting at here.

In practice, we will hook raw_mutate_func up to https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html

If the raw_mutate_func is not configured, then we should use a default, internal implementation of the raw_mutate_func callback, similar to what you've done here but not hard-coded to be only for data segments.

impl WasmMutate {
    fn get_raw_mutate_func(&self) -> &dyn Fn(&mut Vec<u8>, ...) {
        self.raw_mutate_func.as_ref().unwrap_or(&default_raw_mutate_func)
    }
}

fn default_raw_mutate_func(
    data: &mut Vec<u8>, 
    // probably need to enrichen the `raw_mutate_func` signature with
    // a seed and info about whether we are only shrinking...
) {
    // ... the stuff you are doing for data segments right now ...
}

// ...

// Inside the data segment mutator.
(config.raw_mutate_func())(&mut my_data_segment);

Does that make sense?

Also if you're imagining that this is always an option should the CLI be the only one that implements this? By default would this mutator be disabled unless the callback were configured?

I would say the other way around: the CLI shouldn't expose this functionality because what different kinds of implementations can it have? We're probably not going to pipe out to another program to mutate these little bits of data, so there isn't really much to configure other than maybe the source of entropy. Not really useful here.

However, when using wasm-mutate as a library inside a fuzz target, we definitely want to hook this up to https://docs.rs/libfuzzer-sys/latest/libfuzzer_sys/fn.fuzzer_mutate.html so that we can let libfuzzer use its dictionary and all of its builtin mutation schemes when mutating these data segments or export names or what have you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was mixing up sources of randomness and got confused with Unstructured vs just a plain old Rng here.

Also thanks for explaining! I've pushed up something which I think roughly does what you're thinking, but I think it'd be good to double-check as well.

In an effort to minimize modules this commit adds a mutator which will
modify data segments to either add or remove bytes (although only
removal is used when reducing). The main meat of this commit is
refactoring out the translation from `wasmparser` types to `wasm-encoder`
types into a shared module for both the `RemoveItem` mutator and this
new mutator to use.
Also make the rename export mutator a bit more robust
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with max_size parameter

thanks for your patience on these back and forths!

fn raw_mutate(&mut self, data: &mut Vec<u8>) -> Result<()> {
// If a raw mutation function is configured then that's prioritized.
if let Some(mutate) = &self.raw_mutate_func {
return mutate(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to pass a max_size parameter through to the raw mutation function. (a) libfuzzer_sys::fuzzer_mutate requires such a parameter, and (b) it lets us set max_size == data.len() - 1 when we are configured to only do size-reducing mutations. When we aren't shrinking, then we can either pass a max_size of data.capacity() or data.len() * 2 or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I threaded this through from the configuration of the pass itself, since the renaming of exports was already being configured with a maximum size.

@alexcrichton alexcrichton merged commit a0bb80d into bytecodealliance:main Jan 21, 2022
@alexcrichton alexcrichton deleted the modify-data branch January 21, 2022 20:58
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 24, 2022
This fixes an issue from bytecodealliance#447 where the renaming export pass was
mistakenly generating a duplicate export name because I forgot to add a
check. The check is added here, however!
alexcrichton added a commit that referenced this pull request Jan 25, 2022
* wasm-mutate: Fix generating duplicate export names

This fixes an issue from #447 where the renaming export pass was
mistakenly generating a duplicate export name because I forgot to add a
check. The check is added here, however!

* Prevent an infinite loop

* Consume fuel when generating random names.
* Also allow generating the same name as before, notably the empty name.
abrown added a commit to abrown/wasm-tools that referenced this pull request Feb 7, 2025
This encoding change is necessary due to recent additions to the
component model; see [bytecodealliance#447].

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Feb 7, 2025
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this
change propagates that here.

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Feb 7, 2025
This encoding change is necessary due to recent additions to the
component model; see [bytecodealliance#447].

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Feb 7, 2025
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this
change propagates that here.

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Mar 17, 2025
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this
change propagates that here.

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Mar 17, 2025
This encoding change is necessary due to recent additions to the
component model; see [bytecodealliance#447].

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Mar 17, 2025
[bytecodealliance#447] tries to make the built-in naming a bit more consistent; this
change propagates that here.

[bytecodealliance#447]: WebAssembly/component-model#447
abrown added a commit to abrown/wasm-tools that referenced this pull request Mar 17, 2025
This encoding change is necessary due to recent additions to the
component model; see [bytecodealliance#447].

[bytecodealliance#447]: WebAssembly/component-model#447
github-merge-queue bot pushed a commit that referenced this pull request Mar 17, 2025
* threads: add `thread.spawn_indirect`

As discussed in [#89], this adds support for a new intrinsic,
`thread.spawn_indirect`. This new operation would allow spawning a
shared function stored in a table via a table index.

This leaves some future work undone:
- `thread.spawn` could/should be renamed to `thread.spawn_ref`
- `thread.spawn_indirect` could/should take the encoding byte from
  `thread.hw_concurrency`--swap `0x07` for `0x06`
- importantly, `thread.spawn_indirect` should gain a field indicating
  which type to expect in the indirect table; due to current limitations
  in `wasm-tools`, the locations to check once this is possible are
  marked with `TODO: spawn indirect types`.

[#89]: WebAssembly/shared-everything-threads#89

* threads: rename `thread.spawn` to `thread.spawn_ref`

[#447] tries to make the built-in naming a bit more consistent; this
change propagates that here.

[#447]: WebAssembly/component-model#447

* threads: move `thread.*` canonical opcodes to `0x40+`

This encoding change is necessary due to recent additions to the
component model; see [#447].

[#447]: WebAssembly/component-model#447

* threads: add function types to `thread.spawn_indirect`

Initially I implemented `thread.spawn_indirect` without the ability to
check the type of the function to spawn out of an abundance of caution
(see the implementation issues described in [#89]). In the process of
writing out the specification, we convinced ourselves that these
problems should not apply to `thread.spawn_indirect`.

This change adds the function type index necessary for doing some extra
validation of `thread.spawn_indirect` and adds some tests related to
this. One unimplemented TODO is what to do about shared tables:
technically, the table used by a `thread.spawn_indirect` should be
shared but we have so far prevented this in the component model; this
can be resolved later, though.

[#89]: WebAssembly/shared-everything-threads#89

* review: simplify WAST parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants