Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Jan 2, 2021

The current switch (fuzzed_data_provider.ConsumeIntegralInRange<int>(0, nn)) { case 0: ... case 1: ... case nn: ... has several problems:

  • It makes it hard to review newly added targets, because it requires manual counting of cases
  • It makes it hard to update a target, because updating all case labels is trivial, but tedious to review and causes merge conflicts
  • Updating the target raises the question whether the case labels should be preserved to not invalidate the existing fuzz inputs format. Fuzz input format might already change implicitly on every commit, so this isn't something worthwhile to pursue. Edit: This pull doesn't fix this problem.

Fix all issues by adding a new CallOneOf helper

@maflcko
Copy link
Member Author

maflcko commented Jan 2, 2021

I'll update the remaining fuzz targets once this has one or two Concept ACKs

@hebasto
Copy link
Member

hebasto commented Jan 2, 2021

I'll update the remaining fuzz targets once this has one or two Concept ACKs

Concept ACK.

* It makes it hard to update a target, because updating all case labels is trivial, but tedious to review and causes merge conflicts

It will be tedious to review only this pull :)

@theStack
Copy link
Contributor

theStack commented Jan 2, 2021

Concept ACK!

It will be tedious to review only this pull :)

With ignored whitespace (-w or --ignore-all-space) the diff actually looks quite straight-forward to review.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 2, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ajtowns
Copy link
Contributor

ajtowns commented Jan 3, 2021

Concept ACK.

Where a case statement previously had a return; this will need to have an abort_loop = true; escape instead, which seems like no big deal.

I'm not a fan of "Fuzz input format might already change implicitly on every commit, so this isn't something worthwhile to pursue." -- changing the fuzz input format invalidates the code coverage of the qa assets corpus, so it seems like we should try to minimise fuzz input format changes to me. But I think this makes dealing with that easier anyway: to have it behave stably when cases are added or removed I think you could do something like:

    const size_t call_index{ConsumeIntegral<size_t>}();
    if (call_index < size) return callables[call_index]();

and have an auto noop = [] () deleted_case { return; } place holder for deleting cases that aren't at the end?

I was kind of expecting to see a recursive template function here -- something like:

    template<typename... Cases>
    void CallOneOf(size_t selection, Cases... cases);
    template<typename T, typename... Cases>
    void CallOneOf(size_t selection, T case0, Cases... cases)
    {
        if (selection == 0) return case0();
        else return CallOneOf(selection - 1, cases...);
    }
    template<>
    void CallOneOf(size_t selection) { return; }

    template<typename... Cases>
    void CallOneOf(FDP& fdp, Cases... cases) {
        size_t call_index = fdp.ConsumeIntegral<size_t>();
        return CallOneOf(call_index, cases...);
    }

Not sure the selection - 1 stuff would get optimised into a switch equivalent though, but it might be worth checking if it's any better than the std::function indirection.

@maflcko
Copy link
Member Author

maflcko commented Jan 3, 2021

@maflcko
Copy link
Member Author

maflcko commented Jan 3, 2021

Where a case statement previously had a return; this will need to have an abort_loop = true; escape instead, which seems like no big deal.

Good point, but a return at that scope generally means that fuzzing will end for the current input. This is very rare and the only return I could find inside a switch was a return inside a CConnman::ForEach lambda, which obviously needs to be kept as is. Conversely, and more importantly for review here, any break inside a switch that isn't inside another loop or switch will need to be replace by return.

I'm not a fan of "Fuzz input format might already change implicitly on every commit, so this isn't something worthwhile to pursue." -- changing the fuzz input format invalidates the code coverage of the qa assets corpus, so it seems like we should try to minimise fuzz input format changes to me. But I think this makes dealing with that easier anyway: to have it behave stably when cases are added or removed I think you could do something like:

Ok, I see that my pull doesn't solve the question whether seeds should be invalidated or not. Maybe it is best to answer this in a separate discussion. Before it was possible to invalidate or not invalidate seeds on a case-by-case basis and with this pull, it is possible, too.

Edit: Let's discuss this in #20837 .

I was kind of expecting to see a recursive template function here -- something like:

The recursive template would invalidate all seeds, no? This pull is tagged with "refactoring", so the goal is that behavior doesn't change. Edit. No it doesn't change the format if done correctly.

@maflcko maflcko force-pushed the 2012-fuzzCallOneOf branch 3 times, most recently from 0afca24 to fa12a5d Compare January 3, 2021 11:43
@maflcko
Copy link
Member Author

maflcko commented Jan 3, 2021

Removed std::function indirection as requested by @ajtowns . Thanks!

@jnewbery
Copy link
Contributor

jnewbery commented Jan 3, 2021

Strong concept ACK. Thanks for doing this! Merge conflicts in the fuzz files seem to be very common, so eliminating the majority of them in this way would be a big win.

@jonatack
Copy link
Member

jonatack commented Jan 4, 2021

For a second I thought you were adding Call of Duty into bitcoin. This is a great idea. Concept ACK.

@laanwj
Copy link
Member

laanwj commented Jan 7, 2021

This is much more elegant! Concept/quick skim code review ACK.

@practicalswift
Copy link
Contributor

Concept ACK: much better! Thanks!

Will review.

Can be reviewed with --ignore-all-space
@maflcko maflcko force-pushed the 2012-fuzzCallOneOf branch from fa12a5d to fa75d40 Compare January 11, 2021 09:38
Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

const size_t call_index{fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, call_size - 1)};

size_t i{0};
return ((i++ == call_index ? callables() : void()), ...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL.

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent. Thanks for doing this @MarcoFalke!

utACK fa75d40

@maflcko
Copy link
Member Author

maflcko commented Jan 11, 2021

For a second I thought you were adding Call of Duty into bitcoin. This is a great idea. Concept ACK.

Call of Duty is already being fuzzed: https://twitter.com/NedWilliamson/status/1348432573836349441

@practicalswift
Copy link
Contributor

practicalswift commented Jan 13, 2021

  • Updating the target raises the question whether the case labels should be preserved to not invalidate the existing fuzz inputs format. Fuzz input format might already change implicitly on every commit, so this isn't something worthwhile to pursue. Edit: This pull doesn't fix this problem.

When using the switch (fuzzed_data_provider.ConsumeIntegralInRange<int>(0, N)) idiom it is worth noting that every time N is increased by one to accommodate for a new case all existing seeds are invalidated.

Example (all using the same fixed seed input in the form of a finite scream AAAAAAAAAAAAAAAAAAAAAAAAA):

FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 1) == 1
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 2) == 2
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 3) == 1
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 4) == 0
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 5) == 5
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 6) == 2
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 7) == 1
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 8) == 2
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 9) == 5
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 10) == 10
FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 11) == 5

One trick that can be used to tackle this is to choose from a larger range such as [0, 32] even if we only have say 12 case:s ([0, 11]). The non-matching numbers [12, 32] will simply be "ignored" by the coverage-guided fuzzer.

That way we'll only have to invalidate existing seeds when we've exhausted the entire range [0, 32] with matching case:s. Then we can bump to [0, 64], and so on.

In other words: instead of invalidating every time we add a new case we'll be invalidating every some-power-of-2:th time we add a new case :)

Perhaps that strategy could be incorporated in this PR?

@maflcko
Copy link
Member Author

maflcko commented Jan 13, 2021

Perhaps that strategy could be incorporated in this PR?

The pr is tagged with "refactoring", so behaviour changes are not allowed. The suggestion can be discussed in #20837 and addressed in a follow-up pull request. The changes should be trivial based on this pr.

@ajtowns
Copy link
Contributor

ajtowns commented Jan 14, 2021

ACK fa75d40 - code review only

@maflcko maflcko merged commit 29d2aeb into bitcoin:master Jan 14, 2021
@maflcko maflcko deleted the 2012-fuzzCallOneOf branch January 14, 2021 10:12
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 15, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants