fuzz: Generate with random libFuzzer settings #28178
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

**Code Coverage**: For detailed information about the code coverage, see the test coverage report.

**Reviews**: See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Force-pushed from fa23bec to fae4ff3 (compare)
Concept ACK.

> Also, randomize `-mutate_depth`, for fun.

Could you expand?
Not sure. I am happy to drop this, but it was part of https://www.github.com/bitcoin/bitcoin/pull/20752/commits/1ff0dc525f051bbc7a93312dd622340ca8f4f52c, which is why I picked it up.
> Not sure. I am happy to drop this

No, I was just curious about the rationale. I guess I have a mild preference for sticking to defaults if there isn't any, but I don't think it hurts either.
Force-pushed from fae4ff3 to fad32eb (compare)
Thanks, generated some data and edited the description to support the
```python
command = [
    os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
    "-runs=100000",
    "-max_total_time=6000",
```
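The randomization this pull request proposes can be sketched roughly like this (a simplified sketch; `build_fuzz_command` is an illustrative name, and the real script builds the list inline rather than in a helper):

```python
import os
import random

def build_fuzz_command(build_dir, corpus_dir):
    # Sketch of the PR's approach: settings such as -use_value_profile
    # sometimes help and sometimes hurt, so pick a random value per run
    # to ensure at least some runs get the beneficial configuration.
    return [
        os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
        "-runs=100000",
        "-max_total_time=6000",
        f"-use_value_profile={random.randrange(2)}",  # randomly 0 or 1
        corpus_dir,
    ]
```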
That's a large increase in runtime, isn't it? Won't 100k iterations take much less than 100 minutes to complete on most of our targets currently?

We've got 169 targets at the moment; this change makes it so that running this script with `--generate` for all targets would take more than a dozen days.
The number of tasks that are spawned is ~300 (294 for me, but it will likely be more in the future). So even if you run each task only for a minute, it will take longer than 4 hours already. But running for a minute seems entirely useless, so I am not sure if it is worth optimizing for. (For smoke testing I just edit this script to `-max_total_time=6`.)

In any case, you can pass `--par 999` to run everything at the same time and be done in 100 minutes.
However, to avoid an edit for smoke testing, someone can add a config option in a follow-up or separate pull.
Ok, I'm confused. I'm aware I just gave this a superficial look, but I'm absolutely not following here. (Last comment, and if I don't get it I'll look into it more in depth.)

> The number of tasks that are spawned is ~300

How come? The default for `--par` is 4.

> So even if you run each task only for a minute, it will take longer than 4 hours already

How could, with 300 tasks, running 169 tasks for 1 minute take 4 hours?

My main worry would be that increasing the runtime of the script by, I don't know, 100x would make it so that people would always stop it before it completes, thereby never generating seeds for the last (alphabetically sorted) targets.

I may be underestimating how many tasks people usually spawn when running this script, though. But `--par 169` sounds like a lot.
> How come?

Sorry, with "the number of tasks that are spawned is ~300" I meant "the number of tasks that are spawned over the full runtime of the python script is ~300".

> How could

For example, with `-j1`: running 300 one-minute tasks spawned sequentially on one thread will take 300 minutes (5 hours). Though, the main point is that 1 minute of runtime is useless, so optimizing for a short default runtime will quickly make the entire script useless.

I am happy to change the number or drop this change, but generally I don't think it will be possible to find a default number that makes everyone happy. There will always be a need to change the number some way or another.
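The back-of-the-envelope math in this exchange can be written out explicitly (a toy estimate for illustration, not part of the script):

```python
import math

def estimated_wall_clock_minutes(num_tasks, minutes_per_task, parallelism):
    # Rough lower bound: tasks run in batches of `parallelism`,
    # each batch taking `minutes_per_task` minutes.
    return math.ceil(num_tasks / parallelism) * minutes_per_task

# ~300 one-minute tasks sequentially on one thread: ~300 minutes (5 hours).
print(estimated_wall_clock_minutes(300, 1, 1))      # 300
# With --par 999 everything runs at once, bounded by the per-task time.
print(estimated_wall_clock_minutes(300, 100, 999))  # 100
```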
Indeed, a too-short runtime could make the script useless (since people would basically start running it in a loop themselves), but a too-long one would likely introduce a bias toward the first (alphabetically sorted) targets.

I guess it would be helpful to know who uses this script and how. For instance, if they are continuously generating coverage, most likely they are already running it in a loop of some sort, and therefore a runtime of, say, 10 minutes per target would spread the time spent over all targets instead of generating more for the first ones.

But I'll stop bikeshedding now; your change looks good. If someone wants to change it they can open a PR and we'll learn more about how the script is being used.
@murchandamus may be using it?
I tried using this script, but got away from that because it always tries to run all fuzz targets. I would either only be able to fuzz each target very briefly, of which a big portion would just be the initialization, or, if I were to kill the fuzzing when I need my computer for other things, always fuzz the same targets due to its determinism. I recently ended up making myself a bash script instead that randomly picks ten fuzz targets and fuzzes them for one hour each (with 12 processes in parallel). I run this nightly per a cronjob:

```shell
for i in $(find ../qa-assets-active-fuzzing/fuzz_seed_corpus/ -mindepth 1 -maxdepth 1 -type d | shuf | head -n10); do
    FUZZ=$(basename $i) src/test/fuzz/fuzz -jobs=12 -reload=1 -max_total_time=3600 $i
done
```
I am happy to review a pull request to change from running all targets to running `N` randomly drawn targets. However, I don't need the feature, so I won't be working on this personally.
In any case, the change to max_total_time seems fine, given that https://github.com/bitcoin/bitcoin/pull/28178/files#r1346327126 also does it.
utACK fad32eb. I'm not using this script myself but the changes make sense to me.
```python
command = [
    os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
    "-runs=100000",
    "-max_total_time=6000",
    "-reload=0",
```
Nice, do you happen to know why they don't only set it to `1` when `-jobs` is not `0`?
It is possible to launch multiple libFuzzer instances manually on the same folder, but this script doesn't do it, and it seems unlikely that a user would do it.
Oh, I’ve done that :)
Sure, most have done it, but have you done it while this script is running? :)
I think I'd prefer continuing to use the defaults but I also don't use the runner much for generating, so no super strong opinion. I would assume that the defaults are based on data/heuristics and probably work well most of the time, so maybe make this optional?
Wouldn't this only make sense if the "help" outweighs the "hurt"? For example, the metric you used in [1] is "iterations until bug found", so if randomizing the fuzzer options results in roughly equal speedup (help) and slowdown (hurt), then there is no gain on average.

Except I'd say the main goal here is to prevent a case where a bug isn't found at all (or only after an "outlier" long time) due to the default settings. I don't think we've seen such a bug in reality, so I am happy to close this pull. Regardless,
This sounds reasonable but also quite rare. Could you give me an example of a theoretical target where this could happen? Are there any other projects that do this (e.g. oss-fuzz)?
On a general note, I think it is fine to add to/improve on our fuzzing tools & techniques even if those haven't found any bugs yet but there should be some evidence that they are useful.
IIRC oss-fuzz did that with AFL build time settings, but removed it again because it would turn up bugs intermittently/non-deterministically. I don't think they do it for libFuzzer runtime settings, but I found this, which claims that exponential search is turned into linear search.

I think I provided some evidence, but I guess you are asking about evidence that they are useful in the average case, which is a bit harder to show.
ACK fad32eb. I would use this script instead of my hacky bash script, if there were a way of running just

like this, @murchandamus?

```diff
diff --git a/test/fuzz/test_runner.py b/test/fuzz/test_runner.py
index a60f51f40a..6510102e6c 100755
--- a/test/fuzz/test_runner.py
+++ b/test/fuzz/test_runner.py
@@ -80,6 +80,13 @@ def main():
         ' the given targets for a finite number of times. Outputs them to'
         ' the passed corpus_dir.'
     )
+    parser.add_argument(
+        '--max_targets',
+        '-mt',
+        type=int,
+        default=0,
+        help='Max number of targets to run (chosen randomly). 0 means to run all of them.',
+    )
     args = parser.parse_args()
     args.corpus_dir = Path(args.corpus_dir)
@@ -121,6 +128,8 @@ def main():
             logging.error("Target \"{}\" not found in current target list.".format(excluded_target))
             continue
         test_list_selection.remove(excluded_target)
+    if args.max_targets > 0:
+        test_list_selection = random.sample(test_list_selection, args.max_targets)
     test_list_selection.sort()
     logging.info("{} of {} detected fuzz target(s) selected: {}".format(len(test_list_selection), len(test_list_all), " ".join(test_list_selection)))
```
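The selection logic suggested here boils down to the following (a standalone sketch with made-up target names; the real list comes from the fuzz binary):

```python
import random

# Mimic the suggested change: optionally narrow the full target list
# to a random sample before sorting and running.
test_list_selection = ["addrman", "banman", "net", "process_message", "txorphan"]
max_targets = 2  # would come from a --max_targets argument

if max_targets > 0:
    test_list_selection = random.sample(test_list_selection, max_targets)
test_list_selection.sort()

print(len(test_list_selection))  # 2
```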
I don't have any strong opinion on this, although randomizing may make sense to me. In practice, I couldn't see an effective case. I'm using mutation testing to see if I get anything interesting with this; I will have a better opinion soon.
🤔 There hasn't been much activity lately and the CI seems to be failing. If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.
Force-pushed from fad32eb to fa4e396 (compare)
Removed the `max_len` and `mutate_depth` randomization, because it seemed too controversial. Also removed the corresponding "Also, randomize …" items and reference [2] #27888 (comment) from the OP.

This picks up the work started in commit https://www.github.com/bitcoin/bitcoin/pull/20752/commits/1ff0dc525f051bbc7a93312dd622340ca8f4f52c
This is set by merge, so set it here as well, to avoid OOM.

Also, added the missing rss limit to avoid OOM.
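In other words, the generate command list now also carries the memory limit already used for merging (a sketch with a placeholder build path; the flag value matches the commit message above):

```python
import os

build_dir = "/path/to/build"  # placeholder, not a real path
command = [
    os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
    "-runs=100000",
    "-max_total_time=6000",
    "-rss_limit_mb=8000",  # same limit as the merge step, avoids OOM kills
]
```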
cc @dergoegge are you still unconvinced about this? Are there more references or data I can provide?
utACK fa3a410
cc @murchandamus You may be interested in the `rss_limit_mb` (last commit). Not sure if you set it in your fuzz script, or if you ever ran into OOM.
light ACK fa3a410
I did a light test of these values using differential fuzzing and mutation testing. I applied differential fuzzing between the original coin selection functions and their mutants. In general:

- Running from seed corpus, `-runs=100000` was able to kill most mutants (>55%), but `-max_total_time=6000` was much more effective (>90%), as expected.
- Running without seed corpus, `-runs=100000` was useless; `-max_total_time=6000` was effective.
- Running from seed corpus, `-use_value_profile=1` reduced the time to kill some mutants by 55%, especially for changes like ">" to ">=" and similar.

Running only fuzzing, even knowing it is not so effective at killing mutants, I applied the mutations in parts of the code that could impact the target. Results:

- Running without seed corpus, `-runs=100000` was useless; `-max_total_time=6000` could kill some mutants.
- Running from seed corpus, `-use_value_profile=1` increased the number of killed mutants by 12% (running with `-max_total_time=6000`).
Thanks, memory has not been an issue for me.
utACK fa3a410
fa3a410 fuzz: Set -rss_limit_mb=8000 for generate as well (MarcoFalke)
fa4e396 fuzz: Generate with random libFuzzer settings (MarcoFalke)

Pull request description:

Sometimes a libFuzzer setting like `-use_value_profile=1` helps [0], sometimes it hurts [1].

[0] bitcoin#20789 (comment)
[1] bitcoin#27888 (comment)

By picking a random value, it is ensured that at least some of the runs will have the beneficial configuration set.

Also, set `-max_total_time` to prevent slow fuzz targets from getting a larger time share, or possibly pegging a single core for a long time and blocking the python script from exiting for a long time. This can be improved in the future. For example, the python script can exit after some time (bitcoin#20752 (comment)). Alternatively, it can measure if coverage progress was made and run for less time if no progress has been made recently anyway, so that more time can be spent on targets that are new or still make progress.

ACKs for top commit:
  murchandamus: utACK fa3a410
  dergoegge: utACK fa3a410
  brunoerg: light ACK fa3a410

Tree-SHA512: bfd04a76ca09aec612397bae5f3f263a608faa7087697169bd4c506c8195c4d2dd84ddc7fcd3ebbc75771eab618fad840af819114968ca3668fc730092376768
b70e091 Merge bitcoin#29667: fuzz: actually test garbage >64b in p2p transport test (fanquake)
6d7aa3d Merge bitcoin#29497: test: simplify test_runner.py (fanquake)
d0e15d5 Merge bitcoin#29606: refactor: Reserve memory for ToLower/ToUpper conversions (Ava Chow)
045fa5f Merge bitcoin#29514: tests: Provide more helpful assert_equal errors (Ava Chow)
bd607f0 Merge bitcoin#29393: i2p: log connection was refused due to arbitrary port (Ava Chow)
c961755 Merge bitcoin#29595: doc: Wrap flags with code in developer-notes.md (fanquake)
8d6e5e7 Merge bitcoin#29583: fuzz: Apply fuzz env (suppressions, etc.) when fetching harness list (fanquake)
4dce690 Merge bitcoin#29576: Update functional test runner to return error code when no tests are found to run (fanquake)
910a7d6 Merge bitcoin#29529: fuzz: restrict fopencookie usage to Linux & FreeBSD (fanquake)
fdac2b3 Merge bitcoin#29493: subtree: update crc32c subtree (fanquake)
a23b342 Merge bitcoin#29475: doc: Fix Broken Links (fanquake)
92bad90 Merge bitcoin#28178: fuzz: Generate with random libFuzzer settings (fanquake)
9b6a05d Merge bitcoin#29443: depends: fix BDB compilation on OpenBSD (fanquake)
9963e6b Merge bitcoin#29413: fuzz: increase length of string used for `NetWhitelist{bind}Permissions::TryParse` (fanquake)
3914745 Merge bitcoin#29425: test: fix intermittent failure in wallet_reorgrestore.py (fanquake)
b719883 Merge bitcoin#29399: test: Fix utxo set hash serialisation signedness (fanquake)
f096880 Merge bitcoin#29377: test: Add makefile target for running unit tests (Ava Chow)
03e0bd3 Merge bitcoin#27319: addrman, refactor: improve stochastic test in `AddSingle` (Ava Chow)

Pull request description:

## Issue being fixed or feature implemented
Batch of trivial backports

## What was done?
See commits

## How Has This Been Tested?
Built locally; large combined merge passed tests locally

## Breaking Changes
Should be none

## Checklist:
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have added or updated relevant unit/integration/functional/e2e tests
- [ ] I have made corresponding changes to the documentation
- [x] I have assigned this pull request to a milestone

ACKs for top commit:
  UdjinM6: utACK b70e091
  knst: utACK b70e091

Tree-SHA512: 659a931f812c8a92cf3854abb873d92885219a392d6aa8e49ac4b27fe41e8e163ef9a135050e7c2e1bd33cecd2f7dae215e05a9c29f62e837e0057d3c16746d6
Sometimes a libFuzzer setting like `-use_value_profile=1` helps [0], sometimes it hurts [1].

[0] #20789 (comment)
[1] #27888 (comment)

By picking a random value, it is ensured that at least some of the runs will have the beneficial configuration set.

Also, set `-max_total_time` to prevent slow fuzz targets from getting a larger time share, or possibly pegging a single core for a long time and blocking the python script from exiting for a long time. This can be improved in the future. For example, the python script can exit after some time (#20752 (comment)). Alternatively, it can measure if coverage progress was made and run for less time if no progress has been made recently anyway, so that more time can be spent on targets that are new or still make progress.