-
Notifications
You must be signed in to change notification settings - Fork 37.7k
bench: add "priority level" to the benchmark framework #26158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick code review ACK
I think the default should be to run them all?
|
0106da0
to
f95ce9b
Compare
Feedback tackled, thanks both Changes:
|
I like it, code review ACK f95ce9b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK - looks like a good improvement to allow make check
not become too slow and still allow flexibility as to which benchmarks we want to add to the test suite.
f01077a
to
23b6a7a
Compare
Updated per feedback. Pairs unpacked, small diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through most of the code now, a few more mostly minor comments.
nit: From a readability pov, I'm a bit uneasy with priority_level
sometimes being a uint8_t
and sometimes a PriorityLevel
. Not very intuitive, but I don't have a clear solution atm. (I do like the bitmask approach).
4cfc82b
to
6e95eae
Compare
Updated per feedback, thanks @stickies-v 👍🏼. Changes diff:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 6e95eae
Will do some testing shortly.
6e95eae
to
19ac657
Compare
19ac657
to
9c3ab93
Compare
Feedback tackled, thanks both. Changes (diff):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some first testing already, everything seems to work as expected (except for -priority-level=all
not working anymore as commented).
I'm not sure we want to do something about it, but currently we can also specify a benchmark as benchmark::PriorityLevel::ALL
, in which case it would get run for any -priority-level
.
E.g. with
BENCHMARK(AddrManAdd, benchmark::PriorityLevel::LOW);
BENCHMARK(AddrManSelect, benchmark::PriorityLevel::MEDIUM);
BENCHMARK(AddrManGetAddr, benchmark::PriorityLevel::HIGH);
BENCHMARK(AddrManAddThenGood, benchmark::PriorityLevel::ALL);
For ./bench_bitcoin --priority-level=low
we get:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 19,047,208.00 | 52.50 | 1.8% | 0.21 | `AddrManAdd`
| 48,141,333.00 | 20.77 | 1.8% | 0.53 | `AddrManAddThenGood`
a0c486a
to
cbc077e
Compare
06664b8
to
010f53e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 010f53e
ACK 010f53e |
Is there a guideline on how to decide the priority level? I'd presume this is subjective, or is it simply a performance question? That is, slow benchmarks have low priority? |
Yeah ok. So far, the turning point here seems to be introduction of benchmarks who measure/test how certain flows behave on a large scale (context #25685 (comment)). Which, usually, require a somewhat slow context setup (e.g. fill-up the wallet with 20k utxo to benchmark the tx creation process, coin selection, etc) which goes outside of the scope of the "run all benchs" sanity check that we are currently doing in |
Ok, so this is similar to what the functional tests call "extended"? I am asking because right now it looks like this doesn't change the "priority" of any benchmark, and if one is changed we should have at least one CI task that runs all benchmarks, similar to how there is one that runs all functional tests. |
Yeah, it's similar to the "extended" concept there. The goal is to not only be restricted to have fast and small benchmarks due the And yeah, this PR does not change the current behavior in any way. Only introduces the filtering by priority functionality and, by default, makes absolutely, a CI task to run them all would be nice, yeah. |
Side note, to make the priority level decision process simpler, could also remove the |
I was also thinking about that decision process where someone is deciding the priority level of his new benchmark. How to differentiate between "medium" and 'low"? Do you have any scenarios/thoughts for that? Instead of 2 priorities, maybe we could go with the "extended" notion that functional tests use? |
I think that it depends on the benchmarked area mostly. E.g. If we are benchmarking a process from the validation code (like the duplicate-inputs benchmark), it should have higher priority than a benchmark that tests the
Could go with that terminology as well, yeah. But, I think that, if we define which areas, problems and time constrains are more suited for each priority level, then the different levels approach make more sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK 010f53e ⛰️
Left two small nits below, but feel free to ignore. Happy to also re-review if you decide to change the priorities numbers/naming ("low"/"high" or "normal"/"extended" or whatever).
Will allow us to run certain benchmarks while skip non-prioritized ones in 'make check'.
no-functional changes. Only have set the priority level explicitly on every BENCHMARK macro call.
so we have a cleaner exit on internal runtime errors. e.g. an unknown priority level.
010f53e
to
3e9d0be
Compare
Updated per feedback. Thanks everyone :). Have removed the Tiny diff. Ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 3e9d0be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 3e9d0be
re-ACK 3e9d0be |
ACK 3e9d0be |
@echo "Running bench/bench_bitcoin (one iteration sanity check)..." | ||
$(BENCH_BINARY) --sanity-check > /dev/null | ||
@echo "Running bench/bench_bitcoin (one iteration sanity check, only high priority)..." | ||
$(BENCH_BINARY) -sanity-check -priority-level=high > /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the "Win64 native [vs2022]" CI task the same -priority-level=high
has been added in #26481.
This is from today's meeting, a simple "priority level" for the benchmark framework.
Will allow us to run certain benchmarks while skip non-prioritized ones in
make check
.By default,
bench_bitcoin
will run all the benchmarks.make check
will only run the high priority ones,and have marked all the existent benchmarks as "high priority" to retain the current behavior.
Could test it by modifying any benchmark priority to something different from "high", and
run
bench_bitcoin -priority-level=high
and/orbench_bitcoin -priority-level=medium,low
(the first command will skip the modified bench while the second one will include it).
Note: the second commit could be avoided by having a default arg value for the priority
level but.. an explicit set in every
BENCHMARK
macro call makes it less error-prone.