bench: Add support for measuring CPU cycles #9202
This adds cycle min/max/avg to the statistics. Supported on x86 and x86_64 (natively through rdtsc), as well as Linux (perf syscall).
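For readers who want a concrete picture of "cycle min/max/avg", here is a minimal sketch of how such statistics could be accumulated per sample. `CycleStats` and its members are hypothetical names chosen for illustration, not the code in this pull; the integer average is an assumption based on the whole-number `average_cycles` column in the output.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not the PR's actual code): accumulates the minimum,
// maximum and average of per-sample cycle counts.
struct CycleStats {
    std::uint64_t min = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max = 0;
    std::uint64_t sum = 0;
    std::uint64_t count = 0;

    // Record one sample (cycles spent in one measured batch).
    void Add(std::uint64_t cycles) {
        if (cycles < min) min = cycles;
        if (cycles > max) max = cycles;
        sum += cycles;
        ++count;
    }

    // Integer average; returns 0 when no samples were recorded.
    std::uint64_t Average() const { return count ? sum / count : 0; }
};
```

Calling `Add()` once per measured batch and printing `min`, `max` and `Average()` at the end would give three columns of the kind shown in the results.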
Tested ACK (OSX) 3532818
Result with -O2 on OSX (2.6 GHz Intel Core i7):
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,229376,0.000003975452273,0.000005225097993,0.000004511387877,10312,13553,11703
Base58Decode,851968,0.000001059088390,0.000001410629920,0.000001215919978,2747,3659,3154
Base58Encode,327680,0.000002935426892,0.000003485380148,0.000003217919584,7617,9042,8347
CCoinsCaching,90112,0.000009148381650,0.000012961449102,0.000011695591225,23730,33622,30338
CoinSelection,416,0.002168059349060,0.002760812640190,0.002422756873644,5623936,7161456,6284967
DeserializeAndCheckBlockTest,72,0.013411879539490,0.015962481498718,0.014648040135701,34790547,41406927,37996977
DeserializeBlockTest,88,0.010604500770569,0.012940049171448,0.011376557025042,27508165,33566117,29512372
LockedPool,512,0.001808419823647,0.003033317625523,0.002045841421932,4691069,7868411,5307190
MempoolEviction,15360,0.000059797894210,0.000087032094598,0.000065140891820,155116,225747,168975
RIPEMD160,384,0.002612933516502,0.002894565463066,0.002725711092353,6777939,7508452,7070852
RollingBloom-refresh,1,0.000611000000000,0.000611000000000,0.000611000000000
RollingBloom-refresh,1,0.000105000000000,0.000105000000000,0.000105000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000097000000000,0.000097000000000,0.000097000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000099000000000,0.000099000000000,0.000099000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000128000000000,0.000128000000000,0.000128000000000
RollingBloom-refresh,1,0.000094000000000,0.000094000000000,0.000094000000000
RollingBloom-refresh,1,0.000151000000000,0.000151000000000,0.000151000000000
RollingBloom-refresh,1,0.000095000000000,0.000095000000000,0.000095000000000
RollingBloom-refresh,1,0.000106000000000,0.000106000000000,0.000106000000000
RollingBloom-refresh,1,0.000124000000000,0.000124000000000,0.000124000000000
RollingBloom-refresh,1,0.000115000000000,0.000115000000000,0.000115000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000117000000000,0.000117000000000,0.000117000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom,1310720,0.000000795478627,0.000000927659130,0.000000840967550,2063,2406,2181
SHA1,512,0.001935496926308,0.002218931913376,0.002032823860645,5020685,5755989,5273419
SHA256,208,0.004498481750488,0.005540251731873,0.004966990305827,11667956,14371842,12885041
SHA256_32b,4,0.345051527023315,0.346106529235840,0.345579028129578,895062320,897798922,896430621
SHA512,352,0.002845406532288,0.003299534320831,0.003069994124499,7380971,8558912,7963958
SipHash_32b,30,0.033124923706055,0.037207484245300,0.035290129979451,85925534,96516794,91547269
Sleep100ms,10,0.100992441177368,0.104498505592346,0.102697491645813,261974287,271068521,266396862
Trig,67108864,0.000000014460568,0.000000015428895,0.000000014972940,37,40,38
VerifyScriptBench,5632,0.000182222574949,0.000207984820008,0.000195238654586,472678,539492,506447
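A quick sanity check on these numbers: dividing a cycle count by the matching wall-clock time gives the implied counter frequency. The sketch below does that arithmetic; `ImpliedFrequencyHz` is a hypothetical helper name, not part of the bench framework.

```cpp
#include <cstdint>

// Hypothetical helper: implied counter frequency in Hz, computed as
// cycles elapsed divided by seconds elapsed.
double ImpliedFrequencyHz(std::uint64_t cycles, double seconds) {
    return static_cast<double>(cycles) / seconds;
}
```

For the Sleep100ms averages above, `ImpliedFrequencyHz(266396862, 0.102697491645813)` comes out at roughly 2.59e9, which is consistent with the 2.6 GHz clock of the test machine.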
@laanwj I was playing around with this type of timing earlier and read that rdtsc can be reordered with respect to other instructions, so if you can't use rdtscp instead, you should issue a serializing instruction such as cpuid first. Also, do you not have any issues with the thread migrating to another core? I had to set CPU affinity. I couldn't find where I read all that, but here is one link:
Yes, both the compiler and the CPU pipeline may reorder it. In this specific case it's not too bad, though, because the call is already from a function (State::KeepRunning) called inside the benchmark. So there is quite some overhead already, making reordering by a few instructions probably unnoticeable in the noise.
I didn't know that. Although rdtscp seems not to be available on all x86 processors. I'll leave that as a future improvement. x86 is already precise and low-overhead compared to the ARM path which has to do a syscall (the instructions aren't available to user-space).
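To illustrate the serialization idea discussed above, here is a minimal sketch that issues cpuid immediately before rdtsc. It is not the code in this pull: `SerializedCycleCount` is a hypothetical name, the inline assembly assumes GCC or Clang on x86 or x86_64, and on any other target the function simply returns 0. Note that cpuid itself is fairly expensive, so this trades overhead for ordering.

```cpp
#include <cstdint>

// Hypothetical helper (not from this PR): reads the time-stamp counter
// after a serializing cpuid. Assumes GCC/Clang inline asm on x86/x86_64.
inline std::uint64_t SerializedCycleCount() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    std::uint32_t lo, hi;
    asm volatile(
        "cpuid\n\t"  // serializing: earlier instructions complete first
        "rdtsc"      // then read the counter into edx:eax
        : "=a"(lo), "=d"(hi)
        : "a"(0)                    // cpuid leaf 0
        : "ebx", "ecx", "memory");  // cpuid clobbers ebx/ecx; "memory" stops compiler reordering
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return 0;  // unsupported platform in this sketch
#endif
}
```

Two reads taken around the code of interest give an elapsed count via `end - start`. This does not address thread migration between cores; pinning the thread would be a separate step.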
Indeed, calling bench with e.g. |
Running on OSX (3.4GHz i7)
@fanquake: did you compile with |
It looks like |
Yes, those lines are "faked": they don't go through the framework, so the cycle counts are missing there and will only be shown for the total. Not a big deal, though at some point there will need to be a proper solution for nested benchmarks that doesn't involve printing from inside the benchmarked code. Not in this pull, though.
Agreed. ACK 3532818 |
--enable-debug
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
Micro-benchmarking framework part 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6733
- bitcoin/bitcoin#6770
- bitcoin/bitcoin#6892
  - Excluding changes to `src/policy/policy.h` which we don't have yet.
- bitcoin/bitcoin#7934
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#8039
- bitcoin/bitcoin#8107
- bitcoin/bitcoin#8115
- bitcoin/bitcoin#8914
  - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later.
- bitcoin/bitcoin#9200
- bitcoin/bitcoin#9202
  - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts.
- bitcoin/bitcoin#9281
  - Only changes to `src/bench/bench.cpp`
- bitcoin/bitcoin#9498
- bitcoin/bitcoin#9712
- bitcoin/bitcoin#9547
- bitcoin/bitcoin#9505
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#9792
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#10272
- bitcoin/bitcoin#10395
  - Only changes to `src/bench/`
- bitcoin/bitcoin#10735
  - Only changes to `src/bench/base58.cpp`
- bitcoin/bitcoin#10963
- bitcoin/bitcoin#11303
  - Only the benchmark backend change.
- bitcoin/bitcoin#11562
- bitcoin/bitcoin#11646
- bitcoin/bitcoin#11654

This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet.

This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
3f3edde [Bench] Use PIVX address in Base58Decode test (random-zebra)
5a1be90 [Travis] Disable benchmark framework for trusty test (random-zebra)
1bd89ac Initialize recently introduced non-static class member lastCycles to zero in constructor (random-zebra)
ec60671 Require a steady clock for bench with at least micro precision (random-zebra)
84069ce bench: prefer a steady clock if the resolution is no worse (random-zebra)
38367b1 bench: switch to std::chrono for time measurements (random-zebra)
a24633a Remove countMaskInv caching in bench framework (random-zebra)
9e9bc22 Restore default format state of cout after printing with std::fixed/setprecision (random-zebra)
3dd559d Avoid static analyzer warnings regarding uninitialized arguments (random-zebra)
e85f224 Replace boost::function with std::function (C++11) (random-zebra)
98c0857 Prevent warning: variable 'x' is uninitialized (random-zebra)
7f0d4b3 FastRandom benchmark (random-zebra)
d9fa0c6 Add prevector destructor benchmark (random-zebra)
e1527ba Assert that what might look like a possible division by zero is actually unreachable (random-zebra)
e94cf15 bench: Fix initialization order in registration (random-zebra)
151c25f Basic CCheckQueue Benchmarks (random-zebra)
51aedbc Use std:thread:hardware_concurrency, instead of Boost, to determine available cores (random-zebra)
d447613 Use real number of cores for default -par, ignore virtual cores (random-zebra)
9162a56 [Refactoring] Removed using namespace <xxx> from bench/ sources (random-zebra)
5c07f67 bench: Add support for measuring CPU cycles (random-zebra)
41ce1ed bench: Fix subtle counting issue when rescaling iteration count (random-zebra)
68ea794 Avoid integer division in the benchmark inner-most loop. (random-zebra)
3fa4f27 bench: Added base58 encoding/decoding benchmarks (random-zebra)
4442118 bench: Add crypto hash benchmarks (random-zebra)
a5179b6 [Trivial] ensure minimal header conventions (random-zebra)
8607d6b Support very-fast-running benchmarks (random-zebra)
4aebb60 Simple benchmarking framework (random-zebra)

Pull request description:

Introduces the benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark), ported from Bitcoin, up to 0.16.
The benchmark framework is hard-coded to run each benchmark for one wall-clock second, and then spits out .csv-format timing information to stdout.

Backported PR:
- bitcoin#6733
- bitcoin#6770
- bitcoin#6892
- bitcoin#8039
- bitcoin#8107
- bitcoin#8115
- bitcoin#9200
- bitcoin#9202
- bitcoin#9281
- bitcoin#6361
- bitcoin#10271
- bitcoin#9498
- bitcoin#9712
- bitcoin#9547
- bitcoin#9505 (benchmark only. Rest was in #1557)
- bitcoin#9792 (benchmark only. Rest was in #643)
- bitcoin#10272
- bitcoin#10395 (base58 only)
- bitcoin#10963
- bitcoin#11303 (first commit)
- bitcoin#11562
- bitcoin#11646
- bitcoin#11654

Current output of `src/bench/bench_pivx`:
```
#Benchmark,count,min(ns),max(ns),average(ns),min_cycles,max_cycles,average_cycles
Base58CheckEncode,131072,7697,8065,7785,20015,20971,20242
Base58Decode,294912,3305,3537,3454,8595,9198,8981
Base58Encode,180224,5498,6020,5767,14297,15652,14994
CCheckQueueSpeed,320,3159960,3535173,3352787,8216030,9191602,8717388
CCheckQueueSpeedPrevectorJob,96,9184484,11410840,10823070,23880046,29668680,28140445
FastRandom_1bit,320,3143690,4838162,3199156,8173726,12579373,8317941
FastRandom_32bit,60,17097612,17923669,17367440,44454504,46602306,45156079
PrevectorClear,3072,334741,366618,346731,870340,953224,901516
PrevectorDestructor,2816,344233,368912,357281,895022,959187,928948
RIPEMD160,288,3404503,3693917,3577774,8851850,9604334,9302363
SHA1,384,2718128,2891558,2802513,7067238,7518184,7286652
SHA256,176,6133760,6580005,6239866,15948035,17108376,16223916
SHA512,240,4251468,4358706,4313463,11054006,11332826,11215186
Sleep100ms,10,100221470,100302411,100239073,260580075,260790726,260625870
```

NOTE: Not all the tests have been pulled yet (as we might not have the code being tested, or it would require rewrites to work with our different code base), but the framework is updated to December 2017.

ACKs for top commit:
Fuzzbawls: ACK 3f3edde

Tree-SHA512: c283311a9accf6d2feeb93b185afa08589ebef3f18b6e86980dbc3647b9845f75ac9ecce2f1b08738d25ceac36596a2c89d41e4dbf3b463502aa695611aa1f8e
This adds cycle min/max/avg to the statistics.
Supported on x86 and x86_64 (natively through rdtsc), as well as for some other architectures on Linux (perf syscall). Will just show 0 on unsupported platforms.
Was tested on x86_64 and AARCH64.
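As a rough illustration of the Linux perf path and the "show 0 when unsupported" behaviour described above, the following sketch opens a hardware cycle counter through the `perf_event_open` syscall and reads it back. It is an assumption-laden sketch, not the code in this pull: `PerfCycleCounter` is a hypothetical name, and the choice to exclude kernel and hypervisor cycles is mine. If the syscall is unavailable or not permitted, `Read()` returns 0.

```cpp
#include <cstdint>
#if defined(__linux__)
#include <cstring>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Hypothetical wrapper (not the PR's actual code) around a perf cycle counter.
class PerfCycleCounter {
public:
    PerfCycleCounter() {
#if defined(__linux__)
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.exclude_kernel = 1;  // assumption: count user-space cycles only
        attr.exclude_hv = 1;
        // pid = 0, cpu = -1: the calling thread on any CPU; no group, no flags.
        fd_ = static_cast<int>(syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL));
#endif
    }

    ~PerfCycleCounter() {
#if defined(__linux__)
        if (fd_ >= 0) close(fd_);
#endif
    }

    PerfCycleCounter(const PerfCycleCounter&) = delete;
    PerfCycleCounter& operator=(const PerfCycleCounter&) = delete;

    // Current cycle count, or 0 if the counter could not be opened or read.
    std::uint64_t Read() const {
#if defined(__linux__)
        std::uint64_t value = 0;
        if (fd_ >= 0 &&
            read(fd_, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value))) {
            return value;
        }
#endif
        return 0;
    }

private:
    int fd_ = -1;  // -1 means "no counter available"
};
```

Each `Read()` costs a syscall, which is the overhead mentioned earlier in the discussion when comparing this path with rdtsc.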