laanwj
Member

@laanwj laanwj commented Nov 22, 2016

This adds cycle min/max/avg to the statistics.

Supported on x86 and x86_64 (natively through rdtsc), as well as on some other architectures on Linux (through the perf syscall). Unsupported platforms will just show 0.

Tested on x86_64 and AArch64.
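
A minimal sketch of the dispatch described above, assuming GCC or Clang: on x86 and x86_64 the counter is read with the `__rdtsc()` intrinsic, and everywhere else the function returns 0. `ReadCycleCounter` and `kHasCycleCounter` are illustrative names rather than identifiers from the patch, and the Linux perf syscall path is left out for brevity.

```cpp
#include <cstdint>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h> // __rdtsc() on GCC and Clang
constexpr bool kHasCycleCounter = true;
#else
constexpr bool kHasCycleCounter = false;
#endif

// Hypothetical helper: returns the time stamp counter on x86/x86_64,
// and 0 on platforms where this sketch has no user-space counter.
inline uint64_t ReadCycleCounter()
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    return 0;
#endif
}
```

Only differences between two reads are meaningful; the absolute value has no particular origin.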

@laanwj laanwj added the Tests label Nov 22, 2016
Contributor

@jonasschnelli jonasschnelli left a comment

Tested ACK (OSX) 3532818

Result with -O2 on OSX (2.6 GHz Intel Core i7)

#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,229376,0.000003975452273,0.000005225097993,0.000004511387877,10312,13553,11703
Base58Decode,851968,0.000001059088390,0.000001410629920,0.000001215919978,2747,3659,3154
Base58Encode,327680,0.000002935426892,0.000003485380148,0.000003217919584,7617,9042,8347
CCoinsCaching,90112,0.000009148381650,0.000012961449102,0.000011695591225,23730,33622,30338
CoinSelection,416,0.002168059349060,0.002760812640190,0.002422756873644,5623936,7161456,6284967
DeserializeAndCheckBlockTest,72,0.013411879539490,0.015962481498718,0.014648040135701,34790547,41406927,37996977
DeserializeBlockTest,88,0.010604500770569,0.012940049171448,0.011376557025042,27508165,33566117,29512372
LockedPool,512,0.001808419823647,0.003033317625523,0.002045841421932,4691069,7868411,5307190
MempoolEviction,15360,0.000059797894210,0.000087032094598,0.000065140891820,155116,225747,168975
RIPEMD160,384,0.002612933516502,0.002894565463066,0.002725711092353,6777939,7508452,7070852
RollingBloom-refresh,1,0.000611000000000,0.000611000000000,0.000611000000000
RollingBloom-refresh,1,0.000105000000000,0.000105000000000,0.000105000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000097000000000,0.000097000000000,0.000097000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000099000000000,0.000099000000000,0.000099000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000128000000000,0.000128000000000,0.000128000000000
RollingBloom-refresh,1,0.000094000000000,0.000094000000000,0.000094000000000
RollingBloom-refresh,1,0.000151000000000,0.000151000000000,0.000151000000000
RollingBloom-refresh,1,0.000095000000000,0.000095000000000,0.000095000000000
RollingBloom-refresh,1,0.000106000000000,0.000106000000000,0.000106000000000
RollingBloom-refresh,1,0.000124000000000,0.000124000000000,0.000124000000000
RollingBloom-refresh,1,0.000115000000000,0.000115000000000,0.000115000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000117000000000,0.000117000000000,0.000117000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom,1310720,0.000000795478627,0.000000927659130,0.000000840967550,2063,2406,2181
SHA1,512,0.001935496926308,0.002218931913376,0.002032823860645,5020685,5755989,5273419
SHA256,208,0.004498481750488,0.005540251731873,0.004966990305827,11667956,14371842,12885041
SHA256_32b,4,0.345051527023315,0.346106529235840,0.345579028129578,895062320,897798922,896430621
SHA512,352,0.002845406532288,0.003299534320831,0.003069994124499,7380971,8558912,7963958
SipHash_32b,30,0.033124923706055,0.037207484245300,0.035290129979451,85925534,96516794,91547269
Sleep100ms,10,0.100992441177368,0.104498505592346,0.102697491645813,261974287,271068521,266396862
Trig,67108864,0.000000014460568,0.000000015428895,0.000000014972940,37,40,38
VerifyScriptBench,5632,0.000182222574949,0.000207984820008,0.000195238654586,472678,539492,506447
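
As a rough sanity check on the new columns, dividing `average_cycles` by `average` (in seconds) should land near the counter frequency of the machine. The sketch below does that for a single result row; `SplitCsv` and `ImpliedHz` are hypothetical helpers and are not part of the bench framework.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas (the bench output has no quoted fields).
inline std::vector<std::string> SplitCsv(const std::string& line)
{
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Hypothetical helper: average_cycles / average seconds, i.e. the counter
// rate implied by one result row. Returns 0 for rows without cycle columns.
inline double ImpliedHz(const std::string& row)
{
    const std::vector<std::string> f = SplitCsv(row);
    if (f.size() < 8) return 0.0;
    const double avg_seconds = std::stod(f[4]);
    const double avg_cycles = static_cast<double>(std::stoull(f[7]));
    return avg_seconds > 0.0 ? avg_cycles / avg_seconds : 0.0;
}
```

For the Sleep100ms row above this gives roughly 2.59e9 cycles per second, in line with the 2.6 GHz clock quoted for the machine.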

@morcos
Contributor

morcos commented Nov 22, 2016

@laanwj I was playing around with this type of timing earlier and read that I should be wary of rdtsc getting reordered with respect to other instructions and that if you can't use rdtscp instead, then you should add a serializing instruction first like cpuid. Also do you not have any issues with the thread migrating to another core? I had to set cpu affinity.

I couldn't find where I was reading all that, but here is one link:
http://blog.regehr.org/archives/330
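
To make the suggestion concrete, here is a sketch of a serialized read, assuming GCC or Clang extended inline assembly: `cpuid` is executed first as a serializing instruction, then `rdtsc`. `ReadCyclesSerialized` and `kHasSerializedCounter` are hypothetical names, and this is not what the patch does.

```cpp
#include <cstdint>

#if defined(__i386__) || defined(__x86_64__)
constexpr bool kHasSerializedCounter = true;
#else
constexpr bool kHasSerializedCounter = false;
#endif

// Hypothetical helper: executes cpuid (a serializing instruction) before
// reading the time stamp counter, so earlier instructions have completed.
// Returns 0 on platforms other than x86/x86_64.
inline uint64_t ReadCyclesSerialized()
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
    // volatile + "memory" keeps the compiler from dropping or moving the asm.
    asm volatile("cpuid" : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx) : : "memory");
    (void)ebx;
    (void)edx;
    uint32_t lo = 0, hi = 0;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return (static_cast<uint64_t>(hi) << 32) | lo;
#else
    return 0;
#endif
}
```

The extra `cpuid` is not free, so its cost becomes part of every measurement taken this way.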

@laanwj
Member Author

laanwj commented Nov 23, 2016

> I was playing around with this type of timing earlier and read that I should be wary of rdtsc getting reordered with respect to other instructions

Yes, both the compiler and the CPU pipeline may reorder it. In this specific case it's not too bad, though, because the call is already from a function (State::KeepRunning) called inside the benchmark. So there is quite some overhead already, making reordering by a few instructions probably unnoticeable in the noise.
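
A sketch of why the read sits behind a function call: the benchmark body uses a `KeepRunning()`-style call as its loop condition, and the counter is sampled inside that call. `SketchState` below is a hypothetical stand-in with an injectable counter, not the `State` class from this PR; it only illustrates how min/max/average can be accumulated from successive reads.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical stand-in for a benchmark state object; not the code from this PR.
class SketchState
{
public:
    SketchState(std::function<uint64_t()> counter, uint64_t iterations)
        : m_counter(std::move(counter)), m_remaining(iterations) {}

    // Called by the benchmark body as its loop condition. The counter is read
    // here, so the call overhead is part of every sample.
    bool KeepRunning()
    {
        const uint64_t now = m_counter();
        if (m_started) {
            const uint64_t elapsed = now - m_last; // unsigned, so wrap-around is harmless
            m_min = std::min(m_min, elapsed);
            m_max = std::max(m_max, elapsed);
            m_total += elapsed;
            ++m_samples;
        }
        m_started = true;
        m_last = now;
        if (m_remaining == 0) return false;
        --m_remaining;
        return true;
    }

    uint64_t MinCycles() const { return m_samples ? m_min : 0; }
    uint64_t MaxCycles() const { return m_max; }
    uint64_t AverageCycles() const { return m_samples ? m_total / m_samples : 0; }

private:
    std::function<uint64_t()> m_counter;
    uint64_t m_remaining;
    bool m_started = false;
    uint64_t m_last = 0;
    uint64_t m_min = std::numeric_limits<uint64_t>::max();
    uint64_t m_max = 0;
    uint64_t m_total = 0;
    uint64_t m_samples = 0;
};
```

A benchmark would then be written as `while (state.KeepRunning()) { /* work */ }`, with any cycle counter reader passed in as `counter`.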

> and that if you can't use rdtscp instead, then you should add a serializing instruction first like cpuid.

I didn't know that, although rdtscp does not seem to be available on all x86 processors. I'll leave that as a future improvement.

The x86 path is already precise and low-overhead compared to the ARM path, which has to do a syscall (the instructions aren't available to user space).

> Also do you not have any issues with the thread migrating to another core? I had to set cpu affinity.

Indeed, calling bench with e.g. taskset -c 0 bench_bitcoin will likely get more precise cycle measurements.
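
The same pinning can also be done from inside a process. The sketch below assumes Linux with glibc-style `sched_setaffinity()` and does nothing elsewhere; `CurrentCpu`, `PinToCpu` and `kAffinitySupported` are hypothetical names, and the bench binary in this PR does not do this itself.

```cpp
#if defined(__linux__)
#include <sched.h> // sched_getcpu(), sched_setaffinity(), CPU_SET (needs _GNU_SOURCE, which g++ defines)
constexpr bool kAffinitySupported = true;
#else
constexpr bool kAffinitySupported = false;
#endif

// Hypothetical helper: CPU the calling thread is running on, or -1 if unknown.
inline int CurrentCpu()
{
#if defined(__linux__)
    return sched_getcpu();
#else
    return -1;
#endif
}

// Hypothetical helper: restricts the calling thread to a single CPU.
// Returns false if the request is invalid, rejected, or unsupported here.
inline bool PinToCpu(int cpu)
{
#if defined(__linux__)
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0; // pid 0 means the calling thread
#else
    (void)cpu;
    return false;
#endif
}
```

Calling `PinToCpu(0)` before the first benchmark would roughly correspond to the `taskset -c 0` invocation, provided CPU 0 is in the set the process is allowed to use.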

@fanquake
Member

Running on OSX (3.4GHz i7)

#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,262144,0.000003828128683,0.000004008295946,0.000003908591680,12985,13597,13260
Base58Decode,983040,0.000000999269105,0.000001138963853,0.000001040822341,3389,3863,3530
Base58Encode,425984,0.000002385859261,0.000002805812983,0.000002487964454,8093,9517,8440
CCoinsCaching,106496,0.000009394483641,0.000010205199942,0.000010016403394,31871,34618,33978
CoinSelection,480,0.002068780362606,0.002542287111282,0.002170727153619,7017890,8624174,7364277
DeserializeAndCheckBlockTest,96,0.010975986719131,0.011605978012085,0.011256289978822,37233600,39370829,38184392
DeserializeBlockTest,112,0.009186625480652,0.010600864887238,0.009532500590597,31163698,35960873,32339308
LockedPool,640,0.001598000526428,0.001764506101608,0.001673145219684,5420783,5985758,5675764
MempoolEviction,14336,0.000070976559073,0.000092454254627,0.000074292616253,240772,313630,252040
RIPEMD160,416,0.002447441220284,0.002628095448017,0.002492115474664,8302338,8915072,8453936
RollingBloom-refresh,1,0.000569000000000,0.000569000000000,0.000569000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000121000000000,0.000121000000000,0.000121000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000115000000000,0.000115000000000,0.000115000000000
RollingBloom-refresh,1,0.000125000000000,0.000125000000000,0.000125000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000116000000000,0.000116000000000,0.000116000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000116000000000,0.000116000000000,0.000116000000000
RollingBloom-refresh,1,0.000149000000000,0.000149000000000,0.000149000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom,1441792,0.000000725554855,0.000000796730092,0.000000755561731,2465,2702,2563
SHA1,576,0.001806784421206,0.001858018338680,0.001830946240160,6129109,6302896,6211558
SHA256,240,0.004244878888130,0.004624485969543,0.004340062538783,14399626,15688224,14722663
SHA256_32b,4,0.300903081893921,0.302513957023621,0.301708519458771,1020885060,1026209988,1023547524
SHA512,384,0.002580255270004,0.002831816673279,0.002633192886909,8752964,9606202,8932496
SipHash_32b,28,0.036777973175049,0.037643551826477,0.037094610077994,124759771,127697052,125844822
Sleep100ms,10,0.102699518203735,0.104491472244263,0.103782296180725,348388267,354464246,352058326
Trig,67108864,0.000000015213971,0.000000016026718,0.000000015468853,51,54,52
VerifyScriptBench,6144,0.000170339830220,0.000214513391256,0.000176954781637,577836,727691,600278

@jonasschnelli
Contributor

@fanquake: did you compile with -O2 or -O0 (--enable-debug)?

@paveljanik
Contributor

paveljanik commented Nov 23, 2016

It looks like RollingBloom-refresh bench is not changed to the new output format.
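
Rows like these can be spotted mechanically by comparing their field count with the header. A small sketch, where `CountFields` and `MatchesHeader` are hypothetical helpers and the output is assumed to contain no quoted commas:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of comma-separated fields in one line of bench output.
inline std::size_t CountFields(const std::string& line)
{
    return static_cast<std::size_t>(std::count(line.begin(), line.end(), ',')) + 1;
}

// True if a result row has as many fields as the header line.
inline bool MatchesHeader(const std::string& header, const std::string& row)
{
    return CountFields(header) == CountFields(row);
}
```

With the header from the output above (8 fields), the RollingBloom-refresh rows have only 5 fields because they lack the three cycle columns.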

@laanwj
Member Author

laanwj commented Nov 23, 2016 via email

@paveljanik
Contributor

Agreed.

ACK 3532818

@fanquake
Member

@jonasschnelli

--enable-debug

Options used to compile and link:
  debug enabled = yes
  target os     = darwin
  build os      = darwin

  CC            = /usr/local/bin/ccache gcc
  CFLAGS        = -g -O2 -g3 -O0
  CPPFLAGS      = -Qunused-arguments  -DDEBUG -DDEBUG_LOCKORDER -DHAVE_BUILD_INFO -D__STDC_FORMAT_MACROS -I/usr/local/opt/berkeley-db4/include -DMAC_OSX
  CXX           = /usr/local/bin/ccache g++ -std=c++11
  CXXFLAGS      = -g -O2 -g3 -O0 -Wall -Wextra -Wformat -Wformat-security -Wno-unused-parameter -Wno-self-assign -Wno-unused-local-typedef -Wno-deprecated-register
  LDFLAGS       =  -Wl,-headerpad_max_install_names -Wl,-dead_strip
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,30720,0.000034503871575,0.000035417033359,0.000034728871348,117174,120144,117818
Base58Decode,73728,0.000013721641153,0.000014259770978,0.000013993813708,46547,48372,47470
Base58Encode,40960,0.000023265369236,0.000028600101359,0.000024570309324,78931,97019,83356
CCoinsCaching,13312,0.000073989387602,0.000080669764429,0.000076363722865,250992,273651,259056
CoinSelection,104,0.009642988443375,0.011777520179749,0.009857095204867,32711658,40027011,33439357
DeserializeAndCheckBlockTest,12,0.087463498115540,0.089210510253906,0.088053584098816,296699541,302626002,298725543
DeserializeBlockTest,16,0.068791985511780,0.069242000579834,0.068972617387772,233362289,234886757,233973693
LockedPool,160,0.004223585128784,0.006983995437622,0.006671081483364,14328786,23691500,22631932
MempoolEviction,2560,0.000405104830861,0.000430928543210,0.000418359413743,1374232,1461820,1419187
RIPEMD160,20,0.053014993667603,0.053706049919128,0.053378355503082,179840289,182186794,181088217
RollingBloom,229376,0.000004121757229,0.000005482856068,0.000004432338756,13982,18599,15035
SHA1,56,0.018210709095001,0.019991517066956,0.018770660672869,61774970,67816841,63680293
SHA256,32,0.032343983650208,0.033765912055969,0.033258154988289,109720940,114543363,112829591
SHA256_32b,2,2.213187932968140,2.213187932968140,2.213187932968140,7508019029,7508019029,7508019029
SHA512,52,0.020092487335205,0.020573496818542,0.020253401536208,68158867,69791050,68705021
SipHash_32b,8,0.155891418457031,0.157407522201538,0.156427383422852,528826462,534114465,530680206
Sleep100ms,10,0.100543022155762,0.104717016220093,0.102999615669250,341066705,355370400,349431503
Trig,62914560,0.000000015808098,0.000000016690024,0.000000016066789,53,56,54
VerifyScriptBench,3584,0.000285433605313,0.000293343327940,0.000287477991411,968261,996237,975283

@laanwj laanwj merged commit 3532818 into bitcoin:master Nov 29, 2016
laanwj added a commit that referenced this pull request Nov 29, 2016
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
codablock pushed a commit to codablock/dash that referenced this pull request Jan 17, 2018
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
andvgal pushed a commit to energicryptocurrency/gen2-energi that referenced this pull request Jan 6, 2019
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
CryptoCentric pushed a commit to absolute-community/absolute that referenced this pull request Feb 25, 2019
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
zkbot added a commit to zcash/zcash that referenced this pull request Jan 24, 2020
Micro-benchmarking framework part 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6733
- bitcoin/bitcoin#6770
- bitcoin/bitcoin#6892
  - Excluding changes to `src/policy/policy.h` which we don't have yet.
- bitcoin/bitcoin#7934
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#8039
- bitcoin/bitcoin#8107
- bitcoin/bitcoin#8115
- bitcoin/bitcoin#8914
  - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later.
- bitcoin/bitcoin#9200
- bitcoin/bitcoin#9202
  - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts.
- bitcoin/bitcoin#9281
  - Only changes to `src/bench/bench.cpp`
- bitcoin/bitcoin#9498
- bitcoin/bitcoin#9712
- bitcoin/bitcoin#9547
- bitcoin/bitcoin#9505
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#9792
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#10272
- bitcoin/bitcoin#10395
  - Only changes to `src/bench/`
- bitcoin/bitcoin#10735
  - Only changes to `src/bench/base58.cpp`
- bitcoin/bitcoin#10963
- bitcoin/bitcoin#11303
  - Only the benchmark backend change.
- bitcoin/bitcoin#11562
- bitcoin/bitcoin#11646
- bitcoin/bitcoin#11654

This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet.

This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Jun 8, 2020
3f3edde [Bench] Use PIVX address in Base58Decode test (random-zebra)
5a1be90 [Travis] Disable benchmark framework for trusty test (random-zebra)
1bd89ac Initialize recently introduced non-static class member lastCycles to zero in constructor (random-zebra)
ec60671 Require a steady clock for bench with at least micro precision (random-zebra)
84069ce bench: prefer a steady clock if the resolution is no worse (random-zebra)
38367b1 bench: switch to std::chrono for time measurements (random-zebra)
a24633a Remove countMaskInv caching in bench framework (random-zebra)
9e9bc22 Restore default format state of cout after printing with std::fixed/setprecision (random-zebra)
3dd559d Avoid static analyzer warnings regarding uninitialized arguments (random-zebra)
e85f224 Replace boost::function with std::function (C++11) (random-zebra)
98c0857 Prevent warning: variable 'x' is uninitialized (random-zebra)
7f0d4b3 FastRandom benchmark (random-zebra)
d9fa0c6 Add prevector destructor benchmark (random-zebra)
e1527ba Assert that what might look like a possible division by zero is actually unreachable (random-zebra)
e94cf15 bench: Fix initialization order in registration (random-zebra)
151c25f Basic CCheckQueue Benchmarks (random-zebra)
51aedbc Use std:thread:hardware_concurrency, instead of Boost, to determine available cores (random-zebra)
d447613 Use real number of cores for default -par, ignore virtual cores (random-zebra)
9162a56 [Refactoring] Removed using namespace <xxx> from bench/ sources (random-zebra)
5c07f67 bench: Add support for measuring CPU cycles (random-zebra)
41ce1ed bench: Fix subtle counting issue when rescaling iteration count (random-zebra)
68ea794 Avoid integer division in the benchmark inner-most loop. (random-zebra)
3fa4f27 bench: Added base58 encoding/decoding benchmarks (random-zebra)
4442118 bench: Add crypto hash benchmarks (random-zebra)
a5179b6 [Trivial] ensure minimal header conventions (random-zebra)
8607d6b Support very-fast-running benchmarks (random-zebra)
4aebb60 Simple benchmarking framework (random-zebra)

Pull request description:

  Introduces the benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark), ported from Bitcoin, up to 0.16.
  The benchmark framework is hard-coded to run each benchmark for one wall-clock second,
  and then spits out .csv-format timing information to stdout.

  Backported PR:
  - bitcoin#6733
  - bitcoin#6770
  - bitcoin#6892
  - bitcoin#8039
  - bitcoin#8107
  - bitcoin#8115
  - bitcoin#9200
  - bitcoin#9202
  - bitcoin#9281
  - bitcoin#6361
  - bitcoin#10271
  - bitcoin#9498
  - bitcoin#9712
  - bitcoin#9547
  - bitcoin#9505 (benchmark only. Rest was in #1557)
  - bitcoin#9792 (benchmark only. Rest was in #643)
  - bitcoin#10272
  - bitcoin#10395 (base58 only)
  - bitcoin#10963
  - bitcoin#11303 (first commit)
  - bitcoin#11562
  - bitcoin#11646
  - bitcoin#11654

  Current output of `src/bench/bench_pivx`:
  ```
  #Benchmark,count,min(ns),max(ns),average(ns),min_cycles,max_cycles,average_cycles
  Base58CheckEncode,131072,7697,8065,7785,20015,20971,20242
  Base58Decode,294912,3305,3537,3454,8595,9198,8981
  Base58Encode,180224,5498,6020,5767,14297,15652,14994
  CCheckQueueSpeed,320,3159960,3535173,3352787,8216030,9191602,8717388
  CCheckQueueSpeedPrevectorJob,96,9184484,11410840,10823070,23880046,29668680,28140445
  FastRandom_1bit,320,3143690,4838162,3199156,8173726,12579373,8317941
  FastRandom_32bit,60,17097612,17923669,17367440,44454504,46602306,45156079
  PrevectorClear,3072,334741,366618,346731,870340,953224,901516
  PrevectorDestructor,2816,344233,368912,357281,895022,959187,928948
  RIPEMD160,288,3404503,3693917,3577774,8851850,9604334,9302363
  SHA1,384,2718128,2891558,2802513,7067238,7518184,7286652
  SHA256,176,6133760,6580005,6239866,15948035,17108376,16223916
  SHA512,240,4251468,4358706,4313463,11054006,11332826,11215186
  Sleep100ms,10,100221470,100302411,100239073,260580075,260790726,260625870
  ```

  NOTE: Not all the tests have been pulled yet (as we might not have the code being tested, or it would require rewrites to work with our different code base), but the framework is updated to December 2017.

ACKs for top commit:
  Fuzzbawls:
    ACK 3f3edde

Tree-SHA512: c283311a9accf6d2feeb93b185afa08589ebef3f18b6e86980dbc3647b9845f75ac9ecce2f1b08738d25ceac36596a2c89d41e4dbf3b463502aa695611aa1f8e
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021