
martinus
Contributor

The benchmark's KeepRunning() used to make a function call on every iteration, inflating measurement times for short-running code. This change inlines the critical code that is executed each run and moves the slow timer updates into a new function.

This change decreases the average runtime for Trig from 0.000000082339208 sec to 0.000000080948591 sec.

@TheBlueMatt
Contributor

Cool! The benchmark suite is really not very good, sadly. If you feel up to it, I think we'd do better by just hardcoding the number of runs in each benchmark - it is often easy to see significant differences in the average runtime based on the number of runs (even for things that take 1+ ms per run!). Unless you have some better intuition than I do about why this is the case, that is.

@martinus
Contributor Author

A while ago I wrote a microbenchmarking library similar to Google Benchmark. There I used the geometric mean, which I think is a much better statistic than the average here, and it handles outliers better. I also think having the mean and a confidence interval is much more reasonable than just min, max, and avg values. Would you be interested in updates like this?

@TheBlueMatt
Contributor

I'd be happy with anything that gives more reliable values, really. Right now, to compare benchmark runs, I often have to pin to a CPU, then run multiple times, and only consider differences if both runs had the same iteration count, which is a huge pain.

@fanquake fanquake added the Tests label Oct 17, 2017
@laanwj
Member

laanwj commented Oct 18, 2017

Cool! The benchmark suite is really not at all very good, sadly. If you feel up to it, I think we'd do better by just hardcoding the number of runs in each benchmark

The idea of doing it this way is that the benchmarks can be run in roughly the same time on slower hardware, whereas on faster hardware they are more precise. The secp256k1 benchmarks have the problem that they take basically forever on lower-end CPUs. I agree it's probably broken though, and the range of CPUs that bitcoind can realistically run on is much smaller than secp256k1's.
(if this is an issue, maybe a fixed scaling factor could be passed on the command line for this purpose, that scales all the benchmark iterations equally and is at least deterministic?)

@TheBlueMatt
Contributor

Yea, a scaling factor seems fine; even an auto-calculated one would probably also be fine, as long as it is very stable/deterministic on the same hardware between runs, which the current version is not.

@martinus
Contributor Author

I don't think using the exact same number of iterations is a huge benefit for stability. The problem is that there will always be random events in the system that cause delays (unwanted fluctuations). The other problem is fluctuation that is inherent to what's benchmarked: e.g. if the code does any allocations, the runtime will always fluctuate. This fluctuation won't go away with a fixed number of runs, and I think the best way to deal with it is with better statistics. In my old library I do this:

  1. Perform 5 runs, each approximately 0.5 seconds.
  2. Calculate geometric mean of the runs.
  3. Calculate the standard deviation of the runtime's logarithm, and print the 95% confidence interval of the runtime.

Using the logarithm is much more appropriate for benchmark results, because runtime simply can't be negative. With that data we can use a Student's t-test to compare the means of two different algorithms to show whether the difference is actually statistically significant or might be just random fluctuation.
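The three steps above can be sketched like this (a minimal illustration with made-up names, not the actual library code; for a handful of runs a Student's t quantile would be more accurate than the 1.96 normal quantile used here):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean and an approximate 95% confidence interval, computed
// on the logarithms of the measured runtimes (runtimes cannot be
// negative, so the log scale is the better fit, as argued above).
struct Stats {
    double geo_mean;
    double ci_low;
    double ci_high;
};

Stats AnalyzeRuntimes(const std::vector<double>& runtimes)
{
    std::vector<double> logs;
    for (double r : runtimes) logs.push_back(std::log(r));

    double mean = 0.0;
    for (double l : logs) mean += l;
    mean /= logs.size();

    double var = 0.0;
    for (double l : logs) var += (l - mean) * (l - mean);
    var /= (logs.size() - 1); // sample variance of the logs

    // 1.96 standard errors on the log scale ~ a 95% interval.
    double err = 1.96 * std::sqrt(var / logs.size());
    return {std::exp(mean), std::exp(mean - err), std::exp(mean + err)};
}
```

Because the interval is computed on logs and exponentiated back, it is asymmetric around the geometric mean, which matches how runtimes actually spread.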

That's a bit more complicated, but not by much; Boost has all the features for calculating these statistics in a few lines of code. I can work on that, but unfortunately I'm quite busy, so it will take some time.

@TheBlueMatt
Contributor

For my largest use of bench (FIBRE stuff, which has pretty wide-ranging benches that run through FEC encode/decode + mempool scan + block decode), the number of runs makes a huge difference. I'd be happy to be wrong, but my guess has always been cache-related/branch-prediction effects (the issues still appear when you give bench_bitcoin a dedicated core, minimizing any system disruption). Fluctuation is fine, but I often see fluctuations of 2-4% when the iteration count changes (on roughly-1ms-runtime!) when trying to tease out 5% changes. For something running that long, it seems strange to me that you'd see such huge fluctuations.

@martinus
Contributor Author

That's strange. Using a fixed number of iterations should be a simple change. How about adding a command line option to specify a fixed number of iterations? Also, a filter would be useful so that only a specified test runs.

@thebostik

The premise is a bit flawed (though this version might still be better; I like the idea of visualizing the data):

used to make a function call for each call

isn't true, because the value of countMask changes in two different ways inside this logic, increasing to prevent time and cycles from being collected on every other iteration (which is how it starts out). That countMask jumps up pretty quickly to powers of 2 minus 1 (so the slow path is only entered when the count is a fairly high power of 2 minus 1).

There's a comment that says as much, but it's non-obvious since it's tucked pretty far inside that logic.
// If the execution was much too fast (1/128th of maxElapsed), increase the count mask by 8x and restart timing.
// The restart avoids including the overhead of this code in the measurement.

I added another counter inside of KeepRunning to track those function invocations. It looks like once the benchmark has scaled up to the final timing loop, there are no more than 20 of these timing calls for the fastest benches (example of this: new column overheadCount, and new row TimeLookup). The timing calls themselves take 25 ns avg each according to the same benchmark code (heh).

So I guess the complexity/readability cost of achieving resource-independent 1 s bench runs is the main thing that could be improved in the current version; I thought the same thing too until really digging into it and playing with countMask some.
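The countMask behavior being described works roughly like this (a simplified toy model with invented names and constants, not the real bench.cpp logic):

```cpp
#include <cassert>
#include <cstdint>

// Simplified sketch: the clock is only consulted when
// (count & countMask) == 0, and countMask grows (always staying a
// power of 2 minus 1) whenever the measured stretch was much too
// fast, so timing overhead is amortized over many iterations.
struct MaskState {
    uint64_t count = 0;
    uint64_t countMask = 1; // 2^1 - 1, "every other iteration" at first
    uint64_t timerCalls = 0;

    bool KeepRunningSketch(uint64_t totalRuns)
    {
        if ((count & countMask) == 0) {
            ++timerCalls; // slow path: a real version reads the clock here
            if (countMask < 4095) {
                countMask = countMask * 8 + 7; // 8x larger, still 2^n - 1
            }
        }
        return ++count <= totalRuns;
    }
};
```

Running this sketch for 100000 iterations takes the slow path only a handful of times, which matches the "no more than 20 timing calls" observation above.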

@martinus
Contributor Author

isn't true because the value of countMask changes in two different ways inside of this logic, increasing to prevent time and cycles from being collected on every other iteration (how it starts at first).

That's right, but KeepRunning() itself used to introduce function-call overhead. With my change the fast path is inlined, and the slow measuring path is in the non-inlined method.

I've benchmarked an extreme example (with cout so compiler doesn't optimize it away):

double d = 0.01;
while (state.KeepRunning()) {
    d += 0.000001;
}
std::cout << d << std::endl;

Before my change the measurement was 3.19 ns; with the inlined change I measure 0.83 ns.
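The structure of the split described here can be sketched as follows (illustrative only; the names and the constant mask are simplifications, not the actual bench.h code):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

class State {
    uint64_t count = 0;
    uint64_t countMask = 1; // kept constant here for brevity
    uint64_t totalRuns;
    uint64_t slowPathCalls = 0;

    // Out-of-line slow path: reads the clock; a real implementation
    // would accumulate elapsed time and rescale countMask here.
    bool UpdateTimer()
    {
        ++slowPathCalls;
        auto now = std::chrono::high_resolution_clock::now();
        (void)now;
        return count <= totalRuns;
    }

public:
    explicit State(uint64_t runs) : totalRuns(runs) {}

    // Fast path stays inline: one increment, one mask test, one compare,
    // no function call on most iterations.
    bool KeepRunning()
    {
        if ((++count & countMask) == 0) {
            return UpdateTimer(); // rare, non-inlined slow path
        }
        return count <= totalRuns;
    }

    uint64_t SlowPathCalls() const { return slowPathCalls; }
};
```

Since the body of KeepRunning() is visible in the header, the compiler can inline the fast path into the benchmark loop, which is where the 3.19 ns → 0.83 ns difference comes from.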

I also find the plot and the command line options extremely useful. I can now do this:

./bench/bench_bitcoin -filter=Deser.* -evals=20 -printer=plot |tee out.html

The output is a nice plot, I've uploaded it here: http://lawyer-maggie-58725.bitballoon.com/
There you can clearly see that there is an outlier for DeserializeAndCheckBlockTest. Using the median of multiple runs it is nicely filtered out.
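The outlier filtering relies on nothing more than the median of the per-evaluation results; a minimal sketch (hypothetical helper, not the PR's actual code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of the per-evaluation results: a single outlier evaluation
// (e.g. a scheduler hiccup) does not move the reported number, unlike
// the mean.
double Median(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    size_t mid = v.size() / 2;
    if (v.size() % 2 == 1) return v[mid];
    return (v[mid - 1] + v[mid]) / 2.0;
}
```

For example, with evaluations {1.0, 2.0, 100.0} the mean is dragged to 34.3 by the outlier while the median stays at 2.0.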

@martinus martinus changed the title Tests: Improve benchmark precision by inlining performance critical code Tests: Improve benchmark precision Oct 27, 2017
@ryanofsky ryanofsky left a comment

utACK f028ada09ee92a1950ccbc60415301b146eb349c if code comments/whitespace/variable names are cleaned up. This change adds some features, makes the code more portable, and effectively makes benchmarks run with a fixed number of iterations like Matt wanted, though I think it would be a little more straightforward if the BENCHMARK macro just took the number of KeepRunning iterations literally instead of taking an average time estimate and using 0.5/average_time as the number of iterations.
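The 0.5/average_time derivation mentioned here can be written out like this (a sketch assuming the estimate is given in seconds; the function name and scaling parameter are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Turn an estimated per-iteration runtime into an iteration count
// targeting roughly 0.5 s per evaluation, with an optional scaling
// factor (as discussed for slower/faster hardware).
uint64_t IterationsFor(double estimated_seconds_per_iter, double scaling = 1.0)
{
    double target_seconds = 0.5 * scaling;
    uint64_t iters =
        static_cast<uint64_t>(target_seconds / estimated_seconds_per_iter);
    return iters == 0 ? 1 : iters; // always run at least once
}
```

This is exactly why the iteration count is deterministic on a given set of estimates, which is the property Matt asked for earlier in the thread.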


static void RunAll(double elapsedTimeForOne=1.0);
static void RunAll(Printer& printer, uint64_t numEvals = 1, double elapsedTimeForOneEval = 0.5, const std::string& filter = ".*", bool isListOnly = false);
};
}

// BENCHMARK(foo) expands to: benchmark::BenchRunner bench_11foo("foo", foo);

This comment is now out of date.

};
}

// BENCHMARK(foo) expands to: benchmark::BenchRunner bench_11foo("foo", foo);
#define BENCHMARK(n) \
benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n);
#define BENCHMARK(n, avgTime) \

Usage example at the top of bench.h is now out of date. Should also add comment there explaining the new macro argument.

#define BENCHMARK(n) \
benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n);
#define BENCHMARK(n, avgTime) \
benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n, avgTime);

Suggest expected_time or estimated_duration instead of avgTime. Developer guide recommends using snake case for variables, and avg doesn't tell you what it's supposed to be the average of.

SHA256AutoDetect();
RandomInit();
ECC_Start();
SetupEnvironment();
fPrintToDebugLog = false; // don't want to write to debug.log file

benchmark::BenchRunner::RunAll();
int64_t evaluations = gArgs.GetArg("-evals", 5);
std::string regexFilter = gArgs.GetArg("-filter", ".*");

Should add -help output describing what the options do.

Should use snake_case for variable names (and ideally match them with option names).

Contributor Author

Thanks a lot for the review! I'll see if I can fix all that over the weekend.


class State {

New code is missing indentation, so now it is inconsistent, and BenchRunner looks like it is nested inside State. Also, the State class is key for someone trying to understand the benchmark framework, so I would recommend leaving it above the printer classes (you can forward declare Printer).


static void RunAll(double elapsedTimeForOne=1.0);
static void RunAll(Printer& printer, uint64_t numEvals = 1, double elapsedTimeForOneEval = 0.5, const std::string& filter = ".*", bool isListOnly = false);

Default argument values never actually seem to be used. Would drop so they aren't repeated and don't become inconsistent in the future.

@TheBlueMatt TheBlueMatt left a comment

Looks awesome, thanks for tackling this!

<< HelpMessageOpt("-list", _("List benchmarks without executing them. Can be combined with -scaling and -filter"))
<< HelpMessageOpt("-evals=<n>", strprintf(_("Number of measurement evaluations to perform. (default: %u)"), DEFAULT_BENCH_EVALUATIONS))
<< HelpMessageOpt("-filter=<regex>", strprintf(_("Regular expression filter to select benchmark by name (default: %s)"), DEFAULT_BENCH_FILTER))
<< HelpMessageOpt("-scaling=<n>", strprintf(_("Scaling factor for iterations. The specified factor is then divided by 1000, so e.g. 500 results in a factor 0.5 (default: %u)"), DEFAULT_BENCH_SCALING))

This is somewhat confusing. Maybe call it the scaling factor for runtime to make it clear what it actually means?

Contributor Author

I've updated the description now too

{
benchmarks().insert(std::make_pair(name, func));
std::cout << "<html><head>"
<< "<script src=\"https://cdn.plot.ly/plotly-latest.min.js\"></script>"

Grr, I kinda feel yucky having code that generates a tracking request for people who want to see the output. Is it possible to check plotly in (does it have a compatible license?), or can we just put the graphs in simple CLI form and skip the HTML output?

Contributor Author

Plotly.js is released under the MIT license: https://github.com/plotly/plotly.js/blob/master/LICENSE, so that should be pretty compatible :)
The HTML output is optional, and the default output is the command line, so I did not see a problem with that.
I've chosen it because the box plots are really nice; especially when using more evaluations, the output makes it very visible how the runtime actually fluctuates. That makes comparisons much easier. It would be very hard to reproduce that in the command line.

return benchmarks_map;
void benchmark::ConsolePrinter::header()
{
std::cout << "# Benchmark, evals, iterations, total, min, max, median" << std::endl;

This is wrong for is_list_only - maybe is_list_only mode should also call result() with some other parameter for the mode?

Contributor Author

fixed in the following commit


class BenchRunner
bool UpdateTimer(std::chrono::time_point<std::chrono::high_resolution_clock> finish_time);
void PrintResults() const;

This function is neither defined, nor used.

Contributor Author

thanks, I've removed it

@theuni
Member

theuni commented Nov 7, 2017

Concept ACK. The plotly.js issue could be worked around by requiring that a URL be supplied when plotting.

It'd be nice if this could be rebased on #11562, as I think that's a simple logical subset of the changes here.

@laanwj
Member

laanwj commented Nov 8, 2017

Concept ACK. Adding HTML as an optional format is neat; please keep CSV support though, I need it (but you do, so that's great).

NACK on checking plotly (or any JS code) into the repository.

I think it's ok to use this URI by default for the HTML output, but agree that it should be possible to provide your own (e.g. if you've installed it somewhere on your local network, and want to avoid the internet access).

Also agree on rebasing on top of #11562, that one is ready for merge and will go in today.

@martinus martinus force-pushed the improved-benchmarking branch from 2d8c862 to 84a2087 Compare November 8, 2017 20:57
@martinus
Contributor Author

martinus commented Nov 9, 2017

I've rebased and squashed my changes on top of the current master, and added a configurable plotly.js URL. That was the first time I've done a rebase; I hope I've done it correctly! The Travis build is green, but for some reason it shows as "build is in progress"?

@maflcko
Member

maflcko commented Nov 9, 2017

Don't worry about the travis-yellow. This happens from time to time.

@maflcko
Member

maflcko commented Nov 10, 2017

Needs rebase

@martinus martinus force-pushed the improved-benchmarking branch from 92a92d4 to ff8ed6e Compare November 10, 2017 23:24
@maflcko maflcko left a comment

Looks good 6c31a3599a66a6a59b23919f6e3724b80e936485. Would you mind updating "RollingBloom-refresh" to the same interface/output as well?

Also, you removed the cycles statistics from the output. It would be nice to briefly mention the reason for that.

@@ -8,98 +8,135 @@
#include <assert.h>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <regex>

Don't forget to #include <numeric> for std::accumulate.

Contributor Author

thanks, updated!

@martinus martinus force-pushed the improved-benchmarking branch from 327fb5b to 9b4567c Compare November 18, 2017 09:38
@martinus
Contributor Author

I have removed the cycles statistics because I personally don't think they are necessary, and it simplifies the code. I could add them back though if others think it's a helpful statistic.

@maflcko maflcko left a comment

}

void
benchmark::BenchRunner::RunAll(benchmark::duration elapsedTimeForOne)
void benchmark::BenchRunner::RunAll(Printer& printer, uint64_t num_evals, double scaling, const std::string& filter, bool is_list_only)
{
perf_init();
if (std::ratio_less_equal<benchmark::clock::period, std::micro>::value) {

nit (unrelated to your changes): Condition should be negated.

Mind adding a commit for this?

Contributor Author

Ah, thanks for noting. I have fixed that message and rebased & squashed everything

@martinus martinus force-pushed the improved-benchmarking branch from 6367d61 to 846ae3e Compare December 2, 2017 10:02
@maflcko
Member

maflcko commented Dec 13, 2017

utACK 846ae3e5143a3959a03fe9bdff78e0253ef38a37

@TheBlueMatt
Contributor

TheBlueMatt commented Dec 14, 2017

Code-level utACK 846ae3e5143a3959a03fe9bdff78e0253ef38a37, but there are a few iteration counts that seem much too high (or low). My output is below (take note of CCheckQueueSpeed, which took much, much too long, as well as CCheckQueueSpeedPrevectorJob, and a few which ran much too short).

Benchmark, evals, iterations, total, min, max, median

Base58CheckEncode, 5, 320000, 4.95443, 3.03136e-06, 3.15128e-06, 3.10137e-06
Base58Decode, 5, 800000, 4.70946, 1.15549e-06, 1.19147e-06, 1.17997e-06
Base58Encode, 5, 470000, 4.69162, 1.9516e-06, 2.03356e-06, 2.00767e-06
BenchLockedPool, 5, 530, 4.77861, 0.00174317, 0.00186048, 0.00180833
CCheckQueueSpeed, 5, 29000, 169.862, 0.00116512, 0.00117693, 0.00117196
CCheckQueueSpeedPrevectorJob, 5, 1400, 24.7343, 0.00344175, 0.00358993, 0.00354375
CCoinsCaching, 5, 170000, 5.63809, 6.55766e-06, 6.74362e-06, 6.64618e-06
CoinSelection, 5, 650, 7.56354, 0.00229932, 0.00235094, 0.00234165
DeserializeAndCheckBlockTest, 5, 160, 6.90174, 0.00853605, 0.00879497, 0.00859933
DeserializeBlockTest, 5, 130, 4.66008, 0.00716357, 0.00717861, 0.00716775
FastRandom_1bit, 5, 44000000, 0.552348, 2.49383e-09, 2.5533e-09, 2.50127e-09
FastRandom_32bit, 5, 110000000, 5.47546, 9.9329e-09, 1.00259e-08, 9.93521e-09
MempoolEviction, 5, 41000, 4.77384, 2.32089e-05, 2.34167e-05, 2.3244e-05
PrevectorClear, 5, 5600, 5.79097, 0.000206768, 0.000206858, 0.000206819
PrevectorDestructor, 5, 5700, 5.12332, 0.00017952, 0.00018031, 0.000179633
RIPEMD160, 5, 440, 5.56629, 0.00252712, 0.00254049, 0.00252763
RollingBloom, 5, 1500000, 6.21419, 8.1511e-07, 8.77143e-07, 8.16809e-07
SHA1, 5, 570, 5.67104, 0.00198533, 0.00199761, 0.00198967
SHA256, 5, 340, 5.67169, 0.00322694, 0.00361889, 0.00325424
SHA256_32b, 5, 4700000, 5.55119, 2.33977e-07, 2.40388e-07, 2.35284e-07
SHA512, 5, 330, 5.45279, 0.00328487, 0.00334672, 0.00329141
SipHash_32b, 5, 40000000, 5.66765, 2.82588e-08, 2.84166e-08, 2.83323e-08
Sleep100ms, 5, 10, 5.0036, 0.10007, 0.100075, 0.100071
Trig, 5, 12000000, 1.45462, 2.28395e-08, 2.80425e-08, 2.35008e-08
VerifyScriptBench, 5, 6300, 5.76116, 0.000181074, 0.000186656, 0.000182651

@jtimon
Contributor

jtimon commented Dec 14, 2017

Concept ACK

@TheBlueMatt
Contributor

Oh, also this is on a many-core system, so the checkqueue bench spawns a ton of threads (it uses a dynamic count based on hardware), making the bench even more inconsistent between systems.

@maflcko
Member

maflcko commented Dec 19, 2017

@martinus I don't want to hold up your pull request any longer. I really like how flexible the new design is to jump in and hack own stuff into it.
However, since there seem to be issues arising from hardcoding the number of iterations, I'd propose the following solution:

Add a commit that also hardcodes the number of cores in checkqueue.cpp to 2. E.g.

diff --git a/src/bench/checkqueue.cpp b/src/bench/checkqueue.cpp
index e79c986792..55590c4a72 100644
--- a/src/bench/checkqueue.cpp
+++ b/src/bench/checkqueue.cpp
@@ -5,3 +5,2 @@
 #include <bench/bench.h>
-#include <util.h>
 #include <validation.h>
@@ -17,3 +16,3 @@
 // particularly visible
-static const int MIN_CORES = 2;
+static constexpr int NUM_CORES = 2;
 static const size_t BATCHES = 101;
@@ -33,4 +32,4 @@ static void CCheckQueueSpeed(benchmark::State& state)
     boost::thread_group tg;
-    for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) {
-       tg.create_thread([&]{queue.Thread();});
+    for (auto x = 0; x < NUM_CORES; ++x) {
+        tg.create_thread([&] { queue.Thread(); });
     }
@@ -80,4 +79,4 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::State& state)
     boost::thread_group tg;
-    for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) {
-       tg.create_thread([&]{queue.Thread();});
+    for (auto x = 0; x < NUM_CORES; ++x) {
+        tg.create_thread([&] { queue.Thread(); });
     }

@maflcko
Member

maflcko commented Dec 19, 2017

@TheBlueMatt suggested to get rid of the test, since it is not really a useful benchmark. I am fine with that, too.

(Make sure to add it as separate commit, so it is easier to review)

@@ -9,17 +9,64 @@
#include <validation.h>
#include <util.h>
#include <random.h>
#include <init.h>

Why does this add a dependency on init.h?

@maflcko maflcko left a comment

re-ACK. Needs rebase and two nits fixed.

@@ -15,9 +15,9 @@
import sys

FOLDER_GREP = 'src'
FOLDER_TEST = 'src/test/'
FOLDER_TEST = 'src/(test|bench)/'

What is the reason for this change? It hides a valid warning for me at least:

Args undocumented: 1
set(['-plot-height'])

<< HelpMessageOpt("-printer=(console|plot)", strprintf(_("Choose printer format. console: print data to console. plot: Print results as HTML graph (default: %s)"), DEFAULT_BENCH_PRINTER))
<< HelpMessageOpt("-plot-plotlyurl=<uri>", strprintf(_("URL to use for plotly.js (default: %s)"), DEFAULT_PLOT_PLOTLYURL))
<< HelpMessageOpt("-plot-width=<x>", strprintf(_("Plot width in pixel (default: %u)"), DEFAULT_PLOT_WIDTH))
<< HelpMessageOpt("-plot-width=<x>", strprintf(_("Plot height in pixel (default: %u)"), DEFAULT_PLOT_HEIGHT));

Arg should say "-plot-height=" to make the python check happy.

@martinus
Contributor Author

I think I made my branch kaputt while trying to rebase :-(

@maflcko
Member

maflcko commented Dec 21, 2017 via email

@maflcko
Member

maflcko commented Dec 21, 2017 via email

@@ -17,7 +17,7 @@
FOLDER_GREP = 'src'
FOLDER_TEST = 'src/test/'
CMD_ROOT_DIR = '`git rev-parse --show-toplevel`/%s' % FOLDER_GREP
CMD_GREP_ARGS = r"egrep -r -I '(map(Multi)?Args(\.count\(|\[)|Get(Bool)?Arg\()\"\-[^\"]+?\"' %s | grep -v '%s'" % (CMD_ROOT_DIR, FOLDER_TEST)
CMD_GREP_ARGS = r"egrep -r -I '(map(Multi)?Args(\.count\(|\[)|Get(Bool)?Arg\()\"\-[^\"]+?\"' %s | grep -Ev '%s'" % (CMD_ROOT_DIR, FOLDER_TEST)

Don't forget to remove this as well

* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in BENCHMARK macro with number of iterations.
* Added -? to bench_bitcoin
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning
This benchmark's runtime was rather unpredictable on different machines, so it was not really a useful benchmark.
@martinus martinus force-pushed the improved-benchmarking branch from e63f311 to 760af84 Compare December 23, 2017 10:04
@martinus
Contributor Author

Thanks a lot @MarcoFalke!

@laanwj
Member

laanwj commented Dec 23, 2017

utACK 760af84

@laanwj laanwj merged commit 760af84 into bitcoin:master Dec 23, 2017
laanwj added a commit that referenced this pull request Dec 23, 2017
760af84 Removed CCheckQueueSpeed benchmark (Martin Ankerl)
00721e6 Improved microbenchmarking with multiple features. (Martin Ankerl)

Pull request description:

  The benchmark's KeepRunning() used to make a function call for each call, inflating measurement times for short running code. This change inlines the critical code that is executed each run and moves the slow timer updates into a new function.

  This change increases the average runtime for Trig from 0.000000082339208 sec to 0.000000080948591.

Tree-SHA512: 36b3bc55fc9b1d4cbf526b7103af6af18e9783e6b8f3ad3adbd09fac0bf9401cfefad58fd1e6fa2615d3c4e677998f912f3323d61d7b00b1c660d581c257d577
@maflcko
Member

maflcko commented Dec 23, 2017

Post-merge checked that the rebase was done correctly:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 760af84072408ba53d009e868fccc25fb186d40c
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJaPpFpAAoJENLqSFDnUosl6n4QAKHR/UGqeuxq0CJHWsaP2gX/
JIG2y99LkAdepKBIUAaHxwSiLjUVPFX+8m+v5sdgcUuTMzhnF/oto59WDG7UJ5RN
UTw+rGcZuFN8aBBcgtrLCKASjLVsAtImIESnx8uOgT+NQ2ShNtmwlbazNBmupx6e
yChi665E30Olw5w7CCDHPKttzZozp5ZoCCTjMlYChJC9F7fr9+jmgr4cLhHrRCIt
WtO874M6lzi45LJEPL6waLpV+DakCmss733iVwf9eb6aEvrJ6Ix2TSGiIRyLlarZ
MSCEBEHJbEmYOxQmZtGAuTd+9dZHR1vuEQiWB9wbJaggb9K+/7aBTsHd7/h/XEvr
DXb1wFiFmNjNytpqsIgYeOz1T3SK7vuLod9uFXSXx9TEc4/i91RuHZgQ9kBr1T8w
BrYBzSlSAziJ5fqKthfShj1C3X3Wpj9axcVAxFyPo1h+JbamVUDAl8B5afSPtkY3
h62TWy4n4KhYEqqBQfMi2UUNhyEvwR46MOVj9i3UrEemRnL/OtxnkitFa5mhqBi5
eye5MeRXgYvLSRRHUoErtx3W5wcpC+8DCF/CnHLKUiyvCWD74XDhhss7TRdePcjq
o103bXGqBvR8IjQT6WYqVZbkDkHBBIpqPrkEW/icp8MJLHsiqvI1ZR7eqXQxlHEC
nfaCXEa0qfx+Qvd5+Ozp
=siBw
-----END PGP SIGNATURE-----

@martinus martinus deleted the improved-benchmarking branch May 22, 2018 16:27
laanwj added a commit that referenced this pull request Jun 5, 2018
…st dependencies

81bbd32 build: Guard against accidental introduction of new Boost dependencies (practicalswift)

Pull request description:

  Guard against accidental introduction of new Boost dependencies.

  Context: #13383 – the usage of `boost::lexical_cast` was introduced in #11517 from December 2017

Tree-SHA512: 8d7b667ecf7ea62d84d9d41a71726f1e46c5a411b5a7db475c973ef364cac65609399afda7931e143a27d40c2947ff286e5e98ab263e8f0d225e2ae2c0872935
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 17, 2020
…new Boost dependencies

PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 2, 2020
…new Boost dependencies

PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 18, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 19, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 19, 2020
UdjinM6 added a commit to dashpay/dash that referenced this pull request Jul 20, 2020
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 30, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021