speed up Unserialize_impl for prevector #12324

AkioNak · 2018-02-01T11:51:56Z

The unserializer for prevector uses resize() for reserve the area, but it's prefer to use reserve() because resize() have overhead to call its constructor many times.

However, reserve() does not change the value of _size (a private member of prevector).

This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of _size.

The changes are as follows:

prevector.h
Add a public member function named 'append'.
This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.
serialize.h
In the following two function:

Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
Make a callback function from each original logic of reading values from stream, and call prevector's append().

test/prevector_tests.cpp
Add a test for append().

A benchmark result is following:

[Machine]
MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

[result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster

[before PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

[After PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

fanquake · 2018-02-01T11:56:15Z

@AkioNak You might also want to look at #10785.

AkioNak · 2018-02-01T14:18:45Z

@fanquake Thank you for pointing to #10785.
I will check if this PR is still useful even if #10785 is merged.

laanwj · 2018-02-01T18:43:33Z

Thanks for adding benchmarks! That's the way to do optimization PRs.

promag

Please squash.

promag · 2018-02-06T14:25:18Z

src/test/prevector_tests.cpp

@@ -183,6 +183,20 @@ class prevector_tester {
        pre_vector = pre_vector_alt;
    }

+    void append(realtype values) {
+        for(auto v : values) {


Nit, space after for.

promag · 2018-02-06T14:25:29Z

src/test/prevector_tests.cpp

+        }
+        auto p = pre_vector.size();
+        auto f = [&]() {
+            for(auto v : values) {


Nit, space after for.

AkioNak · 2018-02-07T06:54:49Z

@promag Thank you for your review. I fixed them and squashed commits.

AkioNak · 2018-02-07T10:19:50Z

@fanquake Fortunately, I think that there was no collision or adverse effect between #10785 and #12324. Also, #12324 still usefull even if #10785 has been merged.

Confirmation summary:

enviroment : MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
merge - git merge (master d3e4675 + both Serialization improvements #10785 and speed up Unserialize_impl for prevector #12324) : succeed.
build - make clean && make : succeed.
test - test_runner.py : passed. (exclude wallet_encription.py)
benchmark
[result]
DeserializeAndCheckBlockTest => 25% faster
DeserializeBlockTest => 30% faster

[#10785]
 # Benchmark, evals, iterations, total, min, max, median
 DeserializeAndCheckBlockTest, 50, 160, 76.7465, 0.00941822, 0.00986061, 0.00958263
 DeserializeBlockTest, 50, 130, 52.3447, 0.00791727, 0.00828939, 0.00805472

[#10785 + #12324]
 # Benchmark, evals, iterations, total, min, max, median
 DeserializeAndCheckBlockTest, 50, 160, 61.3164, 0.00750302, 0.00797864, 0.00765575
 DeserializeBlockTest, 50, 130, 40.1209, 0.00602097, 0.0063615, 0.00617751

eklitzke · 2018-03-11T02:54:47Z

Concept ACK. I like the idea of making this faster (I've seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Can you either:

Remove the func template (it's only called in two places...)
Or check that GCC 4.8 inlines the lambda properly?

AkioNak · 2018-03-11T13:37:13Z

@eklitzke thank you for your comment. I will try it.

kallewoof · 2018-03-13T07:16:02Z

I like the idea of making this faster (I've seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Adding inline to the template should do the trick, I think.

But I agree that lambdas and callbacks are a bit complex here. I personally think a caveat note on a method in prevector that leaves garbage in the vector is fine, e.g.

/**
 * Grow the size of the prevector by b bytes.
 * NOTE: The added capacity must be overwritten, or it will contain garbage data.
 */

sipa · 2018-03-17T00:30:27Z

Agree with @kallewoof. It seems the goal of using a callback here is to avoid having a public method that brings the vector in a (partially) undefined state. However, the result is that now we have a callback that needs to run in this state.

I would either:

change the method name to resize_uninitialized or so, and initialize explicitly after it returns
pass a begin and end iterator to the callback (avoiding the need for the code in the callback to interact with the prevector while it's in an undefined state).

AkioNak · 2018-03-19T07:53:55Z

@eklitzke @kallewoof @sipa Thank you for suggestions.
I introduced resize_uninitialized() and explicitly initialized instead of lambdas and callbacks.

eklitzke

This looks good! Just some typos in the comments.

eklitzke · 2018-03-19T06:50:44Z

src/prevector.h

@@ -382,6 +382,20 @@ class prevector {
        }
    }

+    inline void resize_uninitialized(size_type new_size) {
+        // resize_uninitialized change the size of the prevector but dose not initialize.


not: s/dose/does/

eklitzke · 2018-03-19T06:51:01Z

src/prevector.h

@@ -382,6 +382,20 @@ class prevector {
        }
    }

+    inline void resize_uninitialized(size_type new_size) {
+        // resize_uninitialized change the size of the prevector but dose not initialize.
+        // If size < new_size, the added elements must be initialized explicitly after it return.


nit: s/return/returns/

AkioNak · 2018-03-20T10:15:03Z

@eklitzke Thank you for your pointing out for my typos. Fixed them.

eklitzke · 2018-03-21T04:26:01Z

This looks good, although you still need to squash. I'm curious: do you still see the speedup from your intial benchmark? I know we changed other logic in this file since then.

On master:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.86197, 0.00603752, 0.00640901, 0.00616365
DeserializeBlockTest, 10, 130, 6.7487, 0.00486321, 0.0065349, 0.00496556

With your branch:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.96894, 0.00611369, 0.00647763, 0.00622066
DeserializeBlockTest, 10, 130, 6.43537, 0.00484567, 0.00539326, 0.00490343

maflcko · 2018-03-21T12:52:05Z

Could make sense to squash and rebase on master to ease benchmarking?

AkioNak · 2018-03-21T17:54:39Z

Squashed and rebased.
Now, speed up is still exist but a little (2.06% - 3.35%).

my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)
[on master]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1169.53, 0.00713881, 0.00842435, 0.00718528
DeserializeBlockTest, 1000, 130, 756.064, 0.00574035, 0.006433, 0.00575841

[my PR]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1131.6, 0.0070475, 0.00744498, 0.00706667
DeserializeBlockTest, 1000, 130, 740.836, 0.00567827, 0.00600557, 0.00569649

sipa · 2018-03-21T18:20:46Z

utACK d85530db45b327eecf408bc8e9636fa60e886208

eklitzke · 2018-03-22T00:59:10Z

Thanks for checking. utACK d85530db45b327eecf408bc8e9636fa60e886208

AkioNak · 2018-07-07T15:42:30Z

Rebased and add a bench.
This benchmark measures the part specialized for unserialization.

PrevectorDeserializeNontrivial => 3% faster
PrevectorDeserializeTrivial => 24% faster

my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)

[on master ] commit 0212187

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.3091, 0.00030134, 0.000308502, 0.000301828
PrevectorDeserializeTrivial, 5, 52000, 6.21001, 2.3817e-05, 2.39649e-05, 2.387e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2418, 0.000300878, 0.000301709, 0.000301096
PrevectorDeserializeTrivial, 5, 52000, 6.2122, 2.38438e-05, 2.39243e-05, 2.39157e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2464, 0.000300893, 0.000301853, 0.000301164
PrevectorDeserializeTrivial, 5, 52000, 6.20238, 2.38275e-05, 2.3929e-05, 2.38356e-05

[my PR] commit 23afe7acfa7908905e826f09601c9564ff685be0

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.95062, 0.000292415, 0.000292987, 0.000292503
PrevectorDeserializeTrivial, 5, 52000, 4.99719, 1.9179e-05, 1.92648e-05, 1.92245e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94901, 0.000292216, 0.000292962, 0.00029258
PrevectorDeserializeTrivial, 5, 52000, 4.99091, 1.91576e-05, 1.92298e-05, 1.92036e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94274, 0.000292245, 0.00029272, 0.000292385
PrevectorDeserializeTrivial, 5, 52000, 4.99286, 1.91848e-05, 1.92303e-05, 1.92037e-05

maflcko · 2018-07-08T06:58:14Z

Rebased and add a bench.

Could add the bench in a separate commit/pull request to make it easier to check for the speedup.

AkioNak · 2018-07-08T12:22:18Z

@MarcoFalke Thank you for your suggestion.
Separated this new benchmark from the original commit.

AkioNak · 2018-07-09T04:31:49Z

Re-orderd commits.
First commit (ee9867c) is adding a benchmark fucntion.
Second one (f9083e5) is refactoring(speed up) and tests.

46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura) Pull request description: This PR adds benchmarks for the unserialization of the prevector. Note: Separated from #12324. Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7

AkioNak · 2018-07-29T13:41:40Z

@kallewoof I understand your concern of slow-down caused from randomness.
But I don't worry so much because of deterministic psued random introduce by #10321.
It may be possible to remove randomness from this test (or more widely), but I think it would be better to use a different PR(s).

kallewoof · 2018-07-30T03:55:20Z

Cool about #10321, didn't realize that change was made.

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

[Edited to fix commit reference.]

src/test/prevector_tests.cpp

sipa · 2018-10-12T20:06:39Z

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

shahzadlone · 2019-02-01T07:32:23Z

src/test/prevector_tests.cpp

+    void resize_uninitialized(realtype values) {
+        size_t s = real_vector.size() / 2;
+        real_vector.resize(s);
+        pre_vector.resize_uninitialized(s);


Perhaps here you could make sure before going in loop to reserve values.size() many more memory to optimize the vector.

@shahzadlone Thank you for your review.
Addressed that you pointed out.

The unserializer for prevector uses resize() for reserve the area, but it's prefer to use reserve() because resize() have overhead to call its constructor many times. However, reserve() does not change the value of "_size" (a private member of prevector). This PR introduce resize_uninitialized() to prevector that similar to resize() but does not call constructor, and added elements are explicitly initialized in Unserialize_imple(). The changes are as follows: 1. prevector.h Add a public member function named 'resize_uninitialized'. This function processes like as resize() but does not call constructors. So added elemensts needs explicitly initialized after this returns. 2. serialize.h In the following two function: Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&) Unserialize_impl(Stream& is, prevector<N, T>& v, const V&) Calls resize_uninitialized() instead of resize() 3. test/prevector_tests.cpp Add a test for resize_uninitialized().

laanwj · 2019-06-18T15:10:06Z

utACK 86b47fa

86b47fa speed up Unserialize_impl for prevector (Akio Nakamura) Pull request description: The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times. However, `reserve()` does not change the value of `_size` (a private member of prevector). This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`. The changes are as follows: 1. prevector.h Add a public member function named 'append'. This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values. 2. serialize.h In the following two function: - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)` - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)` Make a callback function from each original logic of reading values from stream, and call prevector's `append()`. 3. test/prevector_tests.cpp Add a test for `append()`. ## A benchmark result is following: [Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD) [result] DeserializeAndCheckBlockTest => 22% faster DeserializeBlockTest => 29% faster [before PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339 DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187 [After PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757 DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527 ACKs for top commit: laanwj: utACK 86b47fa Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902

maflcko · 2019-06-19T19:26:24Z

Has anyone checked that this actually improves the benchmark as claimed in the OP. It does not for me.

kallewoof · 2019-06-20T05:11:27Z

I ran the benchmarks on two linux machines (one pretty powerful (GCO) and one not so (Lefty)), and a MacBook Pro. Raw numbers at bottom. I see improvements in master compared to e2182b0 in PrevectorDeserialize*rivial, but not in the other benchmarks:

(PrevectorDeserializeNontrivial)

(PrevectorDeserializeTrivial)

(Sorry, I wasn't sure how to make graphs for benchmarks. Raw data below.)

git checkout e2182b02b

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.38381, 0.00792733, 0.00802109, 0.00799184
DeserializeBlockTest, 5, 130, 4.23839, 0.00634135, 0.00694236, 0.00644537
PrevectorDeserializeNontrivial, 5, 6800, 13.3873, 0.000390055, 0.000397128, 0.000393687
PrevectorDeserializeTrivial, 5, 52000, 7.96211, 2.98404e-05, 3.17966e-05, 3.04728e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 7.26932, 0.00838346, 0.00972242, 0.00942287
DeserializeBlockTest, 5, 130, 4.31697, 0.00627366, 0.00738568, 0.0063276
PrevectorDeserializeNontrivial, 5, 6800, 13.2609, 0.000386639, 0.000394633, 0.000387913
PrevectorDeserializeTrivial, 5, 52000, 6.47209, 2.45058e-05, 2.51857e-05, 2.49892e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.32675, 0.00777852, 0.00802965, 0.00789037
DeserializeBlockTest, 5, 130, 4.20692, 0.00634361, 0.00674539, 0.00637073
PrevectorDeserializeNontrivial, 5, 6800, 13.3495, 0.000390075, 0.00039594, 0.000391467
PrevectorDeserializeTrivial, 5, 52000, 6.39515, 2.44061e-05, 2.48503e-05, 2.45071e-05
$

Ubuntu Linux (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66615, 0.0045421, 0.00466559, 0.00455856
DeserializeBlockTest, 5, 130, 2.46865, 0.00379088, 0.00381237, 0.00379707
PrevectorDeserializeNontrivial, 5, 6800, 1.98757, 5.78348e-05, 6.06301e-05, 5.79352e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46781, 9.44648e-06, 9.56601e-06, 9.48185e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66454, 0.00455685, 0.00465467, 0.00456468
DeserializeBlockTest, 5, 130, 2.48794, 0.00379826, 0.00392414, 0.00380655
PrevectorDeserializeNontrivial, 5, 6800, 1.99268, 5.78273e-05, 6.03986e-05, 5.83746e-05
PrevectorDeserializeTrivial, 5, 52000, 2.49502, 9.40815e-06, 9.87399e-06, 9.63512e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.68505, 0.00456129, 0.00469898, 0.00458263
DeserializeBlockTest, 5, 130, 2.47728, 0.00378338, 0.00390047, 0.00378849
PrevectorDeserializeNontrivial, 5, 6800, 1.98537, 5.76469e-05, 6.05225e-05, 5.77919e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46844, 9.41589e-06, 9.66261e-06, 9.43065e-06
$


Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.026, 0.00500104, 0.00506093, 0.00503687
DeserializeBlockTest, 5, 130, 2.75244, 0.00420178, 0.00429237, 0.00423144
PrevectorDeserializeNontrivial, 5, 6800, 2.1691, 6.35412e-05, 6.42218e-05, 6.3725e-05
PrevectorDeserializeTrivial, 5, 52000, 2.74612, 1.05157e-05, 1.05976e-05, 1.05714e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.06636, 0.0050483, 0.00515482, 0.00507011
DeserializeBlockTest, 5, 130, 2.78506, 0.00421478, 0.00449306, 0.00424098
PrevectorDeserializeNontrivial, 5, 6800, 2.18859, 6.39355e-05, 6.50351e-05, 6.42038e-05
PrevectorDeserializeTrivial, 5, 52000, 2.77944, 1.05288e-05, 1.10395e-05, 1.06421e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.02973, 0.00501044, 0.00507903, 0.00502341
DeserializeBlockTest, 5, 130, 2.77102, 0.00423808, 0.00430167, 0.00425302
PrevectorDeserializeNontrivial, 5, 6800, 2.21241, 6.38159e-05, 6.64756e-05, 6.51687e-05
PrevectorDeserializeTrivial, 5, 52000, 2.76094, 1.05595e-05, 1.07031e-05, 1.06016e-05
$

git checkout master (44d81723236114f9370f386f3b3310477a6dde43)

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.17814, 0.0076679, 0.00785118, 0.00768925
DeserializeBlockTest, 5, 130, 4.06245, 0.00622414, 0.00628035, 0.00624402
PrevectorDeserializeNontrivial, 5, 6800, 13.073, 0.000382798, 0.000387096, 0.00038391
PrevectorDeserializeTrivial, 5, 52000, 5.15493, 1.96871e-05, 1.99912e-05, 1.9814e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.1623, 0.00761647, 0.00792896, 0.00765533
DeserializeBlockTest, 5, 130, 4.445, 0.00621718, 0.00774078, 0.00645273
PrevectorDeserializeNontrivial, 5, 6800, 14.5375, 0.000414608, 0.00044193, 0.000425773
PrevectorDeserializeTrivial, 5, 52000, 5.79235, 1.98752e-05, 2.50009e-05, 2.20261e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.22612, 0.00771692, 0.00791803, 0.00773467
DeserializeBlockTest, 5, 130, 4.09782, 0.00627166, 0.00638606, 0.00628699
PrevectorDeserializeNontrivial, 5, 6800, 13.0198, 0.000381592, 0.000384941, 0.000382887
PrevectorDeserializeTrivial, 5, 52000, 5.17657, 1.98519e-05, 2.00462e-05, 1.98927e-05
$

Linux Ubuntu (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.71635, 0.00461716, 0.00470119, 0.00462609
DeserializeBlockTest, 5, 130, 2.51346, 0.00384798, 0.0039135, 0.00385253
PrevectorDeserializeNontrivial, 5, 6800, 1.64983, 4.74319e-05, 5.09874e-05, 4.79242e-05
PrevectorDeserializeTrivial, 5, 52000, 1.66145, 6.2768e-06, 6.69577e-06, 6.32143e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.6407, 0.00452042, 0.00463718, 0.00453418
DeserializeBlockTest, 5, 130, 2.46214, 0.00373507, 0.00390495, 0.00376771
PrevectorDeserializeNontrivial, 5, 6800, 1.64648, 4.78473e-05, 5.03235e-05, 4.79951e-05
PrevectorDeserializeTrivial, 5, 52000, 1.67024, 6.32431e-06, 6.73776e-06, 6.3539e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.77728, 0.00471439, 0.00474315, 0.00471631
DeserializeBlockTest, 5, 130, 2.49195, 0.00381892, 0.00388194, 0.00382307
PrevectorDeserializeNontrivial, 5, 6800, 1.68265, 4.85277e-05, 5.10689e-05, 4.96072e-05
PrevectorDeserializeTrivial, 5, 52000, 1.68869, 6.41578e-06, 6.60289e-06, 6.50186e-06
$

Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.05288, 0.00505502, 0.00507194, 0.00506818
DeserializeBlockTest, 5, 130, 2.80491, 0.0042465, 0.00449445, 0.00427763
PrevectorDeserializeNontrivial, 5, 6800, 1.86039, 5.34824e-05, 5.64816e-05, 5.4309e-05
PrevectorDeserializeTrivial, 5, 52000, 1.88765, 7.17239e-06, 7.30816e-06, 7.28207e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.97765, 0.0049599, 0.00499381, 0.00496968
DeserializeBlockTest, 5, 130, 2.70895, 0.00415038, 0.00417582, 0.00417205
PrevectorDeserializeNontrivial, 5, 6800, 1.80413, 5.28583e-05, 5.33843e-05, 5.30097e-05
PrevectorDeserializeTrivial, 5, 52000, 1.82675, 6.98311e-06, 7.08135e-06, 7.019e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.03341, 0.00500616, 0.00510331, 0.0050347
DeserializeBlockTest, 5, 130, 2.74251, 0.00420575, 0.0042337, 0.00421798
PrevectorDeserializeNontrivial, 5, 6800, 1.80552, 5.28383e-05, 5.37108e-05, 5.29898e-05
PrevectorDeserializeTrivial, 5, 52000, 1.8427, 6.96723e-06, 7.47701e-06, 7.0036e-06
$

AkioNak · 2019-06-20T10:40:46Z

@MarcoFalke @kallewoof Thank you for the comment and reporting the benchmark result.

The #12549, which is an improvement to prevector and merged on 1 Mar 2018, is very efficient, so the improvement of this PR is relatively hidden in the benchmark of Deserialize*BlockTest.
However, focusing on prevector deserialization itself, there are some speedup with this PR.

So I had have to change the PR description to PrevectorDeserializerivial instead of DeserializeBlockTest to clarify where is improved.

jamesob · 2019-06-24T14:55:06Z

Can verify I'm seeing speedups in this branch (as rebased onto master) relative to the commit that came before its merge into master, but only for gcc:

`e2182b0` vs. unserialize (relative)

bench name	x	`e2182b0`	unserialize
micro.gcc.PrevectorDeserializeNontrivial.total_secs	3	1.143	1.000
micro.gcc.PrevectorDeserializeTrivial.total_secs	3	1.333	1.000
micro.gcc.PrevectorDestructorNontrivial.total_secs	3	1.018	1.000
micro.clang.PrevectorDeserializeNontrivial.total_secs	3	1.000	1.020
micro.clang.PrevectorDeserializeTrivial.total_secs	3	1.106	1.000
micro.clang.PrevectorDestructorNontrivial.total_secs	3	1.000	1.019
micro.clang.PrevectorDestructorTrivial.total_secs	3	1.000	1.003

These changes do not notably affect IBD time though.

86b47fa speed up Unserialize_impl for prevector (Akio Nakamura) Pull request description: The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times. However, `reserve()` does not change the value of `_size` (a private member of prevector). This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`. The changes are as follows: 1. prevector.h Add a public member function named 'append'. This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values. 2. serialize.h In the following two function: - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)` - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)` Make a callback function from each original logic of reading values from stream, and call prevector's `append()`. 3. test/prevector_tests.cpp Add a test for `append()`. ## A benchmark result is following: [Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD) [result] DeserializeAndCheckBlockTest => 22% faster DeserializeBlockTest => 29% faster [before PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339 DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187 [After PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757 DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527 ACKs for top commit: laanwj: utACK 86b47fa Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902

46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura) Pull request description: This PR adds benchmarks for the unserialization of the prevector. Note: Separated from bitcoin#12324. Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7

Summary: > The unserializer for prevector uses resize() for reserve the area, > but it's prefer to use reserve() because resize() have overhead > to call its constructor many times. > > However, reserve() does not change the value of "_size" > (a private member of prevector). > > This PR introduce resize_uninitialized() to prevector that similar to > resize() but does not call constructor, and added elements are > explicitly initialized in Unserialize_imple(). > > The changes are as follows: > 1. prevector.h > Add a public member function named 'resize_uninitialized'. > This function processes like as resize() but does not call constructors. > So added elemensts needs explicitly initialized after this returns. > > 2. serialize.h > In the following two function: > Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&) > Unserialize_impl(Stream& is, prevector<N, T>& v, const V&) > Calls resize_uninitialized() instead of resize() > > 3. test/prevector_tests.cpp > Add a test for resize_uninitialized(). Benchmark details provided in the PR discussion: > [Machine] > MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD) > > [result] > DeserializeAndCheckBlockTest => 22% faster > DeserializeBlockTest => 29% faster This is a backport of Core [[bitcoin/bitcoin#12324 | PR12324]] Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8367

b886a4c Fix header guards using reserved identifiers (Dan Raviv) b389b3f speed up Unserialize_impl for prevector (Akio Nakamura) 13d2102 Minimal fix to slow prevector tests as stopgap measure (Jeremy Rubin) Pull request description: Backports bitcoin#12324. The `DeserializeAndCheckBlock` benchmark (introduced in #2146) shows a speedup of about 4% (not as much as the upstream PR, due to the optimizations already included in #2083). Cherry-picks also - bitcoin#8671 (with minimal changes to the random context, due to bitcoin#8914 and bitcoin#9792 being already ported out of order). - bitcoin#11151 ACKs for top commit: Fuzzbawls: utACK b886a4c furszy: utACK b886a4c and merging.. Tree-SHA512: f5de40e5acfb0b875d413d8995d71dd90489730fe4853f0be03d76a1c44ec95eaeb28c0c40d8e91906f23529ad26501bda4f9779ce466cd8603ed97f1662ca98

46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura) Pull request description: This PR adds benchmarks for the unserialization of the prevector. Note: Separated from bitcoin#12324. Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7

Summary === This is a backport of bitcoin/bitcoin#12324 The unserializer for prevector uses `resize()` for reserving the area, but it's preferred to use `reserve()` because `resize()` has overhead to call its constructor many times. However, `reserve()` does not change the value of `_size` (a private member of prevector). This PR introduce `resize_uninitialized()` to prevector that similar to `resize()` but does not call constructor, and added elements are explicitly initialized in `Unserialize_impl()`. The changes are as follows: 1. prevector.h Add a public member function named `resize_uninitialized`. This function processes like as `resize()` but does not call constructors. So added elemensts needs explicitly initialized after this returns. 2. serialize.h In the following two function: `Unserialize_impl(Stream &is, prevector<N, T> &v, const uint8_t &)` `Unserialize_impl(Stream &is, prevector<N, T> &v, const V &)` Calls `resize_uninitialized()` instead of `resize()` 3. test/prevector_tests.cpp Add a test for `resize_uninitialized()`. Benchmark === Before --- DeserializeAndCheckBlockTest_1MB, 10, 130, 13.9896, 0.0100612, 0.011033, 0.0109143 DeserializeAndCheckBlockTest_32MB, 10, 2, 13.0196, 0.614051, 0.705692, 0.64535 DeserializeBlockTest_1MB, 10, 160, 13.2404, 0.00795686, 0.00876105, 0.00827999 DeserializeBlockTest_32MB, 10, 3, 12.1832, 0.392285, 0.432704, 0.40708 After --- DeserializeAndCheckBlockTest_1MB, 10, 130, 13.5135, 0.0100882, 0.0110394, 0.0104667 DeserializeAndCheckBlockTest_32MB, 10, 2, 12.47, 0.601044, 0.655857, 0.624506 DeserializeBlockTest_1MB, 10, 160, 13.2542, 0.00799794, 0.00857411, 0.00829794 DeserializeBlockTest_32MB, 10, 3, 11.8423, 0.380222, 0.405202, 0.399713 Test plan === * `ninja check-all` * `ninja bench_bitcoin` * `./src/bench/bench_bitcoin -filter='Deser.*' --evals=10`

fanquake added the Refactoring label Feb 1, 2018

promag reviewed Feb 6, 2018

View reviewed changes

AkioNak force-pushed the unserialize branch from 94e4cea to e06dfa9 Compare February 7, 2018 05:22

AkioNak mentioned this pull request Feb 21, 2018

p2p-fullblocktest.py fails occasionally #11632

Closed

eklitzke reviewed Mar 20, 2018

View reviewed changes

AkioNak force-pushed the unserialize branch from 763d0b5 to d85530d Compare March 21, 2018 13:58

AkioNak force-pushed the unserialize branch from d85530d to 23afe7a Compare July 7, 2018 15:28

AkioNak force-pushed the unserialize branch from 23afe7a to 3845ebf Compare July 8, 2018 12:16

AkioNak force-pushed the unserialize branch from 3845ebf to f9083e5 Compare July 9, 2018 04:24

practicalswift reviewed Oct 11, 2018

View reviewed changes

src/test/prevector_tests.cpp Outdated Show resolved Hide resolved

AkioNak force-pushed the unserialize branch from b91962e to 5573129 Compare October 15, 2018 07:14

shahzadlone reviewed Feb 1, 2019

View reviewed changes

AkioNak force-pushed the unserialize branch from 5573129 to 86b47fa Compare February 3, 2019 11:16

laanwj merged commit 86b47fa into bitcoin:master Jun 18, 2019

sickpig mentioned this pull request Mar 2, 2020

[port] update prevector.h BitcoinUnlimited/BitcoinUnlimited#2109

Merged

random-zebra mentioned this pull request Jan 19, 2021

[Core] Prevector Optimizations 2 PIVX-Project/PIVX#2148

Merged

furszy mentioned this pull request Mar 21, 2021

[Backport] Serialization updates PIVX-Project/PIVX#2256

Merged

bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021

speed up Unserialize_impl for prevector #12324

speed up Unserialize_impl for prevector #12324

Uh oh!

Conversation

AkioNak commented Feb 1, 2018

A benchmark result is following:

Uh oh!

fanquake commented Feb 1, 2018

Uh oh!

AkioNak commented Feb 1, 2018

Uh oh!

laanwj commented Feb 1, 2018

Uh oh!

promag left a comment

Choose a reason for hiding this comment

Uh oh!

promag Feb 6, 2018

Choose a reason for hiding this comment

Uh oh!

promag Feb 6, 2018

Choose a reason for hiding this comment

Uh oh!

AkioNak commented Feb 7, 2018

Uh oh!

AkioNak commented Feb 7, 2018

Uh oh!

eklitzke commented Mar 11, 2018

Uh oh!

AkioNak commented Mar 11, 2018

Uh oh!

kallewoof commented Mar 13, 2018

Uh oh!

sipa commented Mar 17, 2018

Uh oh!

AkioNak commented Mar 19, 2018

Uh oh!

eklitzke left a comment

Choose a reason for hiding this comment

Uh oh!

eklitzke Mar 19, 2018

Choose a reason for hiding this comment

Uh oh!

eklitzke Mar 19, 2018

Choose a reason for hiding this comment

Uh oh!

AkioNak commented Mar 20, 2018

Uh oh!

eklitzke commented Mar 21, 2018

Uh oh!

maflcko commented Mar 21, 2018

Uh oh!

AkioNak commented Mar 21, 2018

Uh oh!

sipa commented Mar 21, 2018

Uh oh!

eklitzke commented Mar 22, 2018

Uh oh!

AkioNak commented Jul 7, 2018

Uh oh!

maflcko commented Jul 8, 2018

Uh oh!

AkioNak commented Jul 8, 2018

Uh oh!

AkioNak commented Jul 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AkioNak commented Jul 29, 2018

Uh oh!

kallewoof commented Jul 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

sipa commented Oct 12, 2018

Uh oh!

shahzadlone Feb 1, 2019

Choose a reason for hiding this comment

Uh oh!

AkioNak Feb 3, 2019

Choose a reason for hiding this comment

Uh oh!

AkioNak commented Jul 9, 2018 •

edited

Loading

kallewoof commented Jul 30, 2018 •

edited

Loading

`e2182b0` vs. unserialize (relative)