Skip to content

Conversation

AkioNak
Copy link
Contributor

@AkioNak AkioNak commented Feb 1, 2018

The unserializer for prevector uses resize() for reserve the area, but it's prefer to use reserve() because resize() have overhead to call its constructor many times.

However, reserve() does not change the value of _size (a private member of prevector).

This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of _size.

The changes are as follows:

  1. prevector.h
    Add a public member function named 'append'.
    This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
    In the following two function:

  • Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
  • Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
    Make a callback function from each original logic of reading values from stream, and call prevector's append().
  1. test/prevector_tests.cpp
    Add a test for append().

A benchmark result is following:

[Machine]
MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

[result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster

[before PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

[After PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

@fanquake
Copy link
Member

fanquake commented Feb 1, 2018

@AkioNak You might also want to look at #10785.

@AkioNak
Copy link
Contributor Author

AkioNak commented Feb 1, 2018

@fanquake Thank you for pointing to #10785.
I will check if this PR is still useful even if #10785 is merged.

@laanwj
Copy link
Member

laanwj commented Feb 1, 2018

Thanks for adding benchmarks! That's the way to do optimization PRs.

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash.

@@ -183,6 +183,20 @@ class prevector_tester {
pre_vector = pre_vector_alt;
}

void append(realtype values) {
for(auto v : values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, space after for.

}
auto p = pre_vector.size();
auto f = [&]() {
for(auto v : values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, space after for.

@AkioNak
Copy link
Contributor Author

AkioNak commented Feb 7, 2018

@promag Thank you for your review. I fixed them and squashed commits.

@AkioNak
Copy link
Contributor Author

AkioNak commented Feb 7, 2018

@fanquake Fortunately, I think that there was no collision or adverse effect between #10785 and #12324. Also, #12324 still usefull even if #10785 has been merged.

Confirmation summary:

  1. enviroment : MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
  2. merge - git merge (master d3e4675 + both Serialization improvements #10785 and speed up Unserialize_impl for prevector #12324) : succeed.
  3. build - make clean && make : succeed.
  4. test - test_runner.py : passed. (exclude wallet_encription.py)
  5. benchmark
    [result]
    DeserializeAndCheckBlockTest => 25% faster
    DeserializeBlockTest => 30% faster
[#10785]
 # Benchmark, evals, iterations, total, min, max, median
 DeserializeAndCheckBlockTest, 50, 160, 76.7465, 0.00941822, 0.00986061, 0.00958263
 DeserializeBlockTest, 50, 130, 52.3447, 0.00791727, 0.00828939, 0.00805472

[#10785 + #12324]
 # Benchmark, evals, iterations, total, min, max, median
 DeserializeAndCheckBlockTest, 50, 160, 61.3164, 0.00750302, 0.00797864, 0.00765575
 DeserializeBlockTest, 50, 130, 40.1209, 0.00602097, 0.0063615, 0.00617751

@eklitzke
Copy link
Contributor

Concept ACK. I like the idea of making this faster (I've seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Can you either:

  • Remove the func template (it's only called in two places...)
  • Or check that GCC 4.8 inlines the lambda properly?

@AkioNak
Copy link
Contributor Author

AkioNak commented Mar 11, 2018

@eklitzke thank you for your comment. I will try it.

@kallewoof
Copy link
Contributor

I like the idea of making this faster (I've seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Adding inline to the template should do the trick, I think.

But I agree that lambdas and callbacks are a bit complex here. I personally think a caveat note on a method in prevector that leaves garbage in the vector is fine, e.g.

/**
 * Grow the size of the prevector by b bytes.
 * NOTE: The added capacity must be overwritten, or it will contain garbage data.
 */

@sipa
Copy link
Member

sipa commented Mar 17, 2018

Agree with @kallewoof. It seems the goal of using a callback here is to avoid having a public method that brings the vector in a (partially) undefined state. However, the result is that now we have a callback that needs to run in this state.

I would either:

  • change the method name to resize_uninitialized or so, and initialize explicitly after it returns
  • pass a begin and end iterator to the callback (avoiding the need for the code in the callback to interact with the prevector while it's in an undefined state).

@AkioNak
Copy link
Contributor Author

AkioNak commented Mar 19, 2018

@eklitzke @kallewoof @sipa Thank you for suggestions.
I introduced resize_uninitialized() and explicitly initialized instead of lambdas and callbacks.

Copy link
Contributor

@eklitzke eklitzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Just some typos in the comments.

src/prevector.h Outdated
@@ -382,6 +382,20 @@ class prevector {
}
}

inline void resize_uninitialized(size_type new_size) {
// resize_uninitialized change the size of the prevector but dose not initialize.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not: s/dose/does/

src/prevector.h Outdated
@@ -382,6 +382,20 @@ class prevector {
}
}

inline void resize_uninitialized(size_type new_size) {
// resize_uninitialized change the size of the prevector but dose not initialize.
// If size < new_size, the added elements must be initialized explicitly after it return.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/return/returns/

@AkioNak
Copy link
Contributor Author

AkioNak commented Mar 20, 2018

@eklitzke Thank you for your pointing out for my typos. Fixed them.

@eklitzke
Copy link
Contributor

This looks good, although you still need to squash. I'm curious: do you still see the speedup from your intial benchmark? I know we changed other logic in this file since then.

On master:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.86197, 0.00603752, 0.00640901, 0.00616365
DeserializeBlockTest, 10, 130, 6.7487, 0.00486321, 0.0065349, 0.00496556

With your branch:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.96894, 0.00611369, 0.00647763, 0.00622066
DeserializeBlockTest, 10, 130, 6.43537, 0.00484567, 0.00539326, 0.00490343

@maflcko
Copy link
Member

maflcko commented Mar 21, 2018

Could make sense to squash and rebase on master to ease benchmarking?

@AkioNak
Copy link
Contributor Author

AkioNak commented Mar 21, 2018

Squashed and rebased.
Now, speed up is still exist but a little (2.06% - 3.35%).

my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)
[on master]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1169.53, 0.00713881, 0.00842435, 0.00718528
DeserializeBlockTest, 1000, 130, 756.064, 0.00574035, 0.006433, 0.00575841

[my PR]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1131.6, 0.0070475, 0.00744498, 0.00706667
DeserializeBlockTest, 1000, 130, 740.836, 0.00567827, 0.00600557, 0.00569649

@sipa
Copy link
Member

sipa commented Mar 21, 2018

utACK d85530db45b327eecf408bc8e9636fa60e886208

@eklitzke
Copy link
Contributor

Thanks for checking. utACK d85530db45b327eecf408bc8e9636fa60e886208

@AkioNak
Copy link
Contributor Author

AkioNak commented Jul 7, 2018

Rebased and add a bench.
This benchmark measures the part specialized for unserialization.

PrevectorDeserializeNontrivial => 3% faster
PrevectorDeserializeTrivial => 24% faster

my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)

[on master ] commit 0212187

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.3091, 0.00030134, 0.000308502, 0.000301828
PrevectorDeserializeTrivial, 5, 52000, 6.21001, 2.3817e-05, 2.39649e-05, 2.387e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2418, 0.000300878, 0.000301709, 0.000301096
PrevectorDeserializeTrivial, 5, 52000, 6.2122, 2.38438e-05, 2.39243e-05, 2.39157e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2464, 0.000300893, 0.000301853, 0.000301164
PrevectorDeserializeTrivial, 5, 52000, 6.20238, 2.38275e-05, 2.3929e-05, 2.38356e-05

[my PR] commit 23afe7acfa7908905e826f09601c9564ff685be0

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.95062, 0.000292415, 0.000292987, 0.000292503
PrevectorDeserializeTrivial, 5, 52000, 4.99719, 1.9179e-05, 1.92648e-05, 1.92245e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94901, 0.000292216, 0.000292962, 0.00029258
PrevectorDeserializeTrivial, 5, 52000, 4.99091, 1.91576e-05, 1.92298e-05, 1.92036e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94274, 0.000292245, 0.00029272, 0.000292385
PrevectorDeserializeTrivial, 5, 52000, 4.99286, 1.91848e-05, 1.92303e-05, 1.92037e-05

@maflcko
Copy link
Member

maflcko commented Jul 8, 2018

Rebased and add a bench.

Could add the bench in a separate commit/pull request to make it easier to check for the speedup.

@AkioNak
Copy link
Contributor Author

AkioNak commented Jul 8, 2018

@MarcoFalke Thank you for your suggestion.
Separated this new benchmark from the original commit.

@AkioNak
Copy link
Contributor Author

AkioNak commented Jul 9, 2018

Re-orderd commits.
First commit (ee9867c) is adding a benchmark fucntion.
Second one (f9083e5) is refactoring(speed up) and tests.

maflcko pushed a commit that referenced this pull request Jul 27, 2018
46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura)

Pull request description:

  This PR adds benchmarks for the unserialization of the prevector.

  Note: Separated from #12324.

Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7
@AkioNak
Copy link
Contributor Author

AkioNak commented Jul 29, 2018

@kallewoof I understand your concern of slow-down caused from randomness.
But I don't worry so much because of deterministic psued random introduce by #10321.
It may be possible to remove randomness from this test (or more widely), but I think it would be better to use a different PR(s).

@kallewoof
Copy link
Contributor

kallewoof commented Jul 30, 2018

Cool about #10321, didn't realize that change was made.

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

[Edited to fix commit reference.]

@sipa
Copy link
Member

sipa commented Oct 12, 2018

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

void resize_uninitialized(realtype values) {
size_t s = real_vector.size() / 2;
real_vector.resize(s);
pre_vector.resize_uninitialized(s);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps here you could make sure before going in loop to reserve values.size() many more memory to optimize the vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shahzadlone Thank you for your review.
Addressed that you pointed out.

The unserializer for prevector uses resize() for reserve the area,
but it's prefer to use reserve() because resize() have overhead
to call its constructor many times.

However, reserve() does not change the value of "_size"
(a private member of prevector).

This PR introduce resize_uninitialized() to prevector that similar to
resize() but does not call constructor, and added elements are
explicitly initialized in Unserialize_imple().

The changes are as follows:
1. prevector.h
Add a public member function named 'resize_uninitialized'.
This function processes like as resize() but does not call constructors.
So added elemensts needs explicitly initialized after this returns.

2. serialize.h
In the following two function:
 Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
 Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
Calls resize_uninitialized() instead of resize()

3. test/prevector_tests.cpp
Add a test for resize_uninitialized().
@laanwj
Copy link
Member

laanwj commented Jun 18, 2019

utACK 86b47fa

@laanwj laanwj merged commit 86b47fa into bitcoin:master Jun 18, 2019
laanwj added a commit that referenced this pull request Jun 18, 2019
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 19, 2019
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
@maflcko
Copy link
Member

maflcko commented Jun 19, 2019

Has anyone checked that this actually improves the benchmark as claimed in the OP. It does not for me.

@kallewoof
Copy link
Contributor

I ran the benchmarks on two linux machines (one pretty powerful (GCO) and one not so (Lefty)), and a MacBook Pro. Raw numbers at bottom. I see improvements in master compared to e2182b0 in PrevectorDeserialize*rivial, but not in the other benchmarks:

PrevectorDeserializeNontrivial

(PrevectorDeserializeNontrivial)

PrevectorDeserializeTrivial

(PrevectorDeserializeTrivial)

(Sorry, I wasn't sure how to make graphs for benchmarks. Raw data below.)

git checkout e2182b02b

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.38381, 0.00792733, 0.00802109, 0.00799184
DeserializeBlockTest, 5, 130, 4.23839, 0.00634135, 0.00694236, 0.00644537
PrevectorDeserializeNontrivial, 5, 6800, 13.3873, 0.000390055, 0.000397128, 0.000393687
PrevectorDeserializeTrivial, 5, 52000, 7.96211, 2.98404e-05, 3.17966e-05, 3.04728e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 7.26932, 0.00838346, 0.00972242, 0.00942287
DeserializeBlockTest, 5, 130, 4.31697, 0.00627366, 0.00738568, 0.0063276
PrevectorDeserializeNontrivial, 5, 6800, 13.2609, 0.000386639, 0.000394633, 0.000387913
PrevectorDeserializeTrivial, 5, 52000, 6.47209, 2.45058e-05, 2.51857e-05, 2.49892e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.32675, 0.00777852, 0.00802965, 0.00789037
DeserializeBlockTest, 5, 130, 4.20692, 0.00634361, 0.00674539, 0.00637073
PrevectorDeserializeNontrivial, 5, 6800, 13.3495, 0.000390075, 0.00039594, 0.000391467
PrevectorDeserializeTrivial, 5, 52000, 6.39515, 2.44061e-05, 2.48503e-05, 2.45071e-05
$

Ubuntu Linux (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66615, 0.0045421, 0.00466559, 0.00455856
DeserializeBlockTest, 5, 130, 2.46865, 0.00379088, 0.00381237, 0.00379707
PrevectorDeserializeNontrivial, 5, 6800, 1.98757, 5.78348e-05, 6.06301e-05, 5.79352e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46781, 9.44648e-06, 9.56601e-06, 9.48185e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66454, 0.00455685, 0.00465467, 0.00456468
DeserializeBlockTest, 5, 130, 2.48794, 0.00379826, 0.00392414, 0.00380655
PrevectorDeserializeNontrivial, 5, 6800, 1.99268, 5.78273e-05, 6.03986e-05, 5.83746e-05
PrevectorDeserializeTrivial, 5, 52000, 2.49502, 9.40815e-06, 9.87399e-06, 9.63512e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.68505, 0.00456129, 0.00469898, 0.00458263
DeserializeBlockTest, 5, 130, 2.47728, 0.00378338, 0.00390047, 0.00378849
PrevectorDeserializeNontrivial, 5, 6800, 1.98537, 5.76469e-05, 6.05225e-05, 5.77919e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46844, 9.41589e-06, 9.66261e-06, 9.43065e-06
$


Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.026, 0.00500104, 0.00506093, 0.00503687
DeserializeBlockTest, 5, 130, 2.75244, 0.00420178, 0.00429237, 0.00423144
PrevectorDeserializeNontrivial, 5, 6800, 2.1691, 6.35412e-05, 6.42218e-05, 6.3725e-05
PrevectorDeserializeTrivial, 5, 52000, 2.74612, 1.05157e-05, 1.05976e-05, 1.05714e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.06636, 0.0050483, 0.00515482, 0.00507011
DeserializeBlockTest, 5, 130, 2.78506, 0.00421478, 0.00449306, 0.00424098
PrevectorDeserializeNontrivial, 5, 6800, 2.18859, 6.39355e-05, 6.50351e-05, 6.42038e-05
PrevectorDeserializeTrivial, 5, 52000, 2.77944, 1.05288e-05, 1.10395e-05, 1.06421e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.02973, 0.00501044, 0.00507903, 0.00502341
DeserializeBlockTest, 5, 130, 2.77102, 0.00423808, 0.00430167, 0.00425302
PrevectorDeserializeNontrivial, 5, 6800, 2.21241, 6.38159e-05, 6.64756e-05, 6.51687e-05
PrevectorDeserializeTrivial, 5, 52000, 2.76094, 1.05595e-05, 1.07031e-05, 1.06016e-05
$

git checkout master (44d81723236114f9370f386f3b3310477a6dde43)

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.17814, 0.0076679, 0.00785118, 0.00768925
DeserializeBlockTest, 5, 130, 4.06245, 0.00622414, 0.00628035, 0.00624402
PrevectorDeserializeNontrivial, 5, 6800, 13.073, 0.000382798, 0.000387096, 0.00038391
PrevectorDeserializeTrivial, 5, 52000, 5.15493, 1.96871e-05, 1.99912e-05, 1.9814e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.1623, 0.00761647, 0.00792896, 0.00765533
DeserializeBlockTest, 5, 130, 4.445, 0.00621718, 0.00774078, 0.00645273
PrevectorDeserializeNontrivial, 5, 6800, 14.5375, 0.000414608, 0.00044193, 0.000425773
PrevectorDeserializeTrivial, 5, 52000, 5.79235, 1.98752e-05, 2.50009e-05, 2.20261e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.22612, 0.00771692, 0.00791803, 0.00773467
DeserializeBlockTest, 5, 130, 4.09782, 0.00627166, 0.00638606, 0.00628699
PrevectorDeserializeNontrivial, 5, 6800, 13.0198, 0.000381592, 0.000384941, 0.000382887
PrevectorDeserializeTrivial, 5, 52000, 5.17657, 1.98519e-05, 2.00462e-05, 1.98927e-05
$

Linux Ubuntu (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.71635, 0.00461716, 0.00470119, 0.00462609
DeserializeBlockTest, 5, 130, 2.51346, 0.00384798, 0.0039135, 0.00385253
PrevectorDeserializeNontrivial, 5, 6800, 1.64983, 4.74319e-05, 5.09874e-05, 4.79242e-05
PrevectorDeserializeTrivial, 5, 52000, 1.66145, 6.2768e-06, 6.69577e-06, 6.32143e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.6407, 0.00452042, 0.00463718, 0.00453418
DeserializeBlockTest, 5, 130, 2.46214, 0.00373507, 0.00390495, 0.00376771
PrevectorDeserializeNontrivial, 5, 6800, 1.64648, 4.78473e-05, 5.03235e-05, 4.79951e-05
PrevectorDeserializeTrivial, 5, 52000, 1.67024, 6.32431e-06, 6.73776e-06, 6.3539e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.77728, 0.00471439, 0.00474315, 0.00471631
DeserializeBlockTest, 5, 130, 2.49195, 0.00381892, 0.00388194, 0.00382307
PrevectorDeserializeNontrivial, 5, 6800, 1.68265, 4.85277e-05, 5.10689e-05, 4.96072e-05
PrevectorDeserializeTrivial, 5, 52000, 1.68869, 6.41578e-06, 6.60289e-06, 6.50186e-06
$

Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.05288, 0.00505502, 0.00507194, 0.00506818
DeserializeBlockTest, 5, 130, 2.80491, 0.0042465, 0.00449445, 0.00427763
PrevectorDeserializeNontrivial, 5, 6800, 1.86039, 5.34824e-05, 5.64816e-05, 5.4309e-05
PrevectorDeserializeTrivial, 5, 52000, 1.88765, 7.17239e-06, 7.30816e-06, 7.28207e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.97765, 0.0049599, 0.00499381, 0.00496968
DeserializeBlockTest, 5, 130, 2.70895, 0.00415038, 0.00417582, 0.00417205
PrevectorDeserializeNontrivial, 5, 6800, 1.80413, 5.28583e-05, 5.33843e-05, 5.30097e-05
PrevectorDeserializeTrivial, 5, 52000, 1.82675, 6.98311e-06, 7.08135e-06, 7.019e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.03341, 0.00500616, 0.00510331, 0.0050347
DeserializeBlockTest, 5, 130, 2.74251, 0.00420575, 0.0042337, 0.00421798
PrevectorDeserializeNontrivial, 5, 6800, 1.80552, 5.28383e-05, 5.37108e-05, 5.29898e-05
PrevectorDeserializeTrivial, 5, 52000, 1.8427, 6.96723e-06, 7.47701e-06, 7.0036e-06
$

@AkioNak
Copy link
Contributor Author

AkioNak commented Jun 20, 2019

@MarcoFalke @kallewoof Thank you for the comment and reporting the benchmark result.

The #12549, which is an improvement to prevector and merged on 1 Mar 2018, is very efficient, so the improvement of this PR is relatively hidden in the benchmark of Deserialize*BlockTest.
However, focusing on prevector deserialization itself, there are some speedup with this PR.

So I had have to change the PR description to PrevectorDeserializerivial instead of DeserializeBlockTest to clarify where is improved.

@jamesob
Copy link
Contributor

jamesob commented Jun 24, 2019

Can verify I'm seeing speedups in this branch (as rebased onto master) relative to the commit that came before its merge into master, but only for gcc:

e2182b0 vs. unserialize (relative)

bench name x e2182b0 unserialize
micro.gcc.PrevectorDeserializeNontrivial.total_secs 3 1.143 1.000
micro.gcc.PrevectorDeserializeTrivial.total_secs 3 1.333 1.000
micro.gcc.PrevectorDestructorNontrivial.total_secs 3 1.018 1.000
micro.clang.PrevectorDeserializeNontrivial.total_secs 3 1.000 1.020
micro.clang.PrevectorDeserializeTrivial.total_secs 3 1.106 1.000
micro.clang.PrevectorDestructorNontrivial.total_secs 3 1.000 1.019
micro.clang.PrevectorDestructorTrivial.total_secs 3 1.000 1.003

These changes do not notably affect IBD time though.

codablock pushed a commit to codablock/dash that referenced this pull request Oct 1, 2019
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
codablock pushed a commit to codablock/dash that referenced this pull request Oct 1, 2019
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
codablock pushed a commit to codablock/dash that referenced this pull request Oct 2, 2019
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
barrystyle pushed a commit to PACGlobalOfficial/PAC that referenced this pull request Jan 22, 2020
86b47fa speed up Unserialize_impl for prevector (Akio Nakamura)

Pull request description:

  The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.

  However, `reserve()` does not change the value of `_size` (a private member of prevector).

  This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.

  The changes are as follows:
  1. prevector.h
  Add a public member function named 'append'.
  This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

  2. serialize.h
  In the following two function:
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
  - `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
  Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.

  3. test/prevector_tests.cpp
  Add a test for `append()`.

  ## A benchmark result is following:
  [Machine]
  MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

  [result]
  DeserializeAndCheckBlockTest  => 22% faster
  DeserializeBlockTest => 29% faster

  [before PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
      DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

  [After PR]
      # Benchmark, evals, iterations, total, min, max, median
      DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
      DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

ACKs for top commit:
  laanwj:
    utACK 86b47fa

Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 29, 2020
46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura)

Pull request description:

  This PR adds benchmarks for the unserialization of the prevector.

  Note: Separated from bitcoin#12324.

Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 12, 2020
Summary:
> The unserializer for prevector uses resize() for reserve the area,
> but it's prefer to use reserve() because resize() have overhead
> to call its constructor many times.
>
> However, reserve() does not change the value of "_size"
> (a private member of prevector).
>
> This PR introduce resize_uninitialized() to prevector that similar to
> resize() but does not call constructor, and added elements are
> explicitly initialized in Unserialize_imple().
>
> The changes are as follows:
> 1. prevector.h
> Add a public member function named 'resize_uninitialized'.
> This function processes like as resize() but does not call constructors.
> So added elemensts needs explicitly initialized after this returns.
>
> 2. serialize.h
> In the following two function:
>  Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
>  Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
> Calls resize_uninitialized() instead of resize()
>
> 3. test/prevector_tests.cpp
> Add a test for resize_uninitialized().

Benchmark details provided in the PR discussion:
> [Machine]
> MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
>
> [result]
> DeserializeAndCheckBlockTest => 22% faster
> DeserializeBlockTest => 29% faster

This is a backport of Core [[bitcoin/bitcoin#12324 | PR12324]]

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D8367
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Jan 25, 2021
b886a4c Fix header guards using reserved identifiers (Dan Raviv)
b389b3f speed up Unserialize_impl for prevector (Akio Nakamura)
13d2102 Minimal fix to slow prevector tests as stopgap measure (Jeremy Rubin)

Pull request description:

  Backports bitcoin#12324.
  The `DeserializeAndCheckBlock` benchmark (introduced in #2146) shows a speedup of about 4% (not as much as the upstream PR, due to the optimizations already included in #2083).

  Cherry-picks also
  - bitcoin#8671 (with minimal changes to the random context, due to bitcoin#8914 and bitcoin#9792 being already ported out of order).
  - bitcoin#11151

ACKs for top commit:
  Fuzzbawls:
    utACK b886a4c
  furszy:
    utACK b886a4c and merging..

Tree-SHA512: f5de40e5acfb0b875d413d8995d71dd90489730fe4853f0be03d76a1c44ec95eaeb28c0c40d8e91906f23529ad26501bda4f9779ce466cd8603ed97f1662ca98
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jul 1, 2021
46340b3 [bench] Add benchmark for unserialize prevector (Akio Nakamura)

Pull request description:

  This PR adds benchmarks for the unserialization of the prevector.

  Note: Separated from bitcoin#12324.

Tree-SHA512: c055a283328cc2634c01eb60f26604a8665939bbf77d367b6ba6b4e01e77d4511fab69cc3ddb1e62969adb3c48752ed870f45ceba153eee192302601341e18a7
ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Nov 21, 2021
Summary
===

This is a backport of bitcoin/bitcoin#12324

The unserializer for prevector uses `resize()` for reserving the area,
but it's preferred to use `reserve()` because `resize()` has overhead
to call its constructor many times.

However, `reserve()` does not change the value of `_size`
(a private member of prevector).

This PR introduce `resize_uninitialized()` to prevector that similar to
`resize()` but does not call constructor, and added elements are
explicitly initialized in `Unserialize_impl()`.

The changes are as follows:
1. prevector.h
Add a public member function named `resize_uninitialized`. This function
processes like as `resize()` but does not call constructors. So added
elemensts needs explicitly initialized after this returns.

2. serialize.h
In the following two function:
 `Unserialize_impl(Stream &is, prevector<N, T> &v, const uint8_t &)`
 `Unserialize_impl(Stream &is, prevector<N, T> &v, const V &)`
Calls `resize_uninitialized()` instead of `resize()`

3. test/prevector_tests.cpp
Add a test for `resize_uninitialized()`.

Benchmark
===

Before
---

DeserializeAndCheckBlockTest_1MB, 10, 130, 13.9896, 0.0100612, 0.011033, 0.0109143
DeserializeAndCheckBlockTest_32MB, 10, 2, 13.0196, 0.614051, 0.705692, 0.64535
DeserializeBlockTest_1MB, 10, 160, 13.2404, 0.00795686, 0.00876105, 0.00827999
DeserializeBlockTest_32MB, 10, 3, 12.1832, 0.392285, 0.432704, 0.40708

After
---

DeserializeAndCheckBlockTest_1MB, 10, 130, 13.5135, 0.0100882, 0.0110394, 0.0104667
DeserializeAndCheckBlockTest_32MB, 10, 2, 12.47, 0.601044, 0.655857, 0.624506
DeserializeBlockTest_1MB, 10, 160, 13.2542, 0.00799794, 0.00857411, 0.00829794
DeserializeBlockTest_32MB, 10, 3, 11.8423, 0.380222, 0.405202, 0.399713

Test plan
===

* `ninja check-all`
* `ninja bench_bitcoin`
* `./src/bench/bench_bitcoin -filter='Deser.*' --evals=10`
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.