Skip to content

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Sep 14, 2021

Currently, CAddrman::Good() is rather slow because the process of moving an addr from new to tried involves looping over the new tables twice:

  1. In Good_(), there is a loop searching for a new bucket the addr is currently in, but this information is never used except for aborting if it is not found anywhere (since this commit it is no longer passed to MakeTried)
    This is unnecessary because in a non-corrupted addrman, an address that is not in New must be either in Tried or not at all in addrman, both cases in which we'd return early in Good_() and never get to this point.
    I removed this loop (and left a check for nRefCount as a belt-and-suspenders check).

  2. In MakeTried(), which is called from Good_(), another loop removes all instances of this address from new. This can be spedup by stopping the search at nRefCount==0. Further reductions in nRefCount would only lead to an assert anyway.
    Moreover, the search can be started at the bucket determined by the source of the addr for which Good was called, so that if it is present just once in New, no further buckets need to be checked.

While calls to Good() are not that frequent normally, the performance gain is clearly seen in the fuzz target addman_serdeser, where, because of the slowness in creating a decently filled addrman, a shortcut was created that would directly populate the tried tables by reaching into addrman's internals, bypassing Good() (#21129).
I removed this workaround in the second commit: Using Good() is still slower by a factor of 2 (down from a factor of ~60 before), but I think that this compensated by the advantages of not having to reach into the internal structures of addrman (see jnewbery#18 (comment)).

[Edit]: For benchmark results see #22974 (comment) and #22974 (comment) - the benchmark AddrManGood shows a significant speedup by a factor >100.

@mzumsande
Copy link
Contributor Author

cc @vasild

@DrahtBot DrahtBot added the P2P label Sep 14, 2021
@jonatack
Copy link
Member

Nice code reduction. Did you see a difference in ./src/bench/bench_bitcoin -filter=AddrManGood?

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 14, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #22950 ([p2p] Pimpl AddrMan to abstract implementation details by amitiuttarwar)
  • #22910 ([RFC] Encapsulate asmap in NetGroupManager by jnewbery)
  • #22734 (addrman: Avoid crash on corrupt data, Force Check after deserialize by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building to bench. Some minor comments, feel free to ignore until concept acks and need to retouch.

@jonatack
Copy link
Member

Concept ACK. Benches are showing a huge speedup for me locally.

before

$ ./src/bench/bench_bitcoin -filter=AddrManGood

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|      801,906,990.00 |                1.25 |    5.7% |      4.26 | AddrManGood
|      814,159,543.00 |                1.23 |    6.4% |      4.03 | AddrManGood
|      775,065,136.00 |                1.29 |    3.5% |      3.82 | AddrManGood
|      795,855,140.00 |                1.26 |    6.8% |      3.98 | AddrManGood
|      763,767,446.00 |                1.31 |    1.7% |      3.93 | AddrManGood
|      778,016,971.00 |                1.29 |    2.9% |      4.03 | AddrManGood
|      839,720,704.00 |                1.19 |    8.6% |      4.22 | AddrManGood
|      777,639,920.00 |                1.29 |    1.8% |      3.92 | AddrManGood
|      788,152,208.00 |                1.27 |    3.8% |      3.89 | AddrManGood
|      792,925,660.00 |                1.26 |    3.2% |      3.99 | AddrManGood

after

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        5,182,474.00 |              192.96 |   13.8% |      0.03 | AddrManGood
|        5,525,509.00 |              180.98 |   14.6% |      0.03 | AddrManGood
|        7,215,235.00 |              138.60 |    3.7% |      0.04 | AddrManGood
|        5,219,675.00 |              191.58 |    7.5% |      0.03 | AddrManGood
|        4,163,823.00 |              240.16 |    2.3% |      0.02 | AddrManGood
|        6,694,095.00 |              149.39 |   23.0% |      0.05 | AddrManGood
|        5,966,011.00 |              167.62 |   11.0% |      0.03 | AddrManGood
|        4,861,261.00 |              205.71 |    3.6% |      0.03 | AddrManGood
|        5,029,188.00 |              198.84 |   16.4% |      0.03 | AddrManGood
|        5,440,878.00 |              183.79 |   13.1% |      0.03 | AddrManGood

@mzumsande
Copy link
Contributor Author

Thanks! I didn't know about this benchmark, will try to confirm locally and add the result to OP!

@amitiuttarwar
Copy link
Contributor

amitiuttarwar commented Sep 14, 2021

approach ACK, planning to do a more in-depth review later this week

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. Maybe also add a commit to allow for shorter fuzz inputs: #21129 (comment)

This will allow the fuzz engine to potentially minimize further on crashes, or produce shorter inputs in general.

@naumenkogs
Copy link
Member

Concept ACK.
Code also looks good, but I'm gonna ACK once you resolve all the pending suggestions from other reviewers. I agree with those suggestions.

Add_(addr, source, time_penalty);

if (n > 0 && mapInfo.size() % n == 0) {
Good_(addr, false, GetTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely unrelated, but I was wondering if it makes sense to fuzz the time here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment before this block warns against fuzzing in this loop, some problem with running out of random numbers. Could use insecure_rand though?

mzumsande and others added 3 commits September 16, 2021 00:50
This is done by removing an unnecessary loop in Good_() and looping
through the new tables in MakeTried() more efficiently, choosing a
starting value that allow us to stop early in typical cases.

Co-authored-by: John Newbery <john@johnnewbery.com>
After the performance improvement for Good(), the direct method is only 2x faster
as opposed to 60x before.
@mzumsande
Copy link
Contributor Author

Thanks for the reviews - I addressed all comments.
I added a commit to allow for a lower number of sources in the fuzz test as suggested by @MarcoFalke .

I also ran the benchmark and got similar results to @jonatack:

./src/bench/bench_bitcoin -filter=AddrManGood
before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|      840,411,590.00 |                1.19 |    3.3% |      4.23 | `AddrManGood`
|      871,339,127.00 |                1.15 |    0.1% |      4.35 | `AddrManGood`
|      880,275,738.00 |                1.14 |    0.8% |      4.45 | `AddrManGood`
|      874,042,056.00 |                1.14 |    1.7% |      4.34 | `AddrManGood`
|      877,817,450.00 |                1.14 |    2.1% |      4.34 | `AddrManGood`

after

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        4,889,438.00 |              204.52 |    0.7% |      0.02 | `AddrManGood`
|        4,831,691.00 |              206.97 |    1.5% |      0.02 | `AddrManGood`
|        4,847,879.00 |              206.28 |    1.0% |      0.02 | `AddrManGood`
|        4,745,268.00 |              210.74 |    0.8% |      0.02 | `AddrManGood`
|        4,932,979.00 |              202.72 |    1.3% |      0.02 | `AddrManGood`

@naumenkogs
Copy link
Member

ACK 57ce203

@jnewbery
Copy link
Contributor

ACK 57ce203

Using the public interface to fuzz addrman is a much better approach. As well as being fewer lines of code, the fuzz test is more correct now. The previous code reached into the internals and made changes to private data which invalidated addrman's invariants. For example, entries were placed into the tried table without setting nLastSuccess, which would cause check() to fail:

bitcoin/src/addrman.cpp

Lines 774 to 775 in 87780df

if (!info.nLastSuccess)
return -1;

I think the next step is to move RandAddr() and Fill() out of CAddrManDeterministic so that they don't have access to CAddrMan's private data and methods.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 57ce203

Nice addrman code cleanup and speed-up, taking advantage of nRefCount. Without much surprise, the relative performance gains between PR and master branch, as shown by running the benchmark (./src/bench/bench_bitcoin -filter=AddrManGood), are similar to the results by @jonatack and @mzumsande on my machine (>150x).

master:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    1,333,812,989.00 |                0.75 |    3.2% |      6.66 | `AddrManGood`
|    1,289,262,095.00 |                0.78 |    0.9% |      6.47 | `AddrManGood`
|    1,339,632,431.00 |                0.75 |    1.6% |      6.72 | `AddrManGood`
|    1,311,484,130.00 |                0.76 |    0.4% |      6.50 | `AddrManGood`
|    1,326,841,389.00 |                0.75 |    1.6% |      6.59 | `AddrManGood`

PR:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        8,645,751.00 |              115.66 |    1.7% |      0.04 | `AddrManGood`
|        8,715,892.00 |              114.73 |    1.2% |      0.04 | `AddrManGood`
|        8,076,594.00 |              123.81 |    1.7% |      0.04 | `AddrManGood`
|        8,715,701.00 |              114.74 |    1.2% |      0.04 | `AddrManGood`
|        8,545,046.00 |              117.03 |    2.2% |      0.04 | `AddrManGood`

(tested on OpenBSD 6.9, amd64)

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 57ce203

Impressive 165x boost in the AddrManGood benchmark (tested locally)! (2.3 op/s VS 380 op/s). Notice that it is adding entries with addr != source, so that lucky start_bucket is not playing a role in the benchmark, I guess.

In commit acf656d fuzz: Use public interface to fill addrman tried tables, Add_() and Good_() are not public interface, they are private methods.

@@ -85,7 +85,7 @@ class CAddrManDeterministic : public CAddrMan
// 0, 1, 2, 3 corresponding to 0%, 100%, 50%, 33%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just above, on line 83 (github does not let me comment on it) is a comment:

// Add some of the addresses directly to the "tried" table.

I think the word "directly" does not apply anymore since the hackish direct addition is replaced by Add_() (goes to "new") + Good_() (moved to "tried").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix this in a follow-up!

@@ -96,31 +96,12 @@ class CAddrManDeterministic : public CAddrMan
for (size_t j = 0; j < num_addresses; ++j) {
const auto addr = CAddress{CService{RandAddr(), 8333}, NODE_NETWORK};
const auto time_penalty = insecure_rand.randrange(100000001);
#if 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared this branch with the #else branch, filled addrman with 9k addresses and did not observe any difference in performance. That is - now the high level Add_() + Good_() perform the same way as the hackish direct add to the tried table 🐎.

@laanwj
Copy link
Member

laanwj commented Sep 20, 2021

This is fantastic, faster and less code.
Code review ACK 57ce203

@laanwj laanwj merged commit 7f7bd31 into bitcoin:master Sep 20, 2021
@mzumsande
Copy link
Contributor Author

Notice that it is adding entries with addr != source, so that lucky start_bucket is not playing a role in the benchmark, I guess.

I think it does: The first Add() call creates a CAddrInfo entry in mapInfo and set CAddrInfo.source to the source, no matter if addr==source or not. Then GetNewBucket(uint256, const std::vector<bool>) which is now called from MakeTried uses just that to determine the starting bucket.

In commit acf656d fuzz: Use public interface to fill addrman tried tables, Add_() and Good_() are not public interface, they are private methods.

That's true - thinking about doing a (fuzz-only) refactoring follow-up, in which this could be switched to Add() and Good().

jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Sep 21, 2021
martinus added a commit to martinus/bitcoin that referenced this pull request Sep 21, 2021
Moves some of the setup into the benchmark loop so it is possible to run
the loop for an arbitrary number of times. Due to recent optimizations
in bitcoin#22974 the benchmark now runs much faster, so the inner loop now calls
Good() 32 times as often to get better numbers.

Renamed the benchmark to AddrManAddThenGood because that's now what is
actually tested. To get the the number of just Good(), one needs to
subtract the benchmark result of AddrManAdd.
@Bossday Bossday mentioned this pull request Sep 21, 2021
Closed
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Sep 21, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 21, 2021
57ce203 fuzz: allow lower number of sources (Martin Zumsande)
acf656d fuzz: Use public interface to fill addrman tried tables (Martin Zumsande)
eb2e113 addrman: Improve performance of Good (Martin Zumsande)

Pull request description:

  Currently, `CAddrman::Good()` is rather slow because the process of moving an addr from new to tried involves looping over the new tables twice:
  1) In `Good_()`, there is a loop searching for a new bucket the addr is currently in, but this information is never used except for aborting if it is not found anywhere (since [this commit](bitcoin@e6b343d#diff-49d1faa58beca1ee1509a247e0331bb91f8604e30a483a7b2dea813e6cea02e2R263) it is no longer passed to `MakeTried`)
  This is unnecessary because in a non-corrupted addrman, an address that is not in New must be either in Tried or not at all in addrman, both cases in which we'd return early in `Good_()` and never get to this point.
  I removed this loop (and left a check for `nRefCount` as a belt-and-suspenders check).

  2) In `MakeTried()`, which is called from `Good_()`, another loop removes all instances of this address from new. This can be spedup by stopping the search at  `nRefCount==0`. Further reductions in `nRefCount` would only lead to an assert anyway.
  Moreover, the search can be started at the bucket determined by the source of the addr for which `Good` was called, so that if it is present just once in New, no further buckets need to be checked.

  While calls to `Good()` are not that frequent normally, the performance gain is clearly seen in the fuzz target `addman_serdeser`, where, because of the slowness in creating a decently filled addrman, a shortcut was created that would directly populate the tried tables by reaching into addrman's internals, bypassing `Good()` (bitcoin#21129).
  I removed this workaround in the second commit: Using `Good()` is still slower by a factor of 2 (down from a factor of ~60 before), but I think that this compensated by the advantages of not having to reach into the internal structures of addrman  (see jnewbery#18 (comment)).

  [Edit]: For benchmark results see bitcoin#22974 (comment) and bitcoin#22974 (comment) - the benchmark `AddrManGood` shows a significant speedup by a factor >100.

ACKs for top commit:
  naumenkogs:
    ACK 57ce203
  jnewbery:
    ACK 57ce203
  laanwj:
    Code review ACK 57ce203
  theStack:
    ACK 57ce203
  vasild:
    ACK 57ce203

Tree-SHA512: fb6dfc198f2e28bdbb41cef9709828f22d83b4be0e640a3155ca42e771b6f58466de1468f54d773e794f780a79113f9f7d522032e87fdd75bdc4d99330445198
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Sep 22, 2021
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Oct 4, 2021
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Oct 5, 2021
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Oct 5, 2021
fanquake added a commit that referenced this pull request Oct 8, 2021
4445211 [fuzz] Update comment in FillAddrman() (John Newbery)
640476e [fuzz] Make Fill() a free function in fuzz/addrman.cpp (John Newbery)
90ad8ad [fuzz] Make RandAddr() a free function in fuzz/addrman.cpp (John Newbery)
491975c [fuzz] Pass FuzzedDataProvider& into Fill() in addrman fuzz tests (John Newbery)
56303e3 [fuzz] Create a FastRandomContext in addrman fuzz tests (John Newbery)

Pull request description:

  #22974 improved the performance of `CAddrMan::Good()` significantly so that it could be used directly in the fuzz tests, instead of those tests reaching directly into addrman's internal data structures.

  This PR continues the work of making the fuzz tests only use `CAddrMan`'s public methods and pulls the `Fill()` and `RandAddr()` methods from `CAddrManDeterministic` into free functions, since they no longer need access to `CAddrMan` internals.

ACKs for top commit:
  theuni:
    utACK 4445211. Agree the failure seems unrelated, looks like some startup race.
  mzumsande:
    ACK 4445211
  vasild:
    ACK 4445211

Tree-SHA512: fcf994e1dedd0012b77f632720b6423d51ceda4eb85c9efe572f2a1150117f9e511114a5206738dd94409137287577f3b01a9998f5237de845410d3d96e7cb7f
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 8, 2021
4445211 [fuzz] Update comment in FillAddrman() (John Newbery)
640476e [fuzz] Make Fill() a free function in fuzz/addrman.cpp (John Newbery)
90ad8ad [fuzz] Make RandAddr() a free function in fuzz/addrman.cpp (John Newbery)
491975c [fuzz] Pass FuzzedDataProvider& into Fill() in addrman fuzz tests (John Newbery)
56303e3 [fuzz] Create a FastRandomContext in addrman fuzz tests (John Newbery)

Pull request description:

  bitcoin#22974 improved the performance of `CAddrMan::Good()` significantly so that it could be used directly in the fuzz tests, instead of those tests reaching directly into addrman's internal data structures.

  This PR continues the work of making the fuzz tests only use `CAddrMan`'s public methods and pulls the `Fill()` and `RandAddr()` methods from `CAddrManDeterministic` into free functions, since they no longer need access to `CAddrMan` internals.

ACKs for top commit:
  theuni:
    utACK 4445211. Agree the failure seems unrelated, looks like some startup race.
  mzumsande:
    ACK 4445211
  vasild:
    ACK 4445211

Tree-SHA512: fcf994e1dedd0012b77f632720b6423d51ceda4eb85c9efe572f2a1150117f9e511114a5206738dd94409137287577f3b01a9998f5237de845410d3d96e7cb7f
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Oct 13, 2021
Moves some of the setup into the benchmark loop so it is possible to run
the loop for an arbitrary number of times. Due to recent optimizations
in bitcoin#22974 the benchmark now runs much faster, so the inner loop now calls
Good() 32 times as often to get better numbers.

Renamed the benchmark to AddrManAddThenGood because that's now what is
actually tested. To get the the number of just Good(), one needs to
subtract the benchmark result of AddrManAdd.
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Oct 13, 2021
janus pushed a commit to BitgesellOfficial/bitgesell that referenced this pull request Nov 11, 2021
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 21, 2022
Summary:
> Currently, CAddrman::Good() is rather slow because the process of moving an addr from new to tried involves looping over the new tables twice:
>
>   - In Good_(), there is a loop searching for a new bucket the addr is currently in, but this information is never used except for aborting if it is not found anywhere (since this commit it is no longer passed to MakeTried)
>     This is unnecessary because in a non-corrupted addrman, an address that is not in New must be either in Tried or not at all in addrman, both cases in which we'd return early in Good_() and never get to this point.
>     I removed this loop (and left a check for nRefCount as a belt-and-suspenders check).
>
>   - In MakeTried(), which is called from Good_(), another loop removes all instances of this address from new. This can be spedup by stopping the search at nRefCount==0. Further reductions in nRefCount would only lead to an assert anyway.
>     Moreover, the search can be started at the bucket determined by the source of the addr for which Good was called, so that if it is present just once in New, no further buckets need to be checked.

Improving performance of Good_ is done by removing an unnecessary loop in Good_() and looping
through the new tables in MakeTried() more efficiently, choosing a
starting value that allow us to stop early in typical cases.

Co-authored-by: John Newbery <john@johnnewbery.com>

This is a partial backport of [[bitcoin/bitcoin#22974 | core#22974]]
bitcoin/bitcoin@eb2e113

Depends on D10855

The other two commits are not applicable due to a large stack of missing fuzz backports.

Test Plan:
`ninja all check-all`

Benchmark: `./src/bench/bitcoin-bench -filter=AddrManGood `

Before:
```
|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       96,564,809.00 |               10.36 |    0.2% |      0.49 | `AddrManGood`

|      100,490,917.00 |                9.95 |    0.8% |      0.51 | `AddrManGood`

|       99,080,029.00 |               10.09 |    0.9% |      0.50 | `AddrManGood`
```

After:
```

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,355,126.00 |              737.94 |    2.1% |      0.01 | `AddrManGood`

|        1,330,379.00 |              751.67 |    5.7% |      0.01 | 〰️ `AddrManGood` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)

|          859,394.00 |            1,163.61 |    1.3% |      0.00 | `AddrManGood`
```

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10857
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022
@mzumsande mzumsande deleted the 202109_goodspeed branch June 23, 2023 18:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants