Skip to content

Conversation

vasild
Copy link
Contributor

@vasild vasild commented Feb 9, 2021

Add a fuzz test that fills addrman with a pile of randomly generated addresses, serializes it to a stream, unserializes the stream to another addrman object and compares the two.

Some discussion of this already happened at jnewbery#18.

@practicalswift
Copy link
Contributor

Concept ACK: very nice fuzzing work @vasild! :)

@practicalswift
Copy link
Contributor

Tested ACK b0f06d0

Thanks for improving our fuzzing harnesses @vasild! :)

@jonatack
Copy link
Member

Concept ACK

@adamjonas
Copy link
Member

Tested b0f06d0 with FUZZ=addrman src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/addrman

INFO: Seed: 66461418
INFO: Loaded 1 modules   (442033 inline 8-bit counters): 442033 [0x562796653cc0, 0x5627966bfb71),
INFO: Loaded 1 PC tables (442033 PCs): 442033 [0x5627966bfb78,0x562796d7e688),
INFO:     3980 files found in qa-assets/fuzz_seed_corpus/addrman
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
INFO: seed corpus: files: 3980 min: 1b max: 1048576b total: 46509329b rss: 109Mb
#512    pulse  cov: 4448 ft: 10018 corp: 209/12622b exec/s: 256 rss: 132Mb
#1024   pulse  cov: 4989 ft: 16253 corp: 342/26Kb exec/s: 128 rss: 154Mb
...

Works as expected.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 24, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #20233 (addrman: Make consistency checks a runtime option by jnewbery)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@vasild
Copy link
Contributor Author

vasild commented May 28, 2021

b0f06d04c1...0de22a097e: rebase due to conflicts

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0de22a0

@practicalswift
Copy link
Contributor

re-ACK 0de22a0

@vasild vasild force-pushed the fuzz_addrman_serunser branch from 0de22a0 to fe9401b Compare July 26, 2021 15:53
@vasild
Copy link
Contributor Author

vasild commented Jul 26, 2021

0de22a097e...fe9401b75d: rebase due to conflicts and address review suggestion

@jnewbery
Copy link
Contributor

Concept ACK

@vasild vasild force-pushed the fuzz_addrman_serunser branch from fe9401b to 2fe1efa Compare July 26, 2021 16:39
@vasild
Copy link
Contributor Author

vasild commented Jul 26, 2021

fe9401b75d...2fe1efa94c: address review suggestion

Move the addrman init code from the test case to a newly added
`CAddrManDeterministic` constructor. This way it can be reused by other
tests.
@vasild vasild force-pushed the fuzz_addrman_serunser branch from 2fe1efa to 1f00d22 Compare August 2, 2021 12:51
@vasild
Copy link
Contributor Author

vasild commented Aug 2, 2021

2fe1efa94c...1f00d22e12: rebase due to conflicts

@vasild vasild force-pushed the fuzz_addrman_serunser branch from 1f00d22 to 6d5e55e Compare August 2, 2021 14:20
@vasild vasild force-pushed the fuzz_addrman_serunser branch from 6d5e55e to 8765179 Compare August 4, 2021 16:23
@vasild
Copy link
Contributor Author

vasild commented Aug 4, 2021

6d5e55e755...87651795d8: fix locking in the test's RandAddr(), optimize addrman comparison (3-4x speedup!) and reduce the size of addrmans to avoid tests running so long as to cause a timeout.

@jonatack
Copy link
Member

jonatack commented Aug 4, 2021

optimize addrman comparison (3-4x speedup!) and reduce the size of addrmans to avoid tests running so long as to cause a timeout.

Nice!

@practicalswift
Copy link
Contributor

cr ACK 8765179

// 0, 1, 2, 3 corresponding to 0%, 100%, 50%, 33%
const size_t n = m_fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 3);

const size_t num_sources = m_fuzzed_data_provider.ConsumeIntegralInRange<size_t>(10, 50);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to exclude num_sources == 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8765179 rebased to current master, reviewed, fuzz build, ran FUZZ=addrman_serdeser src/test/fuzz/fuzz

@maflcko maflcko merged commit d67330d into bitcoin:master Aug 5, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 5, 2021
…rMan

8765179 fuzz: check that ser+unser produces the same AddrMan (Vasil Dimov)
6408b24 fuzz: move init code to the CAddrManDeterministic constructor (Vasil Dimov)

Pull request description:

  Add a fuzz test that fills addrman with a pile of randomly generated addresses, serializes it to a stream, unserializes the stream to another addrman object and compares the two.

  Some discussion of this already happened at jnewbery#18.

ACKs for top commit:
  practicalswift:
    cr ACK 8765179
  jonatack:
    ACK 8765179 rebased to current master, reviewed, fuzz build, ran `FUZZ=addrman_serdeser src/test/fuzz/fuzz`

Tree-SHA512: 7eda79279f14f2649840bf752e575d7b02cbaad541f74f7254855ebd4a32da988f042d78aa9228983350283bb74dd0c71f51f04c0846889c3ba2f19f01a0c303
@vasild vasild deleted the fuzz_addrman_serunser branch August 5, 2021 16:47
vasild added a commit to vasild/qa-assets that referenced this pull request Aug 5, 2021
for (size_t j = 0; j < num_addresses; ++j) {
const auto addr = CAddress{CService{RandAddr(), 8333}, NODE_NETWORK};
const auto time_penalty = insecure_rand.randrange(100000001);
#if 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to #if 0 to enable the equivalent #else branch which is much shorter and clear, but also slower. The #else snippet also shows what the faster snippet is trying to achieve by "manually" fiddling with addrman internals.

laanwj added a commit that referenced this pull request Sep 20, 2021
57ce203 fuzz: allow lower number of sources (Martin Zumsande)
acf656d fuzz: Use public interface to fill addrman tried tables (Martin Zumsande)
eb2e113 addrman: Improve performance of Good (Martin Zumsande)

Pull request description:

  Currently, `CAddrman::Good()` is rather slow because the process of moving an addr from new to tried involves looping over the new tables twice:
  1) In `Good_()`, there is a loop searching for a new bucket the addr is currently in, but this information is never used except for aborting if it is not found anywhere (since [this commit](e6b343d#diff-49d1faa58beca1ee1509a247e0331bb91f8604e30a483a7b2dea813e6cea02e2R263) it is no longer passed to `MakeTried`)
  This is unnecessary because in a non-corrupted addrman, an address that is not in New must be either in Tried or not at all in addrman, both cases in which we'd return early in `Good_()` and never get to this point.
  I removed this loop (and left a check for `nRefCount` as a belt-and-suspenders check).

  2) In `MakeTried()`, which is called from `Good_()`, another loop removes all instances of this address from new. This can be spedup by stopping the search at  `nRefCount==0`. Further reductions in `nRefCount` would only lead to an assert anyway.
  Moreover, the search can be started at the bucket determined by the source of the addr for which `Good` was called, so that if it is present just once in New, no further buckets need to be checked.

  While calls to `Good()` are not that frequent normally, the performance gain is clearly seen in the fuzz target `addman_serdeser`, where, because of the slowness in creating a decently filled addrman, a shortcut was created that would directly populate the tried tables by reaching into addrman's internals, bypassing `Good()` (#21129).
  I removed this workaround in the second commit: Using `Good()` is still slower by a factor of 2 (down from a factor of ~60 before), but I think that this compensated by the advantages of not having to reach into the internal structures of addrman  (see jnewbery#18 (comment)).

  [Edit]: For benchmark results see #22974 (comment) and #22974 (comment) - the benchmark `AddrManGood` shows a significant speedup by a factor >100.

ACKs for top commit:
  naumenkogs:
    ACK 57ce203
  jnewbery:
    ACK 57ce203
  laanwj:
    Code review ACK 57ce203
  theStack:
    ACK 57ce203
  vasild:
    ACK 57ce203

Tree-SHA512: fb6dfc198f2e28bdbb41cef9709828f22d83b4be0e640a3155ca42e771b6f58466de1468f54d773e794f780a79113f9f7d522032e87fdd75bdc4d99330445198
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 21, 2021
57ce203 fuzz: allow lower number of sources (Martin Zumsande)
acf656d fuzz: Use public interface to fill addrman tried tables (Martin Zumsande)
eb2e113 addrman: Improve performance of Good (Martin Zumsande)

Pull request description:

  Currently, `CAddrman::Good()` is rather slow because the process of moving an addr from new to tried involves looping over the new tables twice:
  1) In `Good_()`, there is a loop searching for a new bucket the addr is currently in, but this information is never used except for aborting if it is not found anywhere (since [this commit](bitcoin@e6b343d#diff-49d1faa58beca1ee1509a247e0331bb91f8604e30a483a7b2dea813e6cea02e2R263) it is no longer passed to `MakeTried`)
  This is unnecessary because in a non-corrupted addrman, an address that is not in New must be either in Tried or not at all in addrman, both cases in which we'd return early in `Good_()` and never get to this point.
  I removed this loop (and left a check for `nRefCount` as a belt-and-suspenders check).

  2) In `MakeTried()`, which is called from `Good_()`, another loop removes all instances of this address from new. This can be spedup by stopping the search at  `nRefCount==0`. Further reductions in `nRefCount` would only lead to an assert anyway.
  Moreover, the search can be started at the bucket determined by the source of the addr for which `Good` was called, so that if it is present just once in New, no further buckets need to be checked.

  While calls to `Good()` are not that frequent normally, the performance gain is clearly seen in the fuzz target `addman_serdeser`, where, because of the slowness in creating a decently filled addrman, a shortcut was created that would directly populate the tried tables by reaching into addrman's internals, bypassing `Good()` (bitcoin#21129).
  I removed this workaround in the second commit: Using `Good()` is still slower by a factor of 2 (down from a factor of ~60 before), but I think that this compensated by the advantages of not having to reach into the internal structures of addrman  (see jnewbery#18 (comment)).

  [Edit]: For benchmark results see bitcoin#22974 (comment) and bitcoin#22974 (comment) - the benchmark `AddrManGood` shows a significant speedup by a factor >100.

ACKs for top commit:
  naumenkogs:
    ACK 57ce203
  jnewbery:
    ACK 57ce203
  laanwj:
    Code review ACK 57ce203
  theStack:
    ACK 57ce203
  vasild:
    ACK 57ce203

Tree-SHA512: fb6dfc198f2e28bdbb41cef9709828f22d83b4be0e640a3155ca42e771b6f58466de1468f54d773e794f780a79113f9f7d522032e87fdd75bdc4d99330445198
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants