Skip to content

Conversation

jonatack
Copy link
Member

@jonatack jonatack commented Aug 29, 2021

This pull adds a tried argument to RPC addpeeraddress and a regression test for the recent addrman/asmap changes and issue.

PR #22697 introduced a reproducible bug in commit 181a120 that fails addrman consistency checks and causes it to significantly lose peer entries when the -asmap configuration option is used.

The issue occurs upon bitcoind restart due to an initialization order change in src/init.cpp in that commit, whereby CAddrman asmap is set after deserializing peers.dat, rather than before.

Issue reported on the #bitcoin-core-dev IRC channel starting at https://www.erisian.com.au/bitcoin-core-dev/log-2021-08-23.html#l-263.

addrman lost 22813 new and 2 tried addresses due to collisions or invalid addresses
ADDRMAN CONSISTENCY CHECK FAILED!!! err=-17 bitcoind: ./addrman.h:707: void CAddrMan::Check() const: Assertion `false' failed. Aborted

How to reproduce:

  • git checkout 181a1207, build, and launch bitcoind with the -asmap and -checkaddrman=1 configuration options enabled
  • restart bitcoind
  • bitcoind aborts on the second call to the addrman consistency checks in CAddrMan::Check()

How to test this pull:

  • git checkout 181a1207, cherry pick the first commit of this branch, build, git checkout this branch, run test/functional/rpc_net.py, which should pass, and then run test/functional/feature_asmap.py, which should fail with the following output:
AssertionError: Unexpected stderr bitcoind: ./addrman.h:739: void CAddrMan::Check() const: Assertion `false' failed.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 30, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #22087 (Validate port numbers by amadeuszpawlik)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on adding the test

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch 2 times, most recently from 2fb6933 to 792c9c2 Compare August 30, 2021 10:36
@jonatack
Copy link
Member Author

Re-pushed with a 20x smaller peers.dat containing 95 new and 3 tried entries (we need a few entries in the tried table for the asmap/addrman regression test) along with some docs per git diff c8a6035 792c9c2.

@maflcko
Copy link
Member

maflcko commented Aug 30, 2021

Maybe still include docs on how to generate the file? (See also #16796 (review))
Can be in the commit message or a separate file.

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from 792c9c2 to 24a73b4 Compare August 30, 2021 18:39
@jonatack
Copy link
Member Author

jonatack commented Aug 30, 2021

Maybe still include docs on how to generate the file? (See also #16796 (review))
Can be in the commit message or a separate file.

Good idea. Done in the commit message.

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from 24a73b4 to d507068 Compare August 30, 2021 18:50
@amitiuttarwar
Copy link
Contributor

concept ACK, thanks for adding tests. I think squashing cf90c5a & 80dbd9b would make sense, because the functional test seems like the easiest way to verify the contents of the peers.dat file.

@jonatack
Copy link
Member Author

jonatack commented Sep 3, 2021

I think squashing cf90c5a & 80dbd9b would make sense, because the functional test seems like the easiest way to verify the contents of the peers.dat file.

They began as the same commit together, then I realized that it makes more sense for them to be separate:

  • the file addition is a standalone change that can be used for other tests (I have some ideas)
  • so that people can find the data file commit easily, see its documentation, how it was generated, maybe generate a new one (more addresses, or with tor v2 peers, etc.), or perhaps to revert it someday if we have a way to call CAddrMan::MakeTried() from the tests to move addresses into the tried table or to mock it
  • the commit messages are very different

The easiest way to test the peers.dat file is to copy it to a datadir, start a node, and run commands on it like -addrinfo, getnodeaddresses, etc. Edit: added "how to test" info to the commit message.

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from d507068 to 48820e6 Compare September 6, 2021 12:37
@jonatack jonatack changed the title p2p, bugfix: fix addrman tried table corruption on restart with asmap test, bugfix: addrman tried table corruption on restart with asmap Sep 6, 2021
@jonatack
Copy link
Member Author

jonatack commented Sep 6, 2021

Rebased, improved the commit messages, added additional test coverage, and renamed the test file to feature_addrman_asmap.py. Updated the PR title and description.

@jonatack
Copy link
Member Author

jonatack commented Sep 8, 2021

Rebased and merged the addrman and asmap tests into one test file.

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from 55d1411 to 4de3271 Compare September 14, 2021 13:41
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review ACK 4de3271 🍛

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK 4de3271b1cb65111ea4b37d8fa9791aa0cd63b81 🍛
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjVpwwAnW+G/Az3yxcnBnDlYpPF+EnaE+9mRU+rBuVH4ta4EXbKmnkGBXcUMFES
vGyOSh214YYpBVMO0ibGV/5V0PwSeijSnSgFS90JgpL6kbGJ9a+Bcp2uc+nH8yAI
6dCGwn40XN8oA0sNHbbqffsaFe4PdDIPsR+zV8Up/2SynpJhwxcN58Nf1KvC+dZD
vliNJO3LnMfVB9Gdah6ObJ2d99iqqWIyg8+aaWxYpUwMp/4oRtkKjZ7CHS9C4TdE
aeVybThyVnD8csbWpx/yKAxjiP9dVZ2yPzXmhTmVZPoWrUH+T3nN8IgYLv0+EkU6
NFXIdz1qZZbJ93TKl9RFIPxgB8LX/NH0nO9sRZji61jsSFKJMSv6HmQfYdAO5/tC
8VUD2pqVnSvbnKYCfpfQEk8lkepDfM0FSH4uE7h3U8VysCbPsDJZpomC845eynai
THR4aeDwVylKCVnHZ7MUWSxAxxvjuh/i6SUYsPEKyxM8r9PcErhmYML2mafVKRdq
86qyGL//
=dWm5
-----END PGP SIGNATURE-----

Timestamp of file with hash 4c359cb1138b60d4da195cd16d1d3fb668bf9409ad9ed02c51ecd1964b7f50f2 -

@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch 2 times, most recently from 5593ea1 to d586817 Compare September 14, 2021 15:06
Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review ACK d586817

two comment suggestions to consider if you retouch

@DrahtBot DrahtBot mentioned this pull request Sep 15, 2021
@mzumsande
Copy link
Contributor

Tested ACK d586817

I verified that peers.dat, which is now created at runtime, leads to Check() errors in 181a120 with the toy asmap ip_asn.map.

jonatack and others added 2 commits September 15, 2021 15:28
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: John Newbery <john@johnnewbery.com>
Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
Co-authored-by: John Newbery <john@johnnewbery.com>
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from d586817 to 71bea5d Compare September 15, 2021 14:34
PR bitcoin#22697 introduced a reproducible issue in commit 181a120 that causes the
addrman tried table to fail consistency checks and significantly lose peer
entries when the `-asmap` configuration option is used.

The issue occurs on bitcoind restart due to an initialization order change
in `src/init.cpp` in that commit whereby CAddrman asmap is set after
deserializing `peers.dat`, rather than before.

Issue reported on the `#bitcoin-core-dev` IRC channel starting at
https://www.erisian.com.au/bitcoin-core-dev/log-2021-08-23.html#l-263.

```
addrman lost 22813 new and 2 tried addresses due to collisions or invalid addresses
ADDRMAN CONSISTENCY CHECK FAILED!!! err=-17
bitcoind: ./addrman.h:707: void CAddrMan::Check() const: Assertion `false' failed. Aborted
```

How to reproduce:
- `git checkout 181a120` and recompile
- launch bitcoind with `-asmap` and `-checkaddrman=1` config options
- restart bitcoind
- bitcoind aborts on second call to `CAddrMan::Check()`

This commit adds a regression test to reproduce the case; it passes or fails
with the same error.

Co-authored-by: John Newbery <john@johnnewbery.com>
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@jonatack jonatack force-pushed the addrman-asmap-init-order-test branch from 71bea5d to cdaab90 Compare September 15, 2021 14:39
@jonatack
Copy link
Member Author

Very good suggestions (thanks!) Updated per git diff d586817 cdaab90 along with further test improvements.

diff

diff --git a/src/rpc/net.cpp b/src/rpc/net.cpp
index f7b6a2a333..227eec722f 100644
--- a/src/rpc/net.cpp
+++ b/src/rpc/net.cpp
@@ -956,7 +956,7 @@ static RPCHelpMan addpeeraddress()
         if (node.addrman->Add({address}, address)) {
             success = true;
             if (tried) {
-                // Mark entry as good and possibly move it to the "tried" addresses table.
+                // Attempt to move the address to the tried addresses table.
                 node.addrman->Good(address);
             }
         }
diff --git a/test/functional/feature_asmap.py b/test/functional/feature_asmap.py
index 46d55077d7..2dc1e3a7cb 100755
--- a/test/functional/feature_asmap.py
+++ b/test/functional/feature_asmap.py
@@ -14,7 +14,7 @@ Verify node behaviour and debug log when launching bitcoind in these cases:
 
 4. `bitcoind -asmap/-asmap=` with no file specified, using the default asmap
 
-5. `bitcoind -asmap` with an addrman containing new and tried entries
+5. `bitcoind -asmap` restart with an addrman containing new and tried entries
 
 6. `bitcoind -asmap` with no file specified and a missing default asmap file
 
@@ -42,8 +42,8 @@ class AsmapTest(BitcoinTestFramework):
         self.extra_args = [["-checkaddrman=1"]]  # Do addrman checks on all operations.
 
     def fill_addrman(self, node_id):
-        """Add 2 new addresses and 2 tried addresses to the addrman."""
-        for addr, tried in [[0, False], [1, False], [2, True], [3, True]]:
+        """Add 2 tried addresses to the addrman, followed by 2 new addresses."""
+        for addr, tried in [[0, True], [1, True], [2, False], [3, False]]:
             self.nodes[node_id].addpeeraddress(address=f"101.{addr}.0.0", tried=tried, port=8333)
 
     def test_without_asmap_arg(self):
@@ -81,13 +81,18 @@ class AsmapTest(BitcoinTestFramework):
         os.remove(self.default_asmap)
 
     def test_asmap_interaction_with_addrman_containing_entries(self):
-        self.log.info("Test bitcoind -asmap with an addrman containing new and tried entries")
+        self.log.info("Test bitcoind -asmap restart with addrman containing new and tried entries")
         self.stop_node(0)
         shutil.copyfile(self.asmap_raw, self.default_asmap)
         self.start_node(0, ["-asmap", "-checkaddrman=1"])
         self.fill_addrman(node_id=0)
         self.restart_node(0, ["-asmap", "-checkaddrman=1"])
-        with self.node.assert_debug_log(expected_msgs=["Addrman checks started: new 2, tried 2, total 4"]):
+        with self.node.assert_debug_log(
+            expected_msgs=[
+                "Addrman checks started: new 2, tried 2, total 4",
+                "Addrman checks completed successfully",
+            ]
+        ):
             self.node.getnodeaddresses()  # getnodeaddresses re-runs the addrman checks
         os.remove(self.default_asmap)
 
diff --git a/test/functional/rpc_net.py b/test/functional/rpc_net.py
index e2dfa3ea23..4f5d8ce241 100755
--- a/test/functional/rpc_net.py
+++ b/test/functional/rpc_net.py
@@ -239,6 +239,14 @@ class NetTest(BitcoinTestFramework):
         assert_raises_rpc_error(-8, "Network not recognized: Foo", self.nodes[0].getnodeaddresses, 1, "Foo")
 
     def test_addpeeraddress(self):
+        """RPC addpeeraddress sets the source address equal to the destination address.
+        If an address with the same /16 as an existing new entry is passed, it will be
+        placed in the same new bucket and have a 1/64 chance of the bucket positions
+        colliding (depending on the value of nKey in the addrman), in which case the
+        new address won't be added.  The probability of collision can be reduced to
+        1/2^16 = 1/65536 by using an address from a different /16.  We avoid this here
+        by first testing adding a tried table entry before testing adding a new table one.
+        """
         self.log.info("Test addpeeraddress")
         self.restart_node(1, ["-checkaddrman=1"])
         node = self.nodes[1]
@@ -252,19 +260,21 @@ class NetTest(BitcoinTestFramework):
         assert_equal(node.addpeeraddress(address="", port=8333), {"success": False})
         assert_equal(node.getnodeaddresses(count=0), [])
 
-        self.log.debug("Test that adding a valid address succeeds")
-        assert_equal(node.addpeeraddress(address="1.2.3.4", port=8333), {"success": True})
-        addrs = node.getnodeaddresses(count=0)
-        assert_equal(len(addrs), 1)
-        assert_equal(addrs[0]["address"], "1.2.3.4")
-        assert_equal(addrs[0]["port"], 8333)
+        self.log.debug("Test that adding a valid address to the tried table succeeds")
+        assert_equal(node.addpeeraddress(address="1.2.3.4", tried=True, port=8333), {"success": True})
+        with node.assert_debug_log(expected_msgs=["Addrman checks started: new 0, tried 1, total 1"]):
+            addrs = node.getnodeaddresses(count=0)  # getnodeaddresses re-runs the addrman checks
+            assert_equal(len(addrs), 1)
+            assert_equal(addrs[0]["address"], "1.2.3.4")
+            assert_equal(addrs[0]["port"], 8333)
 
-        self.log.debug("Test that adding the same address again when already present fails")
-        assert_equal(node.addpeeraddress(address="1.2.3.4", port=8333), {"success": False})
+        self.log.debug("Test that adding an already-present tried address to the new and tried tables fails")
+        for value in [True, False]:
+            assert_equal(node.addpeeraddress(address="1.2.3.4", tried=value, port=8333), {"success": False})
         assert_equal(len(node.getnodeaddresses(count=0)), 1)
 
-        self.log.debug("Test adding a second address (with a different /16) to the tried table")
-        assert_equal(node.addpeeraddress(address="2.0.0.0", port=8333, tried=True), {"success": True})
+        self.log.debug("Test that adding a second address, this time to the new table, succeeds")
+        assert_equal(node.addpeeraddress(address="2.0.0.0", port=8333), {"success": True})
         with node.assert_debug_log(expected_msgs=["Addrman checks started: new 1, tried 1, total 2"]):
             addrs = node.getnodeaddresses(count=0)  # getnodeaddresses re-runs the addrman checks

@jnewbery
Copy link
Contributor

utACK cdaab90

This isn't a blocker, but I've never liked relying too heavily on the assert_debug_log() pattern in the functional tests. It's essentially asserting that a specific line of code is hit, which couples the functional tests very closely with the implementation. Ideally, the functional tests would test the functionality, and the implementation could be changed without having to update the tests.

In this case, we could extend the getnodeaddresses RPC to return whether the address is in the tried or new table, and (if it's in the new table) its refcount. That'd allow us to more closely test the contents of the addrman using a public interface.

Like I said, not a blocker for this PR, but perhaps something to consider as a potential follow-up.

@jonatack
Copy link
Member Author

jonatack commented Sep 15, 2021

@jnewbery Agree--that sounds like a good idea for a possible follow-up. Edit: proposed in #23035.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK cdaab90

The new test Test bitcoind -asmap restart with addrman containing new and tried entries fails before the fix (50fd77045e~) and passes with the fix (50fd77045e).

Comment on lines +242 to +249
"""RPC addpeeraddress sets the source address equal to the destination address.
If an address with the same /16 as an existing new entry is passed, it will be
placed in the same new bucket and have a 1/64 chance of the bucket positions
colliding (depending on the value of nKey in the addrman), in which case the
new address won't be added. The probability of collision can be reduced to
1/2^16 = 1/65536 by using an address from a different /16. We avoid this here
by first testing adding a tried table entry before testing adding a new table one.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this nondeterminism turns out to be a problem also in other cases, then I guess a new option bitcoind -addrmandeterministic could be introduced which creates a deterministic addrman.

Comment on lines +278 to +280
with node.assert_debug_log(expected_msgs=["Addrman checks started: new 1, tried 1, total 2"]):
addrs = node.getnodeaddresses(count=0) # getnodeaddresses re-runs the addrman checks
assert_equal(len(addrs), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check the contents of addrs[] here, but checked it above. If the reason for this is that the order is undeterministic, then I guess it can be sorted and then checked that it contains 1.2.3.4 and 2.0.0.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #23035.

@jamesob
Copy link
Contributor

jamesob commented Sep 20, 2021

This seems ready for merge.

@mzumsande
Copy link
Contributor

re-ACK cdaab90 (based on code review of diff to d586817)

@jamesob
Copy link
Contributor

jamesob commented Sep 20, 2021

Therefore, #22791 fixed not just the proximal cause of the bug (initialization order), but also the ultimate cause (bad encapsulation and partially-initalized objects) by making that member const and private. Just changing the initialization order would be a partial fix, rather than eliminating a whole class of bugs.

I don't want to belabor drama here, but there is a categorical difference between (i) extant bugs and (ii) footgun conditions that might lead to a bug. One case requires a clear and immediate fix (and if feasible an accompanying test) and the other can be responsibly treated in a number of ways with varying rapidity.

I'd encourage everyone to take a deep breath and remember that we're all on the same team, and everyone is here to improve the project.

While I appreciate the sentiment here, this is a controversial, security-critical project that is likely to someday experience subversion, overt or otherwise. Consequently we should be eager to recognize and address process failures, and we should certainly not be gunshy about adding test coverage, whether or not its implementation is ideal.

@maflcko
Copy link
Member

maflcko commented Sep 21, 2021

Thanks for adjusting this to use addpeeraddress (and hashing out the non-determinism issues that came with it).

Concept ACK (didn't re-review)

@maflcko maflcko merged commit 223ad2f into bitcoin:master Sep 21, 2021
@jonatack jonatack deleted the addrman-asmap-init-order-test branch September 21, 2021 07:39
@jnewbery
Copy link
Contributor

@jamesob - I'm not going to continue the discussion of process in this PR, since I think we're basically going in circles at this point. However, I do want to respond to your final point:

we should certainly not be gunshy about adding test coverage, whether or not its implementation is ideal.

I totally disagree with this. Our standards for test code should be as high as for code in the product itself, since all code (test or production) carries an ongoing maintenance cost. If a proposed test introduces difficult-to-maintain binary files or intermittently fails, then those issues absolutely should be brought up in review and resolved before merge. I agree with @fanquake's assessment here:

I think this perspective is important because it can determine how we fix bugs. In this case, the bug is already fixed, and this PR is adding regression testing (which is important!). However given the impact and scope are low, there isn't any sort of rush to get this regression testing added that we can't do it "right" the first time.

There's no rush here. A merge-first-fix-later attitude increases the overall cost of this code and transfers that cost onto developers who'll have to maintain/modify the tests later.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 21, 2021
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 20, 2022
… asmap

Summary:
```
This pull adds a tried argument to RPC addpeeraddress and a regression test for the recent addrman/asmap changes and issue.
PR #22697 introduced a reproducible bug in commit 181a120 that fails addrman consistency checks and causes it to significantly lose peer entries when the -asmap configuration option is used.
The issue occurs upon bitcoind restart due to an initialization order change in src/init.cpp in that commit, whereby CAddrman asmap is set after deserializing peers.dat, rather than before.
```

Backport of [[bitcoin/bitcoin#22831 | core#22831]].

Note that we never suffered that bug, but still it's worth checking we don't introduce it at a later point in time.

Depends on D12312.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, sdulfari

Reviewed By: #bitcoin_abc, sdulfari

Differential Revision: https://reviews.bitcoinabc.org/D12313
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.