Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Aug 21, 2021

peers.dat is silently erased when it can not be parsed or when it appears corrupted. Fix that by notifying the user. This might help in the following examples:

  • The user provided the database, but picked the wrong one.
  • A future version of Bitcoin Core wrote the file and it can't be read.
  • The file was corrupted by a logic bug in Bitcoin Core.
  • The file was corrupted by a disk failure.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 21, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #22937 (refactor: Add fs::PathToString, fs::PathFromString, u8string, u8path functions by ryanofsky)
  • #22831 (test, bugfix: addrman tried table corruption on restart with asmap by jonatack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ghost
Copy link

ghost commented Aug 24, 2021

Concept ACK

Copy link
Member Author

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force pushed to address review comments

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa1704a

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK faff4fd

@jonatack
Copy link
Member

jonatack commented Sep 3, 2021

Concept ACK.

Like other open pulls to change or fix addrman behavior, it might be good to have functional/regression test coverage in place first.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa7003f

@maflcko maflcko closed this Sep 7, 2021
@maflcko maflcko reopened this Sep 7, 2021
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa7003f review / debug build clean on each commit, verified the test coverage of #22831 passes on this branch with the following changes

--- a/test/functional/feature_addrman_asmap.py
+++ b/test/functional/feature_addrman_asmap.py
@@ -106,17 +106,21 @@ class AddrmanAsmapTest(BitcoinTestFramework):
         self.log.info('Test bitcoind with missing peers.dat and addrman checks')
         self.stop_node(0)
         os.remove(os.path.join(self.datadir, 'peers.dat'))
-        with self.node.assert_debug_log([f'Missing or invalid file {self.datadir}/peers.dat', 'Recreating peers.dat']):
+        msg = f'Creating peers.dat because the file was not found ("{self.datadir}/peers.dat")'
+        with self.node.assert_debug_log(expected_msgs=[msg]):
             self.start_node(0, ['-checkaddrman=1'])
 
     def test_peers_dat_with_invalid_network_magic(self):
         self.log.info('Test bitcoind with peers.dat having invalid network magic and addrman checks')
         self.stop_node(0)
         shutil.copyfile(self.asmap_raw, os.path.join(self.datadir, 'peers.dat'))
-        with self.node.assert_debug_log(['Invalid network magic number', 'Recreating peers.dat']):
-            self.start_node(0, ['-checkaddrman=1'])
-            self.node.getnodeaddresses()
+        msg = (
+            'Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please '
+            'report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file '
+            f'("{self.datadir}/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.'
+            )
+        self.node.assert_start_raises_init_error(extra_args=['-checkaddrman=1'], expected_msg=msg)

@ghost
Copy link

ghost commented Sep 7, 2021

Tested on Pop!_OS and everything works as expected and mentioned in PR description.

Master Branch:

peers.dat is replaced with a new file if it is corrupt or other issues.

Details

image

image

PR Branch:

A. Corrupt peers.dat file

  1. Open peers.dat add random characters in beginning and save.
  2. Run bitcoind
2021-09-07T18:05:46Z init message: Loading P2P addresses…
2021-09-07T18:05:46Z Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/home/prayank/.bitcoin/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/home/prayank/.bitcoin/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
2021-09-07T18:05:46Z Shutdown: In progress...

⚠️ Not sure why same error printed twice in terminal running bitcoind. I see only one error message in debug.log:

2021-09-07T18:05:46Z init message: Loading P2P addresses…
2021-09-07T18:05:46Z Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/home/prayank/.bitcoin/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
2021-09-07T18:05:46Z Shutdown: In progress...

B. peers.dat from testnet used in regtest

  1. Delete regtest/peers.dat and copy testnet3/peers.dat in regtest/
  2. Run bitcoind
2021-09-07T18:14:33Z init message: Loading P2P addresses…
2021-09-07T18:14:33Z Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/home/prayank/.bitcoin/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
Error: Invalid or corrupt peers.dat (Invalid network magic number). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/home/prayank/.bitcoin/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
2021-09-07T18:14:33Z Shutdown: In progress...

Fix the issue by following the things mentioned in error: Delete peers.dat and restart bitcoind. No issues. ✅

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Sep 9, 2021
fade9a1 Remove confusing CAddrDB (MarcoFalke)
fa7f77b Fix addrdb includes (MarcoFalke)
fa3f5d0 Move addrman includes from .h to .cpp (MarcoFalke)

Pull request description:

  Split out from bitcoin#22762 to avoid having to carry it around in (an)other rebase(s)

ACKs for top commit:
  ryanofsky:
    Code review ACK fade9a1
  lsilva01:
    Code Review ACK bitcoin@fade9a1

Tree-SHA512: 7615fb0b6235d0c1e6f8cd6263dd18c4d95890567c2b797fe6fce6cb12cc85ce6eacbe07dbb6d81b05d179ef03b42edfd61c940e35a1044ce6d363b54c2dae5c
MarcoFalke added 2 commits September 9, 2021 09:11
Init should only concern itself with the initialization order, not the
detailed initialization logic of every module.

Also, inlining logic into a method that is ~800 lines of code, makes it
impossible to unit test on its own.
No need to have a function that is only called in one place
@maflcko maflcko force-pushed the 2108-InitErrorAddrmanCorrupt branch from fa7003f to fa59c6d Compare September 9, 2021 07:13
@maflcko maflcko force-pushed the 2108-InitErrorAddrmanCorrupt branch from fa59c6d to fa55c3d Compare September 9, 2021 07:21
@maflcko
Copy link
Member Author

maflcko commented Sep 9, 2021

Rebased (only change is in tests)

@jonatack
Copy link
Member

jonatack commented Sep 9, 2021

Code review re-ACK fa55c3d per git range-diff eb1f570 fa59c6d fa55c3 and verified the new tests fail on master, except "Check mocked addrman is valid", as expected

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa55c3d

@ghost
Copy link

ghost commented Sep 9, 2021

tACK fa55c3d

nit: Will be better if error message was printed only once #22762 (comment)

@maflcko
Copy link
Member Author

maflcko commented Sep 9, 2021

nit: Will be better if error message was printed only once #22762 (comment)

InitErrors are printed to the debug log and stderr. I think it makes sense to print them to both instead of only one.

@maflcko maflcko merged commit 053a5fc into bitcoin:master Sep 10, 2021
@maflcko maflcko deleted the 2108-InitErrorAddrmanCorrupt branch September 10, 2021 09:52
DeserializeFileDB(path_addr, *addrman, CLIENT_VERSION);
LogPrintf("Loaded %i addresses from peers.dat %dms\n", addrman->size(), GetTimeMillis() - nStart);
} catch (const DbNotFoundError&) {
// Addrman can be in an inconsistent state after failure, reset it
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify that, strictly, this isn't needed. DbNotFoundError means that the file wasn't found, so addrman shouldn't be touched at all.

Does anyone have opinions on whether to remove it or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we need to distinguish somehow this from the other error case, so that we don't signal an error to the caller in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant recreating the addrman can be skipped, because it wasn't touched. Saves two lines of code:

diff --git a/src/addrdb.cpp b/src/addrdb.cpp
index 1e73750ce3..a86f4c3789 100644
--- a/src/addrdb.cpp
+++ b/src/addrdb.cpp
@@ -192,8 +192,6 @@ std::optional<bilingual_str> LoadAddrman(const std::vector<bool>& asmap, const A
         DeserializeFileDB(path_addr, *addrman, CLIENT_VERSION);
         LogPrintf("Loaded %i addresses from peers.dat  %dms\n", addrman->size(), GetTimeMillis() - nStart);
     } catch (const DbNotFoundError&) {
-        // Addrman can be in an inconsistent state after failure, reset it
-        addrman = std::make_unique<CAddrMan>(asmap, /* deterministic */ false, /* consistency_check_ratio */ check_addrman);
         LogPrintf("Creating peers.dat because the file was not found (%s)\n", path_addr);
         DumpPeerAddresses(args, *addrman);
     } catch (const std::exception& e) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would leave the unique_ptr empty and not signal error to the caller. So the caller will continue, possibly dereferencing the empty unique_ptr, leading to undefined behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is initialized in the beginning of the function. Otherwise DumpPeerAddresses(args, *addrman); wouldn't work either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal LGTM. Built with the suggested diff and tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is initialized in the beginning of the function

ah, well, then it is ok. ACK.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 11, 2021
@sipa
Copy link
Member

sipa commented Oct 25, 2021

I'm not sure it's entirely desirable to require user intervention when downgrading. When there is actual corruption, this may make sense as it's a sign of a deeper problem, but wiping peers.dat when downgrading to an incompatible version seems entirely expected to me.

maflcko pushed a commit to bitcoin-core/gui that referenced this pull request Feb 25, 2022
…ers.dat

d41ed32 p2p: Avoid InitError when downgrading peers.dat (junderw)

Pull request description:

  fixes #24188 (also see bitcoin/bitcoin#22762 (comment))
  When downgrading, a peers.dat with a future version that has a minimum
  required version larger than the downgraded Bitcoin Core version would cause an InitError.

  This commit changes this behavior to overwrite the existing peers.dat with
  a new empty one.

ACKs for top commit:
  prayank23:
    reACK bitcoin/bitcoin@d41ed32
  kallewoof:
    reACK d41ed32

Tree-SHA512: c8e625fe36ce0b1aab6c8ef7241c8954038bb856f2de27bdc4814dc9a60e51be28815c7d77d0f96eace49687a0cea02deb713978bbd3a5add742f50a675f2a40
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 25, 2022
d41ed32 p2p: Avoid InitError when downgrading peers.dat (junderw)

Pull request description:

  fixes bitcoin#24188 (also see bitcoin#22762 (comment))
  When downgrading, a peers.dat with a future version that has a minimum
  required version larger than the downgraded Bitcoin Core version would cause an InitError.

  This commit changes this behavior to overwrite the existing peers.dat with
  a new empty one.

ACKs for top commit:
  prayank23:
    reACK bitcoin@d41ed32
  kallewoof:
    reACK d41ed32

Tree-SHA512: c8e625fe36ce0b1aab6c8ef7241c8954038bb856f2de27bdc4814dc9a60e51be28815c7d77d0f96eace49687a0cea02deb713978bbd3a5add742f50a675f2a40
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 19, 2022
Summary:
```
peers.dat is silently erased when it can not be parsed or when it appears corrupted. Fix that by notifying the user. This might help in the following examples:

 - The user provided the database, but picked the wrong one.
 - A future version of Bitcoin Core wrote the file and it can't be read.
 - The file was corrupted by a logic bug in Bitcoin Core.
 - The file was corrupted by a disk failure.
```

Backport of [[bitcoin/bitcoin#22762 | core#22762]].

Also cherry-picks a commit from PR22697 that is now required for the addrman_tests to pass, and for which the rationale is still trivially correct:
bitcoin/bitcoin@ed9ba8a

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, PiRK

Reviewed By: #bitcoin_abc, PiRK

Differential Revision: https://reviews.bitcoinabc.org/D12308
@bitcoin bitcoin locked and limited conversation to collaborators Oct 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants