-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Fix de-serialization bug where AddrMan is left corrupted #7696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
BOOST_FIXTURE_TEST_SUITE(addrman_tests, BasicTestingSetup) | ||
|
||
BOOST_AUTO_TEST_CASE(caddrdb_deserialiation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strange test name ;-) z
missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I changed it to caddrdb_read, since that is a better descriptor of what I am testing.
Travis fails with:
But only in one configuration. |
Concept ACK, more tests are better. Haven't verified the tests though. |
@paveljanik The non-determinism of the test passing is worrying especially because the test that was failing was not testing new behavior. I haven't been able to replicate this failure locally, any idea what is happening with it? |
Can someone add the labels bug and data corruption. |
Concept ACK. Instead of relying on the output of CAddrDB::Read to be "clean" in case of an error I'd prefer a different approach, though: ignore and reset its output when the function fails. In my experience this is a more robust approach - it's easy to forget resetting it in some code path. In a lot of APIs the output of failing functions is thus undefined. |
@laanwj I'm not sure I follow, do you want the "clearing" to happen closer to the source of the error? In the serialization code? |
@EthanHeilman At the call site:
It's hard to guarantee that The cleanest way if we had a clean slate, I think, would be to have Read return an |
@laanwj Good idea, I amended the commit to include addrman.Clear in StartNode's Read() error condition. |
} | ||
|
||
bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers, uint256& hashIn) | ||
{ | ||
// verify stored checksum matches input data | ||
uint256 hashTmp = Hash(ssPeers.begin(), ssPeers.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make sense to move the hash check to the outer read function - after all, it's not part of the inner stream but of the wrapping file.
Would simplify testing as there'd be no need to pass hashIn here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking with passing hashIn as a parameter was that it would allow me to write a test on the "inner read" to verify an error is thrown if hashIn is not the same as the hash of ssPeers. Something like:
BOOST_AUTO_TEST_CASE(caddrdb_hash)
{
CAddrManUncorrupted addrmanUncorrupted;
CService addr1 = CService("250.7.1.1", 8333);
addrmanUncorrupted.Add(CAddress(addr1), CService("252.5.1.1", 8333));
// Test that the de-serialization does not throw an exception.
CDataStream ssPeers1 = AddrmanToStream(addrmanUncorrupted);
addrmanUncorrupted.Add(CAddress(addr3), CService("253.3.3.33", 8333));
CDataStream ssPeersDiff = AddrmanToStream(addrmanUncorrupted);
uint256 badHashIn = Hash(ssPeersDiff.begin(), ssPeersDiff.end());
CAddrDB adb;
CAddrMan addrman;
adb.Read(addrman, ssPeers2, badHashIn); // Assert this throws an exception since the hashes don't match
}
but I decided to wait until I had more time to work on it.
How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yes, that makes sense.
How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?
Depends on the (non-test) code impact. I'm not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on the (non-test) code impact.
It is on my list of things to investigate, but I was considering creating a class which wraps File/fopen and can be mocked out in unitests similar to RandomInt in addrman.h.
https://github.com/bitcoin/bitcoin/blob/master/src/addrman.h#L239
I share your concern as this would require a big diff when all the instances of File are replaced with mockable File. I assume this is a problem others have solved so I'm looking into alternates.
I'm not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.
YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).
I agree. It's somewhat preferable to not clutter files around the place.
On the other hand it actually tests the filesystem interaction then :)
I share your concern as this would require a big diff when all the instances of File are replaced with mockable File.
C++ has some support for this by replacing inputs/outputs streams by string streams, unfortunately - or not, I'm not entirely up to date of the pros and cons of it - we don't really use C++'s file API.
We mainly use fopen
and FILE*
. Note that on some platforms (including Linux and MacOSX IIRC) for FILE* there is: fmemopen, open_memstream. May be interesting.
Then one could change the APIs to take a FILE* instead of filename. On platforms without the function it could fall back to a temporary file, on platforms with the function it can use a memory stream.
Does this pull-request need any additional changes? |
I still don't really like passing the hash into bool CAddrDB::Read, I think it makes for an awful API :) |
* CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state * CAddrDB modified to make unit tests possible * Regression test created to ensure bug is fixed * StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
@laanwj I removed hash from CAddrDB::Read |
utACK 1475ecf |
@EthanHeilman Looks good to me now, thanks. utACK |
} catch (const std::exception& e) { | ||
exceptionThrown = true; | ||
} | ||
// Even through de-serialization failed adddrman is not left in a clean state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adddrman
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception * CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state * CAddrDB modified to make unit tests possible * Regression test created to ensure bug is fixed * StartNode modifed to clear adrman if CAddrDB::Read returns an error code. (EthanHeilman)
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception (EthanHeilman) * CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state * CAddrDB modified to make unit tests possible * Regression test created to ensure bug is fixed * StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception (EthanHeilman) * CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state * CAddrDB modified to make unit tests possible * Regression test created to ensure bug is fixed * StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
banlist.dat: store banlist on disk Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6310 - bitcoin/bitcoin#6315 - Only the new signal and net changes, no QT. - bitcoin/bitcoin#7350 - bitcoin/bitcoin#7458 - bitcoin/bitcoin#7696 - bitcoin/bitcoin#7959 - bitcoin/bitcoin#7906 - Only the ban-related commits. - bitcoin/bitcoin#8392 - Only the second commit. - bitcoin/bitcoin#8085 - Only the second commit. - bitcoin/bitcoin#10564 Part of #2074.
banlist.dat: store banlist on disk Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6310 - bitcoin/bitcoin#6315 - Only the new signal and net changes, no QT. - bitcoin/bitcoin#7350 - bitcoin/bitcoin#7458 - bitcoin/bitcoin#7696 - bitcoin/bitcoin#7959 - bitcoin/bitcoin#7906 - Only the ban-related commits. - bitcoin/bitcoin#8392 - Only the second commit. - bitcoin/bitcoin#8085 - Only the second commit. - bitcoin/bitcoin#10564 Part of #2074.
banlist.dat: store banlist on disk Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6310 - bitcoin/bitcoin#6315 - Only the new signal and net changes, no QT. - bitcoin/bitcoin#7350 - bitcoin/bitcoin#7458 - bitcoin/bitcoin#7696 - bitcoin/bitcoin#7959 - bitcoin/bitcoin#7906 - Only the ban-related commits. - bitcoin/bitcoin#8392 - Only the second commit. - bitcoin/bitcoin#8085 - Only the second commit. - bitcoin/bitcoin#10564 - bitcoin/bitcoin#13061 - Wasn't needed when I first made this backport in 2017, but it is now! Part of #2074.
banlist.dat: store banlist on disk Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6310 - bitcoin/bitcoin#6315 - Only the new signal and net changes, no QT. - bitcoin/bitcoin#7350 - bitcoin/bitcoin#7458 - bitcoin/bitcoin#7696 - bitcoin/bitcoin#7959 - bitcoin/bitcoin#7906 - Only the ban-related commits. - bitcoin/bitcoin#8392 - Only the second commit. - bitcoin/bitcoin#8085 - Only the second commit. - bitcoin/bitcoin#10564 - bitcoin/bitcoin#13061 - Wasn't needed when I first made this backport in 2017, but it is now! Part of #2074.
CAddrDB::Read is used to manage the loading of AddrMan from peers.dat. As shown in the code below, when CAddrDB::Read catches an exception from the de-serialization code it returns addrman "as-is", despite the fact that it failed to load correctly.
https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L2330
This use of a corrupted addrman can cause a bitcoind client to get in a state such that when bitcoind starts it will not run correctly. Once bitcoind gets into such a state the only fix is for the user to manually delete the offending peers.dat file.
This pull request fixes this behavior so that an exception during the de-serialization process will leave addrman in a clean state. Unittests verify this new behavior.