Skip to content

Conversation

EthanHeilman
Copy link
Contributor

CAddrDB::Read is used to manage the loading of AddrMan from peers.dat. As shown in the code below, when CAddrDB::Read catches an exception from the de-serialization code it returns addrman "as-is", despite the fact that it failed to load correctly.

    try {
        // de-serialize file header (network specific magic number) and ..
        ssPeers >> FLATDATA(pchMsgTmp);

        // ... verify the network matches ours
        if (memcmp(pchMsgTmp, Params().MessageStart(), sizeof(pchMsgTmp)))
            return error("%s: Invalid network magic number", __func__);

        // de-serialize address data into one CAddrMan object
        ssPeers >> addr;
    }
    catch (const std::exception& e) {
        return error("%s: Deserialize or I/O error - %s", __func__, e.what());
    }

https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L2330

This use of a corrupted addrman can cause a bitcoind client to get in a state such that when bitcoind starts it will not run correctly. Once bitcoind gets into such a state the only fix is for the user to manually delete the offending peers.dat file.

This pull request fixes this behavior so that an exception during the de-serialization process will leave addrman in a clean state. Unittests verify this new behavior.


BOOST_FIXTURE_TEST_SUITE(addrman_tests, BasicTestingSetup)

BOOST_AUTO_TEST_CASE(caddrdb_deserialiation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange test name ;-) z missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed it to caddrdb_read, since that is a better descriptor of what I am testing.

@paveljanik
Copy link
Contributor

Travis fails with:

test/net_tests.cpp(93): error: in "addrman_tests/caddrdb_deserialiation": check addrman1.size() == 3 has failed
test/net_tests.cpp(105): error: in "addrman_tests/caddrdb_deserialiation": check addrman2.size() == 3 has failed

But only in one configuration.

@jonasschnelli
Copy link
Contributor

Concept ACK, more tests are better. Haven't verified the tests though.

@EthanHeilman
Copy link
Contributor Author

@paveljanik The non-determinism of the test passing is worrying especially because the test that was failing was not testing new behavior. I haven't been able to replicate this failure locally, any idea what is happening with it?

@EthanHeilman
Copy link
Contributor Author

Can someone add the labels bug and data corruption.

@laanwj
Copy link
Member

laanwj commented Mar 24, 2016

Concept ACK.

Instead of relying on the output of CAddrDB::Read to be "clean" in case of an error I'd prefer a different approach, though: ignore and reset its output when the function fails. In my experience this is a more robust approach - it's easy to forget resetting it in some code path. In a lot of APIs the output of failing functions is thus undefined.

@EthanHeilman
Copy link
Contributor Author

@laanwj I'm not sure I follow, do you want the "clearing" to happen closer to the source of the error? In the serialization code?

@laanwj
Copy link
Member

laanwj commented Mar 25, 2016

@EthanHeilman At the call site:

        if (adb.Read(addrman))
            LogPrintf("Loaded %i addresses from peers.dat  %dms\n", addrman.size(), GetTimeMillis() - nStart);
        else {
            addrman.clear(); // Addrman can be in an inconsistent state after failure, reset it
            LogPrintf("Invalid or missing peers.dat; recreating\n");
            DumpAddresses();
        }

It's hard to guarantee that adb.Read won't leave the passed addrman in an inconsistent state after failure in all possible internal code paths. This would at least catch all possible cases.

The cleanest way if we had a clean slate, I think, would be to have Read return an AddrMan* which is null if an error happened, then instantiate a new one on faiilure. But as we have that global object that's too impactful.

@EthanHeilman
Copy link
Contributor Author

@laanwj Good idea, I amended the commit to include addrman.Clear in StartNode's Read() error condition.

}

bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers, uint256& hashIn)
{
// verify stored checksum matches input data
uint256 hashTmp = Hash(ssPeers.begin(), ssPeers.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make sense to move the hash check to the outer read function - after all, it's not part of the inner stream but of the wrapping file.
Would simplify testing as there'd be no need to pass hashIn here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking with passing hashIn as a parameter was that it would allow me to write a test on the "inner read" to verify an error is thrown if hashIn is not the same as the hash of ssPeers. Something like:

BOOST_AUTO_TEST_CASE(caddrdb_hash)
{
    CAddrManUncorrupted addrmanUncorrupted;
    CService addr1 = CService("250.7.1.1", 8333);
    addrmanUncorrupted.Add(CAddress(addr1), CService("252.5.1.1", 8333));

    // Test that the de-serialization does not throw an exception.
    CDataStream ssPeers1 = AddrmanToStream(addrmanUncorrupted);

    addrmanUncorrupted.Add(CAddress(addr3), CService("253.3.3.33", 8333));
    CDataStream ssPeersDiff = AddrmanToStream(addrmanUncorrupted);
    uint256 badHashIn = Hash(ssPeersDiff.begin(), ssPeersDiff.end());

    CAddrDB adb;
    CAddrMan addrman;
    adb.Read(addrman, ssPeers2, badHashIn); // Assert this throws an exception since the hashes don't match
}

but I decided to wait until I had more time to work on it.

How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yes, that makes sense.

How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?

Depends on the (non-test) code impact. I'm not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the (non-test) code impact.

It is on my list of things to investigate, but I was considering creating a class which wraps File/fopen and can be mocked out in unitests similar to RandomInt in addrman.h.

https://github.com/bitcoin/bitcoin/blob/master/src/addrman.h#L239

I share your concern as this would require a big diff when all the instances of File are replaced with mockable File. I assume this is a problem others have solved so I'm looking into alternates.

I'm not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.

YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).

I agree. It's somewhat preferable to not clutter files around the place.
On the other hand it actually tests the filesystem interaction then :)

I share your concern as this would require a big diff when all the instances of File are replaced with mockable File.

C++ has some support for this by replacing inputs/outputs streams by string streams, unfortunately - or not, I'm not entirely up to date of the pros and cons of it - we don't really use C++'s file API.

We mainly use fopen and FILE*. Note that on some platforms (including Linux and MacOSX IIRC) for FILE* there is: fmemopen, open_memstream. May be interesting.

Then one could change the APIs to take a FILE* instead of filename. On platforms without the function it could fall back to a temporary file, on platforms with the function it can use a memory stream.

@EthanHeilman
Copy link
Contributor Author

Does this pull-request need any additional changes?

@laanwj
Copy link
Member

laanwj commented Apr 19, 2016

I still don't really like passing the hash into bool CAddrDB::Read, I think it makes for an awful API :)

* CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state
* CAddrDB modified to make unit tests possible
* Regression test created to ensure bug is fixed
* StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
@EthanHeilman
Copy link
Contributor Author

@laanwj I removed hash from CAddrDB::Read

@sipa
Copy link
Member

sipa commented May 5, 2016

utACK 1475ecf

@laanwj
Copy link
Member

laanwj commented May 5, 2016

@EthanHeilman Looks good to me now, thanks. utACK

} catch (const std::exception& e) {
exceptionThrown = true;
}
// Even through de-serialization failed adddrman is not left in a clean state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adddrman

@sipa sipa merged commit 1475ecf into bitcoin:master May 17, 2016
sipa added a commit that referenced this pull request May 17, 2016
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception * CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state * CAddrDB modified to make unit tests possible * Regression test created to ensure bug is fixed * StartNode modifed to clear adrman if CAddrDB::Read returns an error code. (EthanHeilman)
This was referenced May 18, 2016
sickpig referenced this pull request in sickpig/BitcoinUnlimited Apr 13, 2017
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception (EthanHeilman)

* CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state
* CAddrDB modified to make unit tests possible
* Regression test created to ensure bug is fixed
* StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
sickpig referenced this pull request in sickpig/BitcoinUnlimited Apr 14, 2017
1475ecf Fix de-serialization bug where AddrMan is corrupted after exception (EthanHeilman)

* CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state
* CAddrDB modified to make unit tests possible
* Regression test created to ensure bug is fixed
* StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
zkbot added a commit to zcash/zcash that referenced this pull request Nov 12, 2020
banlist.dat: store banlist on disk

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6310
- bitcoin/bitcoin#6315
  - Only the new signal and net changes, no QT.
- bitcoin/bitcoin#7350
- bitcoin/bitcoin#7458
- bitcoin/bitcoin#7696
- bitcoin/bitcoin#7959
- bitcoin/bitcoin#7906
  - Only the ban-related commits.
- bitcoin/bitcoin#8392
  - Only the second commit.
- bitcoin/bitcoin#8085
  - Only the second commit.
- bitcoin/bitcoin#10564

Part of #2074.
zkbot added a commit to zcash/zcash that referenced this pull request Feb 11, 2021
banlist.dat: store banlist on disk

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6310
- bitcoin/bitcoin#6315
  - Only the new signal and net changes, no QT.
- bitcoin/bitcoin#7350
- bitcoin/bitcoin#7458
- bitcoin/bitcoin#7696
- bitcoin/bitcoin#7959
- bitcoin/bitcoin#7906
  - Only the ban-related commits.
- bitcoin/bitcoin#8392
  - Only the second commit.
- bitcoin/bitcoin#8085
  - Only the second commit.
- bitcoin/bitcoin#10564

Part of #2074.
zkbot added a commit to zcash/zcash that referenced this pull request Feb 17, 2021
banlist.dat: store banlist on disk

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6310
- bitcoin/bitcoin#6315
  - Only the new signal and net changes, no QT.
- bitcoin/bitcoin#7350
- bitcoin/bitcoin#7458
- bitcoin/bitcoin#7696
- bitcoin/bitcoin#7959
- bitcoin/bitcoin#7906
  - Only the ban-related commits.
- bitcoin/bitcoin#8392
  - Only the second commit.
- bitcoin/bitcoin#8085
  - Only the second commit.
- bitcoin/bitcoin#10564
- bitcoin/bitcoin#13061
  - Wasn't needed when I first made this backport in 2017, but it is now!

Part of #2074.
zkbot added a commit to zcash/zcash that referenced this pull request Feb 17, 2021
banlist.dat: store banlist on disk

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6310
- bitcoin/bitcoin#6315
  - Only the new signal and net changes, no QT.
- bitcoin/bitcoin#7350
- bitcoin/bitcoin#7458
- bitcoin/bitcoin#7696
- bitcoin/bitcoin#7959
- bitcoin/bitcoin#7906
  - Only the ban-related commits.
- bitcoin/bitcoin#8392
  - Only the second commit.
- bitcoin/bitcoin#8085
  - Only the second commit.
- bitcoin/bitcoin#10564
- bitcoin/bitcoin#13061
  - Wasn't needed when I first made this backport in 2017, but it is now!

Part of #2074.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants