Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented Nov 26, 2020

This changes anchors.dat to use ADDRV2_FORMAT, so its CAddress entries can use v2 serialization.

This solves #20511, but is incompatible with #20509.

This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well, as it uses the client version stored in the CAddress records to decide which deserializer to use.

@naumenkogs
Copy link
Member

naumenkogs commented Nov 27, 2020

utACK.

I assume this code won't fail a node if you feed it an older anchors.dat format? At least I think it should not.

Pushing it to 0.21 solves it at scale, but we probably want to handle even corner cases.

@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

@naumenkogs Yes, it should be as deserialization uses the version number written in the CAddress record. So anchors.dat from rc1 or rc2 should be read correctly.

Still, it seems there aren't any tests for the anchoring functionality, so it'd be good if someone wrote one, or it gets tested manually.

@@ -161,13 +161,13 @@ bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers)
void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors)
{
LOG_TIME_SECONDS(strprintf("Flush %d outbound block-relay-only peer addresses to anchors.dat", anchors.size()));
SerializeFileDB("anchors", anchors_db_path, anchors);
SerializeFileDB("anchors", anchors_db_path, anchors, ADDRV2_FORMAT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this comment should be updated, given that now ADDRV2_FORMAT is also used here instead of CLIENT_VERSION:

bitcoin/src/netaddress.h

Lines 26 to 32 in 5009159

/**
* A flag that is ORed into the protocol version to designate that addresses
* should be serialized in (unserialized from) v2 format (BIP155).
* Make sure that this does not collide with any of the values in `version.h`
* or with `SERIALIZE_TRANSACTION_NO_WITNESS`.
*/
static constexpr int ADDRV2_FORMAT = 0x20000000;

}

std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path)
{
std::vector<CAddress> anchors;
if (DeserializeFileDB(anchors_db_path, anchors)) {
if (DeserializeFileDB(anchors_db_path, anchors, ADDRV2_FORMAT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will not deserialize properly anchors.dat from before this PR.

  • CLIENT_VERSION is something like 219900
  • ADDRV2_FORMAT is 0x20000000

In anchors.dat before this PR we have CAddress entries which contain version=219900 and addresses serialized in addrv1 format.

With this PR we will set the stream version to ADDRV2_FORMAT during deserialize:

    // Notice added comments below
    SERIALIZE_METHODS(CAddress, obj)
    {
        SER_READ(obj, obj.nTime = TIME_INIT);
        int nVersion = s.GetVersion(); // nVersion will be assigned ADDRV2_FORMAT (0x20000000)
        if (s.GetType() & SER_DISK) {
            READWRITE(nVersion); // nVersion will be assigned CLIENT_VERSION (219900)
        }
...
        if (nVersion & ADDRV2_FORMAT) { // this will be false
            uint64_t services_tmp;
            SER_WRITE(obj, services_tmp = obj.nServices);
            READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
            SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
        } else {
            // it will read services as 8 bytes (correct, as that is what is actually stored on disk)
            READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
        }
        // oops, this will deserialize CService, which will deserialize CNetAddr which will check
        // s.GetVersion() which is ADDRV2_FORMAT (0x20000000) and will deserialize the
        // address as if it is stored in addrv2 format on disk, but it is actually stored in addrv1,
        // so this will deserialize garbage
        READWRITEAS(CService, obj);
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case the file checksum will mismatch and we won't load any anchors at all, which is ok (anchors.dat is not in any release and losing the file isn't a problem).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that it will not be able to read "old" anchors.dat (I am not sure of the exact way it will fail, maybe it will be indeed checksum failure).

I mention this because the comments above give the impression that "old" anchors.dat would be readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well...

Such compatibility shouldn't be a point for the reason mentioned by @jnewbery.

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, although I'm not sure how urgent this is. If I'm reading the code correctly, then the effect will be:

  • on shutdown, we'll save all anchor connections in addrv1 serialization. For ipv4/6/torv2/etc this is fine. For addresses which are not addrv1 serializable (eg torv3), we'll store all zeros (

    bitcoin/src/netaddress.h

    Lines 333 to 334 in 8ed2c72

    // Serialize TORv3, I2P and CJDNS as all-zeros.
    memset(arr, 0x0, V1_SERIALIZATION_SIZE);
    ).
  • on startup, when we read those addresses, we'll fail to connect to them and make other block-relay-only connections.

(it's very possible I've misread the code - this is split across addrdb/netaddress/net and is quite hard to follow)

If I'm right, then I don't think we necessarily need to fix this for 0.21. We didn't have anchor connections or torv3 before, so the fact that they don't work together isn't a regression and can always be fixed up later.

I think the same problem exists for banlist.dat - we'll always serialize/deserialize in addrv1 format.

@@ -35,7 +35,7 @@ bool SerializeDB(Stream& stream, const Data& data)
}

template <typename Data>
bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data)
bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data, int version = CLIENT_VERSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this (and DeserializeFileDB) to not have a default argument. There are only three places where they're called, and it seems better to be explicit in those places.

@jonatack
Copy link
Member

Concept ACK, could use test coverage.

@vasild
Copy link
Contributor

vasild commented Nov 27, 2020

it's very possible I've misread the code - this is split across addrdb/netaddress/net

I think you read it correctly (or at least I read it in the same way). It will deserialize 0s from disk which will be invalid addresses which later will be skipped gracefully:

bitcoin/src/net.cpp

Lines 1971 to 1975 in 5009159

const CAddress addr = m_anchors.back();
m_anchors.pop_back();
if (!addr.IsValid() || IsLocal(addr) || !IsReachable(addr) ||
!HasAllDesirableServiceFlags(addr.nServices) ||
setConnected.count(addr.GetGroup(addrman.m_asmap))) continue;

@jnewbery
Copy link
Contributor

it uses the client version stored in the CAddress records to decide which deserializer to use.

anchors.dat from rc1 or rc2 should be read correctly.

@sipa I don't understand this. With this PR, we always unserialize anchors.dat using a CAutoFile with version ADDRV2_FORMAT. I think the version stored in the CAddress will be used to deserialize the services in the CAddress, but the underlying CNetAddr will always be unserialized as addrv2.

@vasild
Copy link
Contributor

vasild commented Nov 27, 2020

I think the inconsistency in CAddress deserialization should be fixed in one way (#20509) or another:

fix CAddress deser inconsistency
diff --git i/src/protocol.h w/src/protocol.h
index 309fac621..634a1a91a 100644
--- i/src/protocol.h
+++ w/src/protocol.h
@@ -367,17 +367,17 @@ public:
     CAddress(CService ipIn, ServiceFlags nServicesIn, uint32_t nTimeIn) : CService{ipIn}, nTime{nTimeIn}, nServices{nServicesIn} {};
 
     SERIALIZE_METHODS(CAddress, obj)
     {
         SER_READ(obj, obj.nTime = TIME_INIT);
         int nVersion = s.GetVersion();
-        if (s.GetType() & SER_DISK) {
+        const bool is_disk_ser = s.GetType() & SER_DISK;
+        if (is_disk_ser) {
             READWRITE(nVersion);
         }
-        if ((s.GetType() & SER_DISK) ||
-            (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
+        if (is_disk_ser || (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
             // The only time we serialize a CAddress object without nTime is in
             // the initial VERSION messages which contain two CAddress records.
             // At that point, the serialization version is INIT_PROTO_VERSION.
             // After the version handshake, serialization version is >=
             // MIN_PEER_PROTO_VERSION and all ADDR messages are serialized with
             // nTime.
@@ -388,13 +388,16 @@ public:
             SER_WRITE(obj, services_tmp = obj.nServices);
             READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
             SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
         } else {
             READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
         }
+        const int stream_version_orig = s.GetVersion();
+        SER_READ(obj, if (is_disk_ser) { s.SetVersion(nVersion); });
         READWRITEAS(CService, obj);
+        SER_READ(obj, if (is_disk_ser) { s.SetVersion(stream_version_orig); });
     }

The above patch, on top of this PR will:

@hebasto
Copy link
Member

hebasto commented Nov 27, 2020

  • Work just fine if there is ever addrv3 format

Future proof is good.

@vasild
Copy link
Contributor

vasild commented Nov 27, 2020

I think the same problem exists for banlist.dat - we'll always serialize/deserialize in addrv1 format.

True that we will always ser/deser in addrv1, but that is not a problem because the banlist contains CSubNet objects which must be either IPv4 or IPv6 (no Tor v3 involved):

bitcoin/src/netaddress.cpp

Lines 992 to 995 in e2ff5e7

CSubNet::CSubNet(const CNetAddr& addr, uint8_t mask) : CSubNet()
{
valid = (addr.IsIPv4() && mask <= ADDR_IPV4_SIZE * 8) ||
(addr.IsIPv6() && mask <= ADDR_IPV6_SIZE * 8);

@jnewbery
Copy link
Contributor

that is not a problem because the banlist contains CSubNet objects which must be either IPv4 or IPv6 (no Tor v3 involved)

Ah good. Thanks @vasild!

@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

@vasild Did you see #20516?

@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

@jnewbery Oops, yes, that is only true when combined with #20516, or with @vasild's patch above.

@sipa
Copy link
Member Author

sipa commented Nov 27, 2020

If I'm right, then I don't think we necessarily need to fix this for 0.21. We didn't have anchor connections or torv3 before, so the fact that they don't work together isn't a regression and can always be fixed up later

That's fair, and you've shown that this PR as-is actually makes things worse, as it'll fail to deserialize the CService object.

To fix it, we'll either need @vasild's patch from #20514 (comment), or doing it on top of #20516. I'm going to close this PR, and take the functionality there.

@sipa sipa closed this Nov 27, 2020
@maflcko maflcko removed this from the 0.21.0 milestone Dec 4, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants