-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Use addrv2 serialization in anchors.dat #20514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
utACK. I assume this code won't fail a node if you feed it an older anchors.dat format? At least I think it should not. Pushing it to 0.21 solves it at scale, but we probably want to handle even corner cases. |
@naumenkogs Yes, it should be as deserialization uses the version number written in the CAddress record. So anchors.dat from rc1 or rc2 should be read correctly. Still, it seems there aren't any tests for the anchoring functionality, so it'd be good if someone wrote one, or it gets tested manually. |
@@ -161,13 +161,13 @@ bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers) | |||
void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors) | |||
{ | |||
LOG_TIME_SECONDS(strprintf("Flush %d outbound block-relay-only peer addresses to anchors.dat", anchors.size())); | |||
SerializeFileDB("anchors", anchors_db_path, anchors); | |||
SerializeFileDB("anchors", anchors_db_path, anchors, ADDRV2_FORMAT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this comment should be updated, given that now ADDRV2_FORMAT
is also used here instead of CLIENT_VERSION
:
Lines 26 to 32 in 5009159
/** | |
* A flag that is ORed into the protocol version to designate that addresses | |
* should be serialized in (unserialized from) v2 format (BIP155). | |
* Make sure that this does not collide with any of the values in `version.h` | |
* or with `SERIALIZE_TRANSACTION_NO_WITNESS`. | |
*/ | |
static constexpr int ADDRV2_FORMAT = 0x20000000; |
} | ||
|
||
std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path) | ||
{ | ||
std::vector<CAddress> anchors; | ||
if (DeserializeFileDB(anchors_db_path, anchors)) { | ||
if (DeserializeFileDB(anchors_db_path, anchors, ADDRV2_FORMAT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will not deserialize properly anchors.dat
from before this PR.
CLIENT_VERSION
is something like219900
ADDRV2_FORMAT
is0x20000000
In anchors.dat
before this PR we have CAddress
entries which contain version=219900
and addresses serialized in addrv1 format.
With this PR we will set the stream version to ADDRV2_FORMAT
during deserialize:
// Notice added comments below
SERIALIZE_METHODS(CAddress, obj)
{
SER_READ(obj, obj.nTime = TIME_INIT);
int nVersion = s.GetVersion(); // nVersion will be assigned ADDRV2_FORMAT (0x20000000)
if (s.GetType() & SER_DISK) {
READWRITE(nVersion); // nVersion will be assigned CLIENT_VERSION (219900)
}
...
if (nVersion & ADDRV2_FORMAT) { // this will be false
uint64_t services_tmp;
SER_WRITE(obj, services_tmp = obj.nServices);
READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
} else {
// it will read services as 8 bytes (correct, as that is what is actually stored on disk)
READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
}
// oops, this will deserialize CService, which will deserialize CNetAddr which will check
// s.GetVersion() which is ADDRV2_FORMAT (0x20000000) and will deserialize the
// address as if it is stored in addrv2 format on disk, but it is actually stored in addrv1,
// so this will deserialize garbage
READWRITEAS(CService, obj);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case the file checksum will mismatch and we won't load any anchors at all, which is ok (anchors.dat is not in any release and losing the file isn't a problem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that it will not be able to read "old" anchors.dat
(I am not sure of the exact way it will fail, maybe it will be indeed checksum failure).
I mention this because the comments above give the impression that "old" anchors.dat
would be readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well...
Such compatibility shouldn't be a point for the reason mentioned by @jnewbery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, although I'm not sure how urgent this is. If I'm reading the code correctly, then the effect will be:
- on shutdown, we'll save all anchor connections in addrv1 serialization. For ipv4/6/torv2/etc this is fine. For addresses which are not addrv1 serializable (eg torv3), we'll store all zeros (
Lines 333 to 334 in 8ed2c72
// Serialize TORv3, I2P and CJDNS as all-zeros. memset(arr, 0x0, V1_SERIALIZATION_SIZE); - on startup, when we read those addresses, we'll fail to connect to them and make other block-relay-only connections.
(it's very possible I've misread the code - this is split across addrdb/netaddress/net and is quite hard to follow)
If I'm right, then I don't think we necessarily need to fix this for 0.21. We didn't have anchor connections or torv3 before, so the fact that they don't work together isn't a regression and can always be fixed up later.
I think the same problem exists for banlist.dat - we'll always serialize/deserialize in addrv1 format.
@@ -35,7 +35,7 @@ bool SerializeDB(Stream& stream, const Data& data) | |||
} | |||
|
|||
template <typename Data> | |||
bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data) | |||
bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data& data, int version = CLIENT_VERSION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this (and DeserializeFileDB) to not have a default argument. There are only three places where they're called, and it seems better to be explicit in those places.
Concept ACK, could use test coverage. |
I think you read it correctly (or at least I read it in the same way). It will deserialize 0s from disk which will be invalid addresses which later will be skipped gracefully: Lines 1971 to 1975 in 5009159
|
@sipa I don't understand this. With this PR, we always unserialize anchors.dat using a CAutoFile with version ADDRV2_FORMAT. I think the version stored in the CAddress will be used to deserialize the services in the CAddress, but the underlying CNetAddr will always be unserialized as addrv2. |
I think the inconsistency in fix CAddress deser inconsistencydiff --git i/src/protocol.h w/src/protocol.h
index 309fac621..634a1a91a 100644
--- i/src/protocol.h
+++ w/src/protocol.h
@@ -367,17 +367,17 @@ public:
CAddress(CService ipIn, ServiceFlags nServicesIn, uint32_t nTimeIn) : CService{ipIn}, nTime{nTimeIn}, nServices{nServicesIn} {};
SERIALIZE_METHODS(CAddress, obj)
{
SER_READ(obj, obj.nTime = TIME_INIT);
int nVersion = s.GetVersion();
- if (s.GetType() & SER_DISK) {
+ const bool is_disk_ser = s.GetType() & SER_DISK;
+ if (is_disk_ser) {
READWRITE(nVersion);
}
- if ((s.GetType() & SER_DISK) ||
- (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
+ if (is_disk_ser || (nVersion != INIT_PROTO_VERSION && !(s.GetType() & SER_GETHASH))) {
// The only time we serialize a CAddress object without nTime is in
// the initial VERSION messages which contain two CAddress records.
// At that point, the serialization version is INIT_PROTO_VERSION.
// After the version handshake, serialization version is >=
// MIN_PEER_PROTO_VERSION and all ADDR messages are serialized with
// nTime.
@@ -388,13 +388,16 @@ public:
SER_WRITE(obj, services_tmp = obj.nServices);
READWRITE(Using<CompactSizeFormatter<false>>(services_tmp));
SER_READ(obj, obj.nServices = static_cast<ServiceFlags>(services_tmp));
} else {
READWRITE(Using<CustomUintFormatter<8>>(obj.nServices));
}
+ const int stream_version_orig = s.GetVersion();
+ SER_READ(obj, if (is_disk_ser) { s.SetVersion(nVersion); });
READWRITEAS(CService, obj);
+ SER_READ(obj, if (is_disk_ser) { s.SetVersion(stream_version_orig); });
} The above patch, on top of this PR will:
|
Future proof is good. |
True that we will always ser/deser in addrv1, but that is not a problem because the banlist contains Lines 992 to 995 in e2ff5e7
|
Ah good. Thanks @vasild! |
That's fair, and you've shown that this PR as-is actually makes things worse, as it'll fail to deserialize the To fix it, we'll either need @vasild's patch from #20514 (comment), or doing it on top of #20516. I'm going to close this PR, and take the functionality there. |
This changes anchors.dat to use ADDRV2_FORMAT, so its CAddress entries can use v2 serialization.
This solves #20511, but is incompatible with #20509.
This should be compatible with (v1) addresses stored by 0.21.0rc{1,2} as well, as it uses the client version stored in the CAddress records to decide which deserializer to use.