-
Notifications
You must be signed in to change notification settings - Fork 37.7k
net: Use serialization parameters for CAddress serialization #25284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The head ref may contain hidden characters: "2206-split-consensus-net-\u{1F63B}"
Conversation
Concept ACK |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
81885b8
to
bdc2ffe
Compare
You mean the "consensus" module, right, not actual consensus? |
…rsion and use it where possible facc2fa Use AutoFile where possible (MacroFake) 6666803 streams: Add AutoFile without ser-type and ser-version (MacroFake) Pull request description: This was done in the context of bitcoin/bitcoin#25284 , but I think it also makes sense standalone. The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version. So do this here for `AutoFile`. `CAutoFile` remains in places where it is not yet possible. ACKs for top commit: laanwj: Code review ACK facc2fa fanquake: ACK facc2fa Tree-SHA512: d82d024d55af57565ac53d9d1517afafc12b46964effba0332de62a6c77869356fa77f89e6d4834438fff44c45b64fccdf5a1358bfea03e28dfe55013b3c099d
bdc2ffe
to
520bb2e
Compare
…version and use it where possible faf9acc Use HashWriter where possible (MacroFake) faa5425 Add HashWriter without ser-type and ser-version (MacroFake) Pull request description: This was done in the context of bitcoin/bitcoin#25284 , but I think it also makes sense standalone. The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version. So do this here for `HashWriter`. `CHashWriter` remains in places where it is not yet possible. ACKs for top commit: sipa: utACK faf9acc Empact: utACK bitcoin/bitcoin@faf9acc Tree-SHA512: 544cc712436e49f6e608120bcd3ddc5ea72dd236554ce30fb6cfff34a92d7e67b6e6527336ad0f5b6365e2b2884f4c6508aef775953ccd9312f17752729703f2
520bb2e
to
47d8681
Compare
47d8681
to
3e07e76
Compare
3e07e76
to
72f5065
Compare
src/addrdb.cpp
Outdated
@@ -175,13 +174,15 @@ bool CBanDB::Read(banmap_t& banSet) | |||
bool DumpPeerAddresses(const ArgsManager& args, const AddrMan& addr) | |||
{ | |||
const auto pathAddr = args.GetDataDirNet() / "peers.dat"; | |||
return SerializeFileDB("peers", pathAddr, addr, CLIENT_VERSION); | |||
return SerializeFileDB("peers", pathAddr, WithParams(CAddress::V1_DISK, addr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels unnecessary. Addrman's serialization controls how the addresses in it are serialized (only V1_DISK
or V2_DISK
should be used), so it shouldn't be necessary to provide it with a CAddress
parameter (and doing so is misleading, as it'd be overridden anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Patch with these changes is at https://github.com/ajtowns/bitcoin/commits/202309-serializeaddrmansimp if you want to squash it in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is misleading, but the unit and fuzz tests would still set Network
, as opposed to Disk
sometimes. I agree with the cleanup, but I think this is a slight change (improvement) in what the unit and fuzz tests are testing.
Though, as it only affects tests, I think the refactoring
label can be kept on this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Patch with these changes is at https://github.com/ajtowns/bitcoin/commits/202309-serializeaddrmansimp if you want to squash it in
Thanks for the patch! I've squashed it and adjusted the commit message to explain the test changes.
src/addrdb.cpp
Outdated
@@ -191,7 +192,7 @@ util::Result<std::unique_ptr<AddrMan>> LoadAddrman(const NetGroupManager& netgro | |||
const auto start{SteadyClock::now()}; | |||
const auto path_addr{args.GetDataDirNet() / "peers.dat"}; | |||
try { | |||
DeserializeFileDB(path_addr, *addrman, CLIENT_VERSION); | |||
DeserializeFileDB(path_addr, WithParams(CAddress::V1_DISK, *addrman)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I think it should be possible to unserialize addrman without parameter.
src/addrman.cpp
Outdated
@@ -171,8 +171,8 @@ void AddrManImpl::Serialize(Stream& s_) const | |||
*/ | |||
|
|||
// Always serialize in the latest version (FILE_FORMAT). | |||
|
|||
OverrideStream<Stream> s(&s_, s_.GetType(), s_.GetVersion() | ADDRV2_FORMAT); | |||
const CAddress::SerParams ser_params{{CNetAddr::Encoding::V2}, s_.GetParams().fmt}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addrman is only ever serialized to disk, so the fmt here should always be disk (and thus, ser_params will only ever be V2_DISK
here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in the unit tests, see #25284 (comment) and:
test/addrman_tests.cpp(690): Entering test case "addrman_serialization"
test_bitcoin: addrman.cpp:175: void AddrManImpl::Serialize(Stream &) const [Stream = ParamsStream<CAddress::SerParams, DataStream>]: Assertion `s_.GetParams().fmt==CAddress::Format::Disk' failed.
make[3]: *** [Makefile:22464: test/addrman_tests.cpp.test] Error 1
} | ||
|
||
OverrideStream<Stream> s(&s_, s_.GetType(), stream_version); | ||
const CAddress::SerParams ser_params{{stream_enc}, s_.GetParams().fmt}; | ||
ParamsStream s{ser_params, s_}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and the lines above could be simplified, as it's only choosing between options V1_DISK
and V2_DISK
.
@@ -1080,4 +1134,61 @@ size_t GetSerializeSizeMany(int nVersion, const T&... t) | |||
return sc.size(); | |||
} | |||
|
|||
/** Wrapper that overrides the GetParams() function of a stream (and hides GetVersion/GetType). */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this GetParams
defined that the method here overrides? Does this refer to the underlying type instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I wonder if it makes sense to override the GetParams()
function at all. Generally, no underlying stream type has a GetParams()
method, so "overriding" really means "defines".
And if the GetParams()
method is truly overridden, it should be rare, and it could make more sense to first require the stream to reveal its underlying unwrapped stream, without the GetParams()
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(My plan was to look into this after C++20, to avoid the risk of writing clunky C++17 code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do:
#ifdef HAVE_CXX20
template<typename Stream>
concept ParamsStreamConcept = requires(Stream& s) { s.GetParams(); };
template<typename Stream>
concept PlainStreamConcept = !ParamsStreamConcept<Stream>;
#else
template<typename Stream>
bool ParamsStreamConcept = true;
template<typename Stream>
bool PlainStreamConcept = true;
#endif
/** Wrapper that overrides the GetParams() function of a stream (and hides GetVersion/GetType). */
template <typename Params, typename SubStream>
class ParamsStream
{
static_assert(PlainStreamConcept<SubStream>, "avoid wrapping a ParamsStream in a ParamsStream");
SubStream& GetSubStream() { return m_substream; }
or similar if you wanted, which would catch this sort of case via the C++20 CI builds. At present the only lines that violate this rule are READWRITE(WithParams(ser_params, AsBase<CService>(obj)));
in protocol.h and Derived::SERIALIZE_METHOD_PARAMS
in serialize_tests.cpp; you need a GetSubStream
getter to fix those up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for reference, GetSubStream
was present in the original code, but only to avoid too-deeply nested templates, see https://github.com/bitcoin/bitcoin/pull/19503/files#r455147534
Since it wasn't enforced, I removed it. I'll add your patch once and if there is C++20.
This also cleans up the addrman (de)serialization code paths to only allow `Disk` serialization. Some unit tests previously forced a `Network` serialization, which does not make sense, because Bitcoin Core in production will always `Disk` serialize. This cleanup idea was suggested by Pieter Wuille and implemented by Anthony Towns. Co-authored-by: Pieter Wuille <pieter@wuille.net> Co-authored-by: Anthony Towns <aj@erisian.com.au>
Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
faba341
to
fa626af
Compare
Force pushed the simplification: #25284 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK fa626af
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK fa626af
*/ | ||
struct CSerActionSerialize | ||
{ | ||
struct ActionSerialize { | ||
constexpr bool ForRead() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be static constexpr bool ForRead() { return ...; }
Actually, ::SerRead
etc could be static member functions of these classes rather than being the global namespace too:
//#define READWRITE(...) (::SerReadWriteMany(s, ser_action, __VA_ARGS__))
#define READWRITE(...) (ser_action.SerReadWriteMany(s, __VA_ARGS__))
struct ActionSerialize
{
static constexpr bool ForRead() { return false; }
template<typename Stream, typename... Args>
static void SerReadWriteMany(Stream& s, const Args&... args)
{
::SerializeMany(s, args...);
}
Maybe something to consider later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jup, will take look later, as this comment is on a line untouched in this pull and thus seems unrelated.
template <typename Params, typename T> | ||
class ParamsWrapper | ||
{ | ||
static_assert(std::is_lvalue_reference<T>::value, "ParamsWrapper needs an lvalue reference type T"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not write T& m_object
and ParamsWrapper(.., T& obj)
, ParamsWrapper<.., FooObj>
(instead of FooObj&
) and drop this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jup, will take a look in one of the follow-ups.
Followups / tidy / lint to keep track of: |
Since this code is mostly of the form struct SerParams {
...
template <typename T>
auto operator()(T& obj) const { return WithParams(*this, obj); }
};
s << V1_DISK(data) That way when reading the code you're really only seeing the info you care about. |
…iskBlockIndex e73d2a8 refactor: remove clientversion include from dbwrapper.h (Cory Fields) 4240a08 refactor: Use DataStream now that version/type are unused (Cory Fields) f15f790 Remove version/hashing options from CBlockLocator/CDiskBlockIndex (Cory Fields) Pull request description: This is also a much simpler replacement for #28327. There are version fields in `CBlockLocator` and `CDiskBlockIndex` that have always been written but discarded when read. I intended to convert them to use SerParams as introduced by #25284, which [ended up looking like this](theuni@3e3af45). However because we don't currently have any definition of what a hash value would mean for either one of those, and we've never assigned the version field any meaning, I think it's better to just not worry about them. If we ever need to assign meaning in the future, we can introduce `SerParams` as was done for `CAddress`. As for the dummy values chosen: `CDiskBlockIndex::DUMMY_VERSION` was easy as the highest ever client version, and I don't expect any objection there. `CBlockLocator::DUMMY_VERSION` is hard-coded to the higest _PROTOCOL_ version ever used. This is to avoid a sudden bump that would be visible on the network if CLIENT_VERSION were used instead. In the future, if we ever need to use the value, we can discard anything in the CLIENT_VERSION range (for a few years as needed), as it's quite a bit higher. While reviewing, I suggest looking at the throwaway `SerParams` commit above as it shows where the call-sites are. I believe that should be enough to convince one's self that hashing is never used. ACKs for top commit: TheCharlatan: Re-ACK e73d2a8 ajtowns: reACK e73d2a8 Tree-SHA512: 45b0dd7c2e918493e2ee92a8e35320ad17991cb8908cb811150a96c5fd584ce177c775baeeb8675a602c90b9ba9203b8cefc0a2a0c6a71078b1d9c2b41e1f3ba
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
post-merge self-re-review, left some fuzz nits
edit: fixed in #28470
} | ||
m_connman.PushMessage(&node, CNetMsgMaker(node.GetCommonVersion()).Make(make_flags, msg_type, peer.m_addrs_to_send)); | ||
m_connman.PushMessage(&node, CNetMsgMaker(node.GetCommonVersion()).Make(msg_type, WithParams(CAddress::SerParams{{ser_enc}, CAddress::Format::Network}, peer.m_addrs_to_send))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the Make
can be written shorter as:
+const bool peer_v2{peer.m_wants_addrv2};
-...Make(msg_type, WithParams(CAddress::SerParams{{ser_enc}, CAddress::Format::Network}, peer.m_addrs_to_send))
+...Make(peer_v2 ? NetMsgType::ADDRV2 : NetMsgType::ADDR, peer_v2 ? CAddress::V2_NETWORK : CAddress::V1_NETWORK, peer.m_addrs_to_send))
and dropping the above if
.
if (na.IsAddrV1Compatible()) { | ||
AssertEqualAfterSerializeDeserialize(na); | ||
AssertEqualAfterSerializeDeserialize(na, ConsumeDeserializationParams<CNetAddr::SerParams>(fdp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This should be V1, because V2 is checked unconditionally below.
if (s.IsAddrV1Compatible()) { | ||
AssertEqualAfterSerializeDeserialize(s); | ||
AssertEqualAfterSerializeDeserialize(s, ConsumeDeserializationParams<CNetAddr::SerParams>(fdp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same?
FUZZ_TARGET(address_deserialize, .init = initialize_deserialize) | ||
{ | ||
FuzzedDataProvider fdp{buffer.data(), buffer.size()}; | ||
const auto ser_enc{ConsumeDeserializationParams<CNetAddr::SerParams>(fdp)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could pick from CAddress::SerParams
to cover a wider range?
fb6a2ab scripted-diff: use SER_PARAMS_OPFUNC (Anthony Towns) 5e5c8f8 serialize: add SER_PARAMS_OPFUNC (Anthony Towns) 33203f5 serialize: specify type for ParamsWrapper not ref (Anthony Towns) bf147bf serialize: move ser_action functions out of global namespace (Anthony Towns) Pull request description: Cleanups after #25284: * ser_action namespacing - #25284 (comment) * make reference implicit - #25284 (comment) * function notation - #25284 (comment) ACKs for top commit: MarcoFalke: lgtm ACK fb6a2ab 💨 TheCharlatan: ACK fb6a2ab Tree-SHA512: aacca2ee9cfec360ade6b394606e13d1dfe05bc29c5fbdd48a4e6992bd420312d4ed0d32218d95c560646af326e9977728dc2e759990636298e326947f6f9526
… and CDiskBlockIndex e73d2a8 refactor: remove clientversion include from dbwrapper.h (Cory Fields) 4240a08 refactor: Use DataStream now that version/type are unused (Cory Fields) f15f790 Remove version/hashing options from CBlockLocator/CDiskBlockIndex (Cory Fields) Pull request description: This is also a much simpler replacement for bitcoin#28327. There are version fields in `CBlockLocator` and `CDiskBlockIndex` that have always been written but discarded when read. I intended to convert them to use SerParams as introduced by bitcoin#25284, which [ended up looking like this](theuni@3e3af45). However because we don't currently have any definition of what a hash value would mean for either one of those, and we've never assigned the version field any meaning, I think it's better to just not worry about them. If we ever need to assign meaning in the future, we can introduce `SerParams` as was done for `CAddress`. As for the dummy values chosen: `CDiskBlockIndex::DUMMY_VERSION` was easy as the highest ever client version, and I don't expect any objection there. `CBlockLocator::DUMMY_VERSION` is hard-coded to the higest _PROTOCOL_ version ever used. This is to avoid a sudden bump that would be visible on the network if CLIENT_VERSION were used instead. In the future, if we ever need to use the value, we can discard anything in the CLIENT_VERSION range (for a few years as needed), as it's quite a bit higher. While reviewing, I suggest looking at the throwaway `SerParams` commit above as it shows where the call-sites are. I believe that should be enough to convince one's self that hashing is never used. ACKs for top commit: TheCharlatan: Re-ACK e73d2a8 ajtowns: reACK e73d2a8 Tree-SHA512: 45b0dd7c2e918493e2ee92a8e35320ad17991cb8908cb811150a96c5fd584ce177c775baeeb8675a602c90b9ba9203b8cefc0a2a0c6a71078b1d9c2b41e1f3ba
a0c254c Drop CHashWriter (Anthony Towns) c94f7e5 Drop OverrideStream (Anthony Towns) 6e9e4e6 Use ParamsWrapper for witness serialization (Anthony Towns) Pull request description: Choose whether witness is included in transaction serialization via serialization parameter rather than the stream version. See #25284 and #19477 for previous context. ACKs for top commit: maflcko: re-ACK a0c254c 🐜 theuni: ACK a0c254c Tree-SHA512: 8fd5cadfd84c5128e36c34a51fb94fdccd956280e7f65b7d73c512d6a9cdb53cdd3649de99ffab5322bd34be26cb95ab4eb05932b3b9de9c11d85743f50dcb13
It seems confusing that picking a wrong value for
ADDRV2_FORMAT
could have effects on consensus. (See the docstring ofADDRV2_FORMAT
).Fix this by implementing #19477 (comment) .
This may also help with libbitcoinkernel, see #28327