Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Jun 6, 2022

It seems confusing that picking a wrong value for ADDRV2_FORMAT could have effects on consensus. (See the docstring of ADDRV2_FORMAT).

Fix this by implementing #19477 (comment) .

This may also help with libbitcoinkernel, see #28327

@vasild
Copy link
Contributor

vasild commented Jun 6, 2022

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 7, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, ajtowns
Concept ACK vasild
Stale ACK jonatack, theuni

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26859 (fuzz: extend ConsumeNetAddr() to return I2P and CJDNS addresses by vasild)
  • #26621 (refactor: Continue moving application data from CNode to Peer by dergoegge)
  • #24034 (p2p: delete anchors.dat after trying to connect to that peers by brunoerg)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@luke-jr
Copy link
Member

luke-jr commented Jun 18, 2022

It seems confusing to have consensus (weakly) depend on low level BIP 155 addrv2 serialization.

You mean the "consensus" module, right, not actual consensus?

fanquake added a commit to bitcoin-core/gui that referenced this pull request Jul 20, 2022
…rsion and use it where possible

facc2fa Use AutoFile where possible (MacroFake)
6666803 streams: Add AutoFile without ser-type and ser-version (MacroFake)

Pull request description:

  This was done in the context of bitcoin/bitcoin#25284 , but I think it also makes sense standalone.

  The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.

  So do this here for `AutoFile`. `CAutoFile` remains in places where it is not yet possible.

ACKs for top commit:
  laanwj:
    Code review ACK facc2fa
  fanquake:
    ACK facc2fa

Tree-SHA512: d82d024d55af57565ac53d9d1517afafc12b46964effba0332de62a6c77869356fa77f89e6d4834438fff44c45b64fccdf5a1358bfea03e28dfe55013b3c099d
@maflcko maflcko force-pushed the 2206-split-consensus-net- branch from bdc2ffe to 520bb2e Compare July 22, 2022 07:12
fanquake added a commit to bitcoin-core/gui that referenced this pull request Jul 22, 2022
…version and use it where possible

faf9acc Use HashWriter where possible (MacroFake)
faa5425 Add HashWriter without ser-type and ser-version (MacroFake)

Pull request description:

  This was done in the context of bitcoin/bitcoin#25284 , but I think it also makes sense standalone.

  The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.

  So do this here for `HashWriter`. `CHashWriter` remains in places where it is not yet possible.

ACKs for top commit:
  sipa:
    utACK faf9acc
  Empact:
    utACK bitcoin/bitcoin@faf9acc

Tree-SHA512: 544cc712436e49f6e608120bcd3ddc5ea72dd236554ce30fb6cfff34a92d7e67b6e6527336ad0f5b6365e2b2884f4c6508aef775953ccd9312f17752729703f2
@maflcko maflcko force-pushed the 2206-split-consensus-net- branch from 520bb2e to 47d8681 Compare July 22, 2022 08:34
@maflcko maflcko force-pushed the 2206-split-consensus-net- branch from 47d8681 to 3e07e76 Compare August 1, 2022 12:16
@maflcko maflcko force-pushed the 2206-split-consensus-net- branch from 3e07e76 to 72f5065 Compare December 15, 2022 12:50
src/addrdb.cpp Outdated
@@ -175,13 +174,15 @@ bool CBanDB::Read(banmap_t& banSet)
bool DumpPeerAddresses(const ArgsManager& args, const AddrMan& addr)
{
const auto pathAddr = args.GetDataDirNet() / "peers.dat";
return SerializeFileDB("peers", pathAddr, addr, CLIENT_VERSION);
return SerializeFileDB("peers", pathAddr, WithParams(CAddress::V1_DISK, addr));
Copy link
Member

@sipa sipa Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unnecessary. Addrman's serialization controls how the addresses in it are serialized (only V1_DISK or V2_DISK should be used), so it shouldn't be necessary to provide it with a CAddress parameter (and doing so is misleading, as it'd be overridden anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Patch with these changes is at https://github.com/ajtowns/bitcoin/commits/202309-serializeaddrmansimp if you want to squash it in

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is misleading, but the unit and fuzz tests would still set Network, as opposed to Disk sometimes. I agree with the cleanup, but I think this is a slight change (improvement) in what the unit and fuzz tests are testing.

Though, as it only affects tests, I think the refactoring label can be kept on this pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Patch with these changes is at https://github.com/ajtowns/bitcoin/commits/202309-serializeaddrmansimp if you want to squash it in

Thanks for the patch! I've squashed it and adjusted the commit message to explain the test changes.

src/addrdb.cpp Outdated
@@ -191,7 +192,7 @@ util::Result<std::unique_ptr<AddrMan>> LoadAddrman(const NetGroupManager& netgro
const auto start{SteadyClock::now()};
const auto path_addr{args.GetDataDirNet() / "peers.dat"};
try {
DeserializeFileDB(path_addr, *addrman, CLIENT_VERSION);
DeserializeFileDB(path_addr, WithParams(CAddress::V1_DISK, *addrman));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think it should be possible to unserialize addrman without parameter.

src/addrman.cpp Outdated
@@ -171,8 +171,8 @@ void AddrManImpl::Serialize(Stream& s_) const
*/

// Always serialize in the latest version (FILE_FORMAT).

OverrideStream<Stream> s(&s_, s_.GetType(), s_.GetVersion() | ADDRV2_FORMAT);
const CAddress::SerParams ser_params{{CNetAddr::Encoding::V2}, s_.GetParams().fmt};
Copy link
Member

@sipa sipa Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addrman is only ever serialized to disk, so the fmt here should always be disk (and thus, ser_params will only ever be V2_DISK here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the unit tests, see #25284 (comment) and:

test/addrman_tests.cpp(690): Entering test case "addrman_serialization"
test_bitcoin: addrman.cpp:175: void AddrManImpl::Serialize(Stream &) const [Stream = ParamsStream<CAddress::SerParams, DataStream>]: Assertion `s_.GetParams().fmt==CAddress::Format::Disk' failed.
make[3]: *** [Makefile:22464: test/addrman_tests.cpp.test] Error 1

}

OverrideStream<Stream> s(&s_, s_.GetType(), stream_version);
const CAddress::SerParams ser_params{{stream_enc}, s_.GetParams().fmt};
ParamsStream s{ser_params, s_};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line and the lines above could be simplified, as it's only choosing between options V1_DISK and V2_DISK.

@sipa sipa mentioned this pull request Sep 1, 2023
@@ -1080,4 +1134,61 @@ size_t GetSerializeSizeMany(int nVersion, const T&... t)
return sc.size();
}

/** Wrapper that overrides the GetParams() function of a stream (and hides GetVersion/GetType). */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this GetParams defined that the method here overrides? Does this refer to the underlying type instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I wonder if it makes sense to override the GetParams() function at all. Generally, no underlying stream type has a GetParams() method, so "overriding" really means "defines".

And if the GetParams() method is truly overridden, it should be rare, and it could make more sense to first require the stream to reveal its underlying unwrapped stream, without the GetParams() method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(My plan was to look into this after C++20, to avoid the risk of writing clunky C++17 code.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could do:

#ifdef HAVE_CXX20
template<typename Stream>
concept ParamsStreamConcept = requires(Stream& s) { s.GetParams(); };
template<typename Stream>
concept PlainStreamConcept = !ParamsStreamConcept<Stream>;
#else
template<typename Stream>
bool ParamsStreamConcept = true;
template<typename Stream>
bool PlainStreamConcept = true;
#endif

 /** Wrapper that overrides the GetParams() function of a stream (and hides GetVersion/GetType). */
 template <typename Params, typename SubStream>
 class ParamsStream
 {
    static_assert(PlainStreamConcept<SubStream>, "avoid wrapping a ParamsStream in a ParamsStream");
    SubStream& GetSubStream() { return m_substream; }

or similar if you wanted, which would catch this sort of case via the C++20 CI builds. At present the only lines that violate this rule are READWRITE(WithParams(ser_params, AsBase<CService>(obj))); in protocol.h and Derived::SERIALIZE_METHOD_PARAMS in serialize_tests.cpp; you need a GetSubStream getter to fix those up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference, GetSubStream was present in the original code, but only to avoid too-deeply nested templates, see https://github.com/bitcoin/bitcoin/pull/19503/files#r455147534

Since it wasn't enforced, I removed it. I'll add your patch once and if there is C++20.

MarcoFalke and others added 3 commits September 5, 2023 10:13
This also cleans up the addrman (de)serialization code paths to only
allow `Disk` serialization. Some unit tests previously forced a
`Network` serialization, which does not make sense, because Bitcoin Core
in production will always `Disk` serialize.
This cleanup idea was suggested by Pieter Wuille and implemented by Anthony
Towns.

Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Anthony Towns <aj@erisian.com.au>
Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
@maflcko maflcko force-pushed the 2206-split-consensus-net- branch from faba341 to fa626af Compare September 5, 2023 08:19
@maflcko
Copy link
Member Author

maflcko commented Sep 5, 2023

Force pushed the simplification: #25284 (comment)

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa626af

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa626af

*/
struct CSerActionSerialize
{
struct ActionSerialize {
constexpr bool ForRead() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be static constexpr bool ForRead() { return ...; }

Actually, ::SerRead etc could be static member functions of these classes rather than being the global namespace too:

//#define READWRITE(...) (::SerReadWriteMany(s, ser_action, __VA_ARGS__))
#define READWRITE(...) (ser_action.SerReadWriteMany(s, __VA_ARGS__))

struct ActionSerialize
{
    static constexpr bool ForRead() { return false; }

    template<typename Stream, typename... Args>
    static void SerReadWriteMany(Stream& s, const Args&... args)
    {
        ::SerializeMany(s, args...);
    }

Maybe something to consider later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jup, will take look later, as this comment is on a line untouched in this pull and thus seems unrelated.

template <typename Params, typename T>
class ParamsWrapper
{
static_assert(std::is_lvalue_reference<T>::value, "ParamsWrapper needs an lvalue reference type T");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not write T& m_object and ParamsWrapper(.., T& obj), ParamsWrapper<.., FooObj> (instead of FooObj&) and drop this assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jup, will take a look in one of the follow-ups.

@fanquake fanquake merged commit 8e0d979 into bitcoin:master Sep 7, 2023
@fanquake
Copy link
Member

fanquake commented Sep 7, 2023

Followups / tidy / lint to keep track of:

#25284 (comment)
#25284 (comment)
#25284 (comment)

@ajtowns
Copy link
Contributor

ajtowns commented Sep 7, 2023

Since this code is mostly of the form s << WithParams(V1_DISK, data) we could shorten the common case a bit more:

struct SerParams {
   ...
   template <typename T>
   auto operator()(T& obj) const { return WithParams(*this, obj); }
};

s << V1_DISK(data)

That way when reading the code you're really only seeing the info you care about.

Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 8, 2023
fanquake added a commit that referenced this pull request Sep 9, 2023
…iskBlockIndex

e73d2a8 refactor: remove clientversion include from dbwrapper.h (Cory Fields)
4240a08 refactor: Use DataStream now that version/type are unused (Cory Fields)
f15f790 Remove version/hashing options from CBlockLocator/CDiskBlockIndex (Cory Fields)

Pull request description:

  This is also a much simpler replacement for #28327.

  There are version fields in `CBlockLocator` and `CDiskBlockIndex` that have always been written but discarded when read.

  I intended to convert them to use SerParams as introduced by #25284, which [ended up looking like this](theuni@3e3af45). However because we don't currently have any definition of what a hash value would mean for either one of those, and we've never assigned the version field any meaning, I think it's better to just not worry about them.

  If we ever need to assign meaning in the future, we can introduce `SerParams` as was done for `CAddress`.

  As for the dummy values chosen:

  `CDiskBlockIndex::DUMMY_VERSION` was easy as the highest ever client version, and I don't expect any objection there.

  `CBlockLocator::DUMMY_VERSION` is hard-coded to the higest _PROTOCOL_ version ever used. This is to avoid a sudden bump that would be visible on the network if CLIENT_VERSION were used instead. In the future, if we ever need to use the value, we can discard anything in the CLIENT_VERSION range (for a few years as needed), as it's quite a bit higher.

  While reviewing, I suggest looking at the throwaway `SerParams` commit above as it shows where the call-sites are. I believe that should be enough to convince one's self that hashing is never used.

ACKs for top commit:
  TheCharlatan:
    Re-ACK e73d2a8
  ajtowns:
    reACK e73d2a8

Tree-SHA512: 45b0dd7c2e918493e2ee92a8e35320ad17991cb8908cb811150a96c5fd584ce177c775baeeb8675a602c90b9ba9203b8cefc0a2a0c6a71078b1d9c2b41e1f3ba
@maflcko maflcko deleted the 2206-split-consensus-net-😻 branch September 11, 2023 11:55
Copy link
Member Author

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge self-re-review, left some fuzz nits

edit: fixed in #28470

}
m_connman.PushMessage(&node, CNetMsgMaker(node.GetCommonVersion()).Make(make_flags, msg_type, peer.m_addrs_to_send));
m_connman.PushMessage(&node, CNetMsgMaker(node.GetCommonVersion()).Make(msg_type, WithParams(CAddress::SerParams{{ser_enc}, CAddress::Format::Network}, peer.m_addrs_to_send)));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the Make can be written shorter as:

+const bool peer_v2{peer.m_wants_addrv2};
-...Make(msg_type, WithParams(CAddress::SerParams{{ser_enc}, CAddress::Format::Network}, peer.m_addrs_to_send))
+...Make(peer_v2 ? NetMsgType::ADDRV2 : NetMsgType::ADDR, peer_v2 ? CAddress::V2_NETWORK : CAddress::V1_NETWORK, peer.m_addrs_to_send))

and dropping the above if.

if (na.IsAddrV1Compatible()) {
AssertEqualAfterSerializeDeserialize(na);
AssertEqualAfterSerializeDeserialize(na, ConsumeDeserializationParams<CNetAddr::SerParams>(fdp));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be V1, because V2 is checked unconditionally below.

if (s.IsAddrV1Compatible()) {
AssertEqualAfterSerializeDeserialize(s);
AssertEqualAfterSerializeDeserialize(s, ConsumeDeserializationParams<CNetAddr::SerParams>(fdp));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same?

FUZZ_TARGET(address_deserialize, .init = initialize_deserialize)
{
FuzzedDataProvider fdp{buffer.data(), buffer.size()};
const auto ser_enc{ConsumeDeserializationParams<CNetAddr::SerParams>(fdp)};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could pick from CAddress::SerParams to cover a wider range?

fanquake added a commit that referenced this pull request Sep 15, 2023
fb6a2ab scripted-diff: use SER_PARAMS_OPFUNC (Anthony Towns)
5e5c8f8 serialize: add SER_PARAMS_OPFUNC (Anthony Towns)
33203f5 serialize: specify type for ParamsWrapper not ref (Anthony Towns)
bf147bf serialize: move ser_action functions out of global namespace (Anthony Towns)

Pull request description:

  Cleanups after #25284:

   * ser_action namespacing - #25284 (comment)
   * make reference implicit - #25284 (comment)
   * function notation - #25284 (comment)

ACKs for top commit:
  MarcoFalke:
    lgtm ACK fb6a2ab 💨
  TheCharlatan:
    ACK fb6a2ab

Tree-SHA512: aacca2ee9cfec360ade6b394606e13d1dfe05bc29c5fbdd48a4e6992bd420312d4ed0d32218d95c560646af326e9977728dc2e759990636298e326947f6f9526
Frank-GER pushed a commit to Frank-GER/syscoin that referenced this pull request Sep 15, 2023
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 19, 2023
… and CDiskBlockIndex

e73d2a8 refactor: remove clientversion include from dbwrapper.h (Cory Fields)
4240a08 refactor: Use DataStream now that version/type are unused (Cory Fields)
f15f790 Remove version/hashing options from CBlockLocator/CDiskBlockIndex (Cory Fields)

Pull request description:

  This is also a much simpler replacement for bitcoin#28327.

  There are version fields in `CBlockLocator` and `CDiskBlockIndex` that have always been written but discarded when read.

  I intended to convert them to use SerParams as introduced by bitcoin#25284, which [ended up looking like this](theuni@3e3af45). However because we don't currently have any definition of what a hash value would mean for either one of those, and we've never assigned the version field any meaning, I think it's better to just not worry about them.

  If we ever need to assign meaning in the future, we can introduce `SerParams` as was done for `CAddress`.

  As for the dummy values chosen:

  `CDiskBlockIndex::DUMMY_VERSION` was easy as the highest ever client version, and I don't expect any objection there.

  `CBlockLocator::DUMMY_VERSION` is hard-coded to the higest _PROTOCOL_ version ever used. This is to avoid a sudden bump that would be visible on the network if CLIENT_VERSION were used instead. In the future, if we ever need to use the value, we can discard anything in the CLIENT_VERSION range (for a few years as needed), as it's quite a bit higher.

  While reviewing, I suggest looking at the throwaway `SerParams` commit above as it shows where the call-sites are. I believe that should be enough to convince one's self that hashing is never used.

ACKs for top commit:
  TheCharlatan:
    Re-ACK e73d2a8
  ajtowns:
    reACK e73d2a8

Tree-SHA512: 45b0dd7c2e918493e2ee92a8e35320ad17991cb8908cb811150a96c5fd584ce177c775baeeb8675a602c90b9ba9203b8cefc0a2a0c6a71078b1d9c2b41e1f3ba
fanquake added a commit that referenced this pull request Nov 15, 2023
a0c254c Drop CHashWriter (Anthony Towns)
c94f7e5 Drop OverrideStream (Anthony Towns)
6e9e4e6 Use ParamsWrapper for witness serialization (Anthony Towns)

Pull request description:

  Choose whether witness is included in transaction serialization via serialization parameter rather than the stream version. See #25284 and #19477 for previous context.

ACKs for top commit:
  maflcko:
    re-ACK a0c254c 🐜
  theuni:
    ACK a0c254c

Tree-SHA512: 8fd5cadfd84c5128e36c34a51fb94fdccd956280e7f65b7d73c512d6a9cdb53cdd3649de99ffab5322bd34be26cb95ab4eb05932b3b9de9c11d85743f50dcb13
@bitcoin bitcoin locked and limited conversation to collaborators Feb 8, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Done or Closed or Rethinking
Development

Successfully merging this pull request may close these issues.

10 participants