headerssync: Add headers cache #3
Conversation
Great initiative on optimizing the headers sync – definitely a worthwhile area, I'm very happy somebody's working on this, it was also on my list of things to tackle!
Welcome to the team Benchoors \:D/
The idea of caching headers to skip network redownload is solid; however, the current approach using an in-memory RAM cache with automatic sizing heuristics raises a few key concerns:
- Disk caching (raw file or even LevelDB) seems much more scalable, avoids significant RAM pressure during PRESYNC (which could be high), and might offer comparable performance gains given that network latency is the main bottleneck. While RAM is faster, disk avoids complex sizing logic and scales better for the future. Have you considered caching in batches to disk instead?
- If we keep RAM caching, the automatic sizing (GetFreeRAM, time estimates, various bounds) is complex and potentially fragile. Would a simpler approach, like a user-configurable `-headerscachesize=MB` (with a conservative default), be more predictable and robust? (See the sketch after this list.)
- Some of the logic seems ripe for extraction into a helper method, and some explanatory comments might be better suited for commit messages.
- Probably irrelevant, but I was wondering: if we're not redownloading, do we still care about "honest nodes" (as long as it's not obvious spam)?
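(For illustration, a minimal sketch of what such a user-configurable cap could look like, assuming Bitcoin Core's `ArgsManager::GetIntArg`; the option name comes from the suggestion above and the default value is a placeholder, not code from the PR.)

```cpp
// Sketch only: turn a user-supplied cache size (in MiB) into a byte count,
// falling back to a conservative default. Option name and default are hypothetical.
#include <common/args.h>
#include <cstddef>
#include <cstdint>

static constexpr int64_t DEFAULT_HEADERS_CACHE_MIB{32};

size_t HeadersCacheBytes(const ArgsManager& args)
{
    const int64_t mib{args.GetIntArg("-headerscachesize", DEFAULT_HEADERS_CACHE_MIB)};
    if (mib <= 0) return 0; // 0 (or a negative value) disables the cache in this sketch
    return static_cast<size_t>(mib) * 1024 * 1024;
}
```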
I will review this later in more detail, only had this much energy at the end of the weekend :)
src/common/system.cpp
Outdated
@@ -110,3 +120,52 @@ int64_t GetStartupTime()
{
    return nStartupTime;
}

std::optional<size_t> GetFreeRAM()
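(The excerpt above only shows the declaration. For context, here is a sketch of one way such a helper could be implemented on Linux via sysinfo(2); this is an assumption for illustration, not the PR's actual implementation, which isn't part of this excerpt.)

```cpp
// Sketch only: report free physical memory, or nullopt when unknown.
// A real version would need per-platform branches (macOS, Windows, BSDs).
#include <cstddef>
#include <optional>

#ifdef __linux__
#include <sys/sysinfo.h>
#endif

std::optional<size_t> GetFreeRAM()
{
#ifdef __linux__
    struct sysinfo info;
    if (sysinfo(&info) != 0) return std::nullopt;
    // mem_unit is the byte size of the units the other fields are expressed in.
    return static_cast<size_t>(info.freeram) * info.mem_unit;
#else
    return std::nullopt; // unknown platform in this sketch
#endif
}
```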
I'm not sure we can get away with this; in other cases the user can provide the size of the desired cache.
If we do keep it, I am very much against introducing dead code: the commit that adds functionality should also show its use - at least in a test, though I prefer real usage, to be able to decide whether it's the best solution to a given problem or not.
Each commit should be self-contained and its utility immediately demonstrable; the problem should come before the solution, but currently I see the solution in a commit without a description of the problem.
Yep, it is a bit of a departure (interesting potential direction for other settings as well IMO). We mostly do headers-sync on initial startup, after that the memory is freed. Maybe there is some condition where we've been running offline for a while and then reconnect that would re-trigger it too. It might be good to also provide an -arg to override the otherwise dynamic cache size though, will look into that.
I think having an independent commit "lays the breadcrumbs of the story", showing there are only dependencies in one direction (`HeadersSyncState` -> `GetFreeRAM()`). But I can see your point too. Merged it into where it is used.
Still not sure why we need this, now that there's a dedicated arg for it. We're not trying to guess `dbcache` size either - or whether we're on a HDD or SSD (though that would actually be a good idea :p).
We've relied on expert users instead of heuristics mostly, seems simpler to try that here as well - we can estimate it automatically in a follow-up, would go for the simpler solution first.
I think we probably should be "guessing" `dbcache` size as well. A minor part of this PR is bringing that up for discussion. But moved to a later commit for now, so it's easy to drop if there is more push-back.
I'd do both of them in a separate PR (basing memory defaults on available memory); I don't think you want to spend your time defending this part of the code in this PR, it seems like a different concern. You'd have to test this on a gazillion operating systems and compilers and versions...
src/headerssync.cpp
Outdated
        6 * 24 * 365 * 30,
        // We try to make sure not to use more than 10% of
        // available RAM for the headers cache.
        (GetFreeRAM().value_or(0) / 10) / sizeof(decltype(m_headers_cache)::value_type)))))
This may be a bit harsh, but why are we caching to memory, why not cache to disk directly? (via a small buffer, of course, like we do with blocks, see https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L114)
We can expect SSDs to be a lot faster by the time this is merged and used - but modern ones are already very close to RAM speed, basically same order of magnitude - and still several orders of magnitude faster than redownloading them:
NVMe has hit almost 4 GB/s in transfer speeds, which is really incredible. This is no where near modern DDR4 speeds… DDR4 is generally around 20 GB/s, with higher speeds hitting 25 GB/s.
RAM latency is lower, but we don't really need that here as far as I understood; we have a predictable read/write order (given some buffering) - very similar to the actual block download, which also saves and rereads the blocks when the window is full - and we could likely read/write header batches of 10,000 or whatever; we don't have to store them separately, if my understanding is correct.
Saving and prefetching can even be done on a separate thread if need be (if indeed the iteration order is mostly deterministic).
Predictable read could also be fast on a lousy HDD.
While (de)serialization would add some extra work, it's actually quite trivial, and the header sync would mimic the pattern used by block downloads (maybe even reuse some of the setup). It could be solved automatically if we saved these to LevelDB instead of separate files - might be slow, dunno, but I think it's worth considering, since I think we can assume that the headers should always fit on disk, but not always into memory.
It may result in simpler code (since we can just assume that everything is cached, no need for 30 year and 10% memory limits and redownload fallbacks), better scaling (we expect more headers in 30 years) and basically the same speed gain compared to redownloading?
My initial goal is using resources that are available on slightly beefier nodes.
Since adding logic for caching to disk adds extra complexity, I'd prefer either doing it in a follow-up PR (saving review-energy for the other aspects in this PR), or possibly in a separate commit at the end of the PR.
Probably would just use one contiguous file without LevelDB since we don't need to do lookups, just write and read in order from genesis/start to sufficient chainwork.
Might be good to have one file for every 100'000 headers in order to be able to delete them after they have been consumed (read), as I think we start downloading and storing blocks to disk in parallel with that process.
Is there some other reason for doing something closer to block storage?
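(To make the "one contiguous file" idea concrete, a rough sketch assuming Bitcoin Core's `AutoFile` and `fsbridge::fopen`; batching, file rotation and error handling are omitted, and none of this is code from the PR.)

```cpp
// Sketch only: append a batch of headers to a flat cache file and read them
// back in the same order. Each CBlockHeader serializes to 80 bytes.
#include <primitives/block.h>
#include <streams.h>
#include <util/fs.h>

#include <vector>

void AppendHeaderBatch(const fs::path& path, const std::vector<CBlockHeader>& batch)
{
    AutoFile file{fsbridge::fopen(path, "ab")};
    for (const CBlockHeader& header : batch) {
        file << header;
    }
}

std::vector<CBlockHeader> ReadHeaderBatch(const fs::path& path, size_t count)
{
    AutoFile file{fsbridge::fopen(path, "rb")};
    std::vector<CBlockHeader> out(count);
    for (CBlockHeader& header : out) {
        file >> header;
    }
    return out;
}
```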
> Is there some other reason for doing something closer to block storage?

We don't have to, of course, but I think we need a good reason to do it differently since it's a very similar problem.

> Since adding logic for caching to disk adds extra complexity, I'd prefer either doing it in a follow-up PR (saving review-energy for the other aspects in this PR), or possibly in a separate commit at the end of the PR.

I'm not sure that's the case, quite the opposite, I'd expect it to be simpler. But you've spent more time on it than I have...
Memory is finite, and I'm not sure it scales with the blockchain. But the disk kinda has to, and I'm not sure it would be measurably slower to do it on disk, whereas with memory we have to worry about OOM and other attacks all the time.
Do you still think storing to disk is a priority after our latest chats?
Of course, did anything change in that regard (I might have forgotten about it since)?
I still think it would simplify the situation if we could assume that we can store everything.
You don't?
I guess we'll still have the timeout logic before the disk fills:
bitcoin/src/net_processing.cpp
Lines 63 to 66 in d68a4ed
/** Headers download timeout.
 * Timeout = base + per_header * (expected number of headers) */
static constexpr auto HEADERS_DOWNLOAD_TIMEOUT_BASE = 15min;
static constexpr auto HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER = 1ms;
bitcoin/src/net_processing.cpp
Lines 5511 to 5517 in d68a4ed
peer->m_headers_sync_timeout = current_time + HEADERS_DOWNLOAD_TIMEOUT_BASE +
    (
     // Convert HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER to microseconds before scaling
     // to maintain precision
     std::chrono::microseconds{HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER} *
     Ticks<std::chrono::seconds>(NodeClock::now() - m_chainman.m_best_header->Time()) / consensusParams.nPowTargetSpacing
    );
bitcoin/src/net_processing.cpp
Lines 5821 to 5841 in d68a4ed
if (current_time > peer->m_headers_sync_timeout && nSyncStarted == 1 && (m_num_preferred_download_peers - state.fPreferredDownload >= 1)) {
    // Disconnect a peer (without NetPermissionFlags::NoBan permission) if it is our only sync peer,
    // and we have others we could be using instead.
    // Note: If all our peers are inbound, then we won't
    // disconnect our sync peer for stalling; we have bigger
    // problems if we can't get any outbound peers.
    if (!pto->HasPermission(NetPermissionFlags::NoBan)) {
        LogInfo("Timeout downloading headers, %s\n", pto->DisconnectMsg(fLogIPs));
        pto->fDisconnect = true;
        return true;
    } else {
        LogInfo("Timeout downloading headers from noban peer, not %s\n", pto->DisconnectMsg(fLogIPs));
        // Reset the headers sync state so that we have a
        // chance to try downloading from a different peer.
        // Note: this will also result in at least one more
        // getheaders message to be sent to
        // this peer (eventually).
        state.fSyncStarted = false;
        nSyncStarted--;
        peer->m_headers_sync_timeout = 0us;
    }
Hm... we probably should reset the `HeadersSyncState` when timing out to free memory, especially for noban peers. Will do in upcoming push.
Edit: I'll consider this some more. The cache is very ephemeral, but reducing the amount of limiting logic would be nice.
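(For a rough sense of scale of that timeout, using the constants quoted above - my arithmetic, so treat it as approximate: a peer whose best header is one year old expects about 31,536,000 s / 600 s ≈ 52,560 headers, so the timeout is 15 min + 52,560 × 1 ms ≈ 15 min 53 s; syncing all the way from genesis, roughly 16 years of headers, gives about 15 min + ~14 min ≈ 29 min before it fires.)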
hss.reset(new HeadersSyncState(0, Params().GetConsensus(), chain_start, chain_work));
(void)hss->ProcessNextHeaders(first_chain, true);
BOOST_CHECK(hss->GetState() == HeadersSyncState::State::REDOWNLOAD);
static void HappyPath(const std::vector<CBlockHeader>& first_chain, const CBlockIndex* chain_start)
Is the happy path even likely? What's the smallest thing that can ruin a happy path, a single invalid header that isn't part of the longest chain?
Won't that become a new attack vector - i.e. the lightest attack that makes the happy path infeasible, derailing caching (which isn't problematic now, but may be problematic in a few decades)? It's likely a lot better than previous DoS threats, but with disk caching we can likely even handle invalid attempts (e.g. 10% invalid or whatever), which would likely be more robust.
Already on master: during PRESYNC, just as in REDOWNLOAD, we check both `hashPrevBlock` and `PermittedDifficultyTransition` (`nBits`). We must be given headers that fulfill this in-order from our selected headers-sync peer.
The happy path is the most likely. Edge cases:
- Malicious node (`SneakyRedownload` test).
- Selecting another node that hasn't fully synced the chain either, giving us insufficient work (`TooLittleWork` test).
- Connection lost.

Don't see potential for a new attack, but maybe I'm missing something?
I still need to understand what the worst case scenario is before and after this change - will try to find out myself (e.g. what kind of attacks are possible before and after)
src/headerssync.cpp
Outdated
    // buffers.
    Assume(m_process_all_remaining_headers);
} else {
    // If we just switched to REDOWNLOAD and m_headers_cache was
seems to me saving to disk could solve this as well
Not in this case I don't think, even if we were to permit more headers being stored to disk (5min blocktimes?), we would still have a cap on the cache size eventually - so this would still need to be handled.
ef1b61b to 2de1ae5 Compare
Thank you for your initial feedback!

> I'm very happy somebody's working on this, it was also on my list of things to tackle!
> Welcome to the team Benchoors :D/

Nice!

> the current approach using an in-memory RAM cache with automatic sizing heuristics raises a few key concerns
> - Disk caching
> - ... automatic sizing (GetFreeRAM, time estimates, various bounds) ... Would a simpler approach, like a user-configurable `-headerscachesize=MB`

Please see inline comments.

> - Some of the logic seems ripe for extraction into a helper method, and some explanatory comments might be better suited for commit messages.

Extracted `ComputeHeadersCacheMax()`, any other specific suggestions?

> Probably irrelevant, but was wondering that if we're not redownloading, do we still care about "honest nodes" (as long as it's not obvious spam)?

We have a minimum proof-of-work threshold which is hardcoded for a given release. This threshold triggers the PRESYNC -> REDOWNLOAD state change. We could have a malicious node attempt to feed us an ultra-long but low-work chain. If our node has the cache but is running an old release far in the future (with an old threshold), it will fill the cache and then fall back to the behavior on master of verifying that headers are connected and conform to difficulty adjustments, but it will proceed to throw them away. So the old node is still not going to go into some pathological case unless we remove the cap on the headers cache.
A mid-level aspect I've been considering is doing things like adding an extra state PRESYNC -> +CONSUMECACHE+ -> REDOWNLOAD. But `HeadersSyncState` exposes state-specific methods publicly (`GetState()`, `GetPresyncHeight()`, `GetPresyncTime()`, `GetPresyncWork()`, `NextHeadersRequestLocator()`), so I've avoided it so far. Any ideas on that level?
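(For illustration, the extra-state idea would look roughly like this; CONSUMECACHE is a hypothetical name, not something in the PR.)

```cpp
// Sketch only: a dedicated phase for replaying cached headers, instead of
// folding that work into REDOWNLOAD. Only PRESYNC/REDOWNLOAD/FINAL exist today.
enum class State {
    PRESYNC,      // accumulate commitments until sufficient work is seen
    CONSUMECACHE, // hypothetical: replay headers already cached locally
    REDOWNLOAD,   // re-request from the peer whatever the cache couldn't serve
    FINAL,        // sync finished, successfully or not
};
```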
Changes in latest push:
- Extracted `ComputeHeadersCacheMax()`.
- Changed `HeadersSyncState` ctor to take bytes instead of header count.
- Renamed `HeadersSyncState::m_download_state` -> `m_state` as we aren't tracking any other state enums within that class.
- Extracted `ProcessPresync()` and `ProcessRedownload()` methods to reduce indentation.
- Added `-headerssynccache=MB` arg.
- Added reference to recent BitMEX blog entry in commit message.
- Adjusted comments and other minutia.
2de1ae5 to c0c990f Compare
Latest push:
        break;
    }
    switch (m_state) {
    case State::PRESYNC:
I will have to debug this properly, I don't understand any of this yet :/
I'll debug it properly this week, only did a lightweight sweep again to get you some early feedback. Sorry for the lack of depth, will do that shortly.
@@ -50,13 +50,11 @@ FUZZ_TARGET(headers_sync_state, .init = initialize_headers_sync_state_fuzz)
{
    SeedRandomStateForTest(SeedRand::ZEROS);
    FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    auto mock_time{ConsumeTime(fuzzed_data_provider)};
Needs more explanation, I don't have enough context.
"lower" than what? What happens when we're "returning"?
Attempted to improve commit message:
- fuzz: Specify lower limit of mock-time instead of returning
-
- Should reduce wasted runs, through less code complexity.
+ fuzz(headerssync): Specify minimum for mock-time
+
+ That way we can always proceed instead of just exiting the fuzz target, wasting cycles. Also reduces code complexity.
Any idea why it was done like that before?
auto age = rand();
if (age < 120) do shit
else return false

seems indeed simpler to do

auto age = rand(0, 120)
(you could likely commit this in a separate PR as well)
My guess is either the author didn't know of the `min` parameter or it was added afterwards. I think it's small and non-controversial enough to sneak into this PR for now.
You can do both
It seems it was already available at the time of the pr
My guess if I open a PR just for this is that maflcko will say something like "the fuzz engine should figure out the branch quickly, so it's not an issue".
Probably, but I'd risk it instead of receiving the same comment here :)
@@ -272,7 +272,7 @@ class HeadersSyncState {
    bool m_process_all_remaining_headers{false};

    /** Current state of our headers sync. */
    State m_download_state{State::PRESYNC};
    State m_state{State::PRESYNC};
yeah, this is what I mean: X x = X::x; // sets X x to x (X)
Yeah, seems kind of naked if removing the comment completely though. Could change it to "Central tracker of state machine's state", but not sure it's enough of an improvement.
It's not just the comment; `state` is too general, everything is a state.
Can we find a better name - or if not, can we refactor? Not being able to find a proper name is a strong code smell.
Having a `State m_state` field in a C++ state machine is certainly not unheard of. Would `m_current_state` be nicer in your eyes?
No, it doesn't help me in understanding the behavior better. Again, it's like a variable called "thing" ... it indicates a bigger problem
Recap: We discussed on a call whether it would be achievable to have different state classes, with each one producing the next as an output, instead of having this enum. I'm still a bit apprehensive about the public accessors of `HeadersSyncState`:

Lines 123 to 133 in f1fdc84

/** Return the current state of our download */
State GetState() const { return m_state; }
/** Return the height reached during the PRESYNC phase */
int64_t GetPresyncHeight() const { return m_presync_height; }
/** Return the block timestamp of the last header received during the PRESYNC phase. */
uint32_t GetPresyncTime() const { return m_presync_last_header_received.nTime; }
/** Return the amount of work in the chain received during the PRESYNC phase. */
arith_uint256 GetPresyncWork() const { return m_presync_chain_work; }

Could maybe lean on `std::variant` to some extent as it contains an "implicit enum".
Another consideration is that it seems simpler for tests to be able to check the current state enum.
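(A rough sketch of the `std::variant` direction, purely illustrative; the struct and accessor names are made up and the per-phase fields are elided.)

```cpp
// Sketch only: phase-specific data lives in per-phase structs, and the
// "implicit enum" is the variant's active alternative.
#include <variant>

struct Presync { /* presync-only fields */ };
struct Redownload { /* redownload-only fields */ };
struct Final {};

class HeadersSyncSketch
{
public:
    // Tests (and GetState()-style callers) can still ask which phase we are in.
    bool InPresync() const { return std::holds_alternative<Presync>(m_phase); }
    bool InRedownload() const { return std::holds_alternative<Redownload>(m_phase); }
    bool IsFinal() const { return std::holds_alternative<Final>(m_phase); }

private:
    std::variant<Presync, Redownload, Final> m_phase{Presync{}};
};
```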
Would you feel this to be an improvement in the header over the `presync_`/`redownload_` field prefixes?
struct { // PRESYNC
/** Work that we've seen so far on the peer's chain */
arith_uint256 chain_work;
/** Store the latest header received while in PRESYNC (initialized to m_chain_start) */
CBlockHeader last_header_received;
/** Height of m_presync_last_header_received */
int64_t height{0};
} m_presync;
struct { // REDOWNLOAD
/** During phase 2 (REDOWNLOAD), we buffer redownloaded headers in memory
* until enough commitments have been verified; those are stored in
* m_redownloaded_headers */
std::deque<CompressedHeader> headers;
/** Height of last header in m_redownloaded_headers */
int64_t buffer_last_height{0};
/** Hash of last header in m_redownloaded_headers (initialized to
* m_chain_start). We have to cache it because we don't have hashPrevBlock
* available in a CompressedHeader.
*/
uint256 buffer_last_hash;
/** The hashPrevBlock entry for the first header in m_redownloaded_headers
* We need this to reconstruct the full header when it's time for
* processing.
*/
uint256 buffer_first_prev_hash;
/** The accumulated work on the redownloaded chain. */
arith_uint256 chain_work;
/** Set this to true once we encounter the target blockheader during phase
* 2 (REDOWNLOAD). At this point, we can process and store all remaining
* headers still in m_redownloaded_headers.
*/
bool process_all_remaining_headers{false};
} m_redownload;
My feelings don't matter here :)
If this untangles the two scenarios and I don't have to fully understand one when reviewing the other, it's arguably better, isn't it?
Well, I prefer having the indentation/grouping in the header file, which makes it read as more orderly. But in itself doesn't change behavior. Good to hear that you're not vehemently opposed. :)
> indentation/grouping in the header file
yes, even different files to avoid the mess we have e.g. in validation.cpp
src/headerssync.cpp
Outdated
        ret = ProcessRedownload(received_headers, full_headers_message);
        break;
    case State::FINAL:
        Assume(false); // Should never be called again after entering FINAL.
can we rather avoid this instead? Or at least print something instead of a comment? I hate comments, it's a sign that we've given up :(
Changed assume condition to: `Assume(m_state != State::FINAL)`.
Would love proper support for error strings in asserts/assume.
This is worse than before, `m_state != State::FINAL` is always true
Inside `case State::FINAL`, `m_state != State::FINAL` is always false, triggering failure. This is a common idiom for documenting what condition has failed.
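(Generic illustration of that idiom, using plain assert so the snippet stands alone; the PR itself uses Assume from util/check.h.)

```cpp
#include <cassert>

enum class State { PRESYNC, REDOWNLOAD, FINAL };

void Process(State state)
{
    switch (state) {
    case State::FINAL:
        // Unreachable by contract: repeating the condition means the failure
        // message names what was violated, unlike a bare assert(false).
        assert(state != State::FINAL);
        break;
    default:
        break;
    }
}
```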
Yeah, but ideally we shouldn't be able to get to an illegal state in the first place - though it's not your code that is at fault here, but maybe we can reduce the invalid possibilities
src/common/system.cpp
Outdated
40e9538
This is a huge commit, can you break it up meaningfully?
Broke out the RAM-querying to later commit.
I thought about making the `cache_bytes` parameter to the `HeadersSyncState` ctor default to `std::nullopt` or `0` and then introducing all the setting of that value in subsequent commits, but it felt too contrived, not leaving that commit in an acceptable state. The files with call-sites sending in that value are more than a handful, but they are quite simple.
Let's discuss in person why you think that's an important part of this change
Reached agreement that it would be best for now to share RAM usage with dbcache, and to enable automatic sizing of dbcache/others based upon available RAM, in a separate PR.
I've been sifting through bitcoin#25717 and bitcoin#25720. The behavior we currently have on master is:
- Start a `HeadersSyncState` for a given peer.
- If we receive a block inv for a new block, fetch headers and in response trigger an additional `HeadersSyncState` to be created for that other peer.
- Add new `HeadersSyncState`s for each new block reported.

There are thoughts about instead adding another `HeadersSyncState` every 5-10 minutes, rather than whenever a new block is mined & propagated, which would result in more stability. https://github.com/bitcoin/bitcoin/pull/25720/files#r936707629

I think before sharing memory with dbcache, it might be good to make a separate PR that implements the 5-10 minute interval for spawning a `HeadersSyncState` peer, maybe capping simultaneous instances at 2-3. That way memory usage would be more predictable.
> share RAM usage with dbcache
Rather to steal its usage since we shouldn't be using it yet.
I can't follow the rest of the comments yet, can only do a lightweight review now.
That way we can always proceed instead of just exiting the fuzz target, wasting cycles. Also reduces code complexity.
* Makes tests logically separated through extraction into functions. Also enables using function-scope instances of HeadersSyncState instead of reset()-ing unique_ptr.
* Also adds missing comment for part 4.
c0c990f to f1fdc84 Compare
Pushing my first impressions while I'm reviewing
@@ -183,6 +183,11 @@ class HeadersSyncState {
    const unsigned m_commit_offset;

private:
    ProcessingResult ProcessPresync(const std::vector<CBlockHeader>&
maybe this could come after the test breakup (or maybe even the same commit), it seems to follow the same logic
f1fdc84 to 67cd87b Compare
Minor code update. Mostly updating comment threads.
* Use BOOST_REQUIRE_EQUAL for HeadersSyncState::State - Nicer failure output and prevents continuing the test.
* Verify HeadersSyncState::State directly after ProcessNextHeaders().
* CHECK for more, like Locator and result.pow_validated_headers.
* Extract and capitalize constants.
* Extract genesis block variable.

Grouped fields specific to PRESYNC/REDOWNLOAD into anonymous-type structs. Pre-existing usage such as GetPresyncWork(), GetPresyncTime(), GetPresyncHeight() make this separation logical.
* m_download_state -> m_state: See GetState(). We don't have other state-enums in the class, so drop the "download_" qualifier which could mislead one to associate it with the REDOWNLOAD state.
State processing will increase in complexity in upcoming commit.
While we do want to avoid OOM-attacks through [CVE-2024-52916](https://bitcoincore.org/en/2024/07/03/disclose-header-spam/), we can still serve the happy path by caching headers up to a reasonable chain height. The reasonable chain height is calculated by taking the difference between the starting block and our current time, and assuming slightly faster than 10 minute block time averages[^1].

Cuts out time/bandwidth re-downloading the same headers from the same peer - good for both local and remote node, especially over slow/flaky connections. Even more helpful when the local node has a flaky connection to all other nodes.

The cache is entirely RAM-based since it is ephemeral by nature, taking less than a minute to fill on a really good connection (at contemporary block heights). Also due to the fact that we throw it away upon completion, or if we end up with a peer that has too-low PoW or feeds us a malicious chain.

The headers cache size is set by -headerssynccache=<MiB>. When unset, we default to storing REDOWNLOAD_BUFFER_SIZE number of headers (= +0.68 MiB) (smaller for shorter chains). This makes us able to return a chunk of headers earlier, after having switched into the REDOWNLOAD phase.

[^1]: "the Bitcoin target interval time isn't really 10 minutes, it's actually 10 minutes and 0.3 seconds. Not that this bug is significant, actually since Bitcoin launched, the average interval has been 9 minutes and 36 seconds, significantly less than 10 minutes. This is because on average since 2009 the hashrate has been increasing." https://blog.bitmex.com/the-timewarp-attack/ (31 Mar 2025)
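(A back-of-the-envelope check of the sizes cited above. The constants are assumed - roughly 14827 headers for REDOWNLOAD_BUFFER_SIZE and 48 bytes per CompressedHeader - so treat the numbers as approximate rather than quoted from the PR.)

```cpp
// Rough size check: the default cache (~REDOWNLOAD_BUFFER_SIZE headers) lands
// near the 0.68 MiB mentioned above, and the 6*24*365*30 cap near 72 MiB.
#include <cstdio>

int main()
{
    constexpr double header_bytes{48.0};        // assumed sizeof(CompressedHeader)
    constexpr double default_headers{14827.0};  // assumed REDOWNLOAD_BUFFER_SIZE
    constexpr double thirty_year_headers{6.0 * 24 * 365 * 30};

    std::printf("default cache: %.2f MiB\n", default_headers * header_bytes / (1024 * 1024));
    std::printf("30-year cap:   %.1f MiB\n", thirty_year_headers * header_bytes / (1024 * 1024));
    return 0;
}
```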
… in test. Would be good to have TARGET_BLOCKS = REDOWNLOAD_BUFFER_SIZE + 50 or something. So should move REDOWNLOAD_BUFFER_SIZE to header. Need to update references to REDOWNLOAD_BUFFER_SIZE and modifying headerssync.cpp. This should be its own PR.
67cd87b to d68a4ed Compare
Updated PR description with correct CVE and added reference to Optech podcast episode.
Interesting, this might intersect with some issues I've been seeing with slow connections. Will try to review soon.
@jonatack thanks for checking this out! Actually been curious whether it would help your setup, would be nice to have that suspicion validated. I'm trying to smuggle more refined versions of the test refactorings in via bitcoin#32579, which is more ready for approach-level review in case that interests you as well.
5be31b2 lsan: add more Qt suppressions (fanquake) Pull request description: Using Clang clang version 20.1.6 (Fedora 20.1.6-9.fc43) and: ```bash export CC=clang export CXX=clang++ cmake -B build -DBUILD_GUI=ON -DSANITIZERS=address cmake --build build export LSAN_OPTIONS="suppressions=/root/bitcoin/test/sanitizer_suppressions/lsan" ctest --test-dir build ``` ```bash Totals: 3 passed, 0 failed, 0 skipped, 0 blacklisted, 1589ms ********* Finished testing of AddressBookTests ********* ================================================================= ==21869==ERROR: LeakSanitizer: detected memory leaks Direct leak of 88 byte(s) in 1 object(s) allocated from: #0 0xaaaab5d5af40 in operator new(unsigned long) (/root/bitcoin/build/bin/test_bitcoin-qt+0x39af40) (BuildId: c0e038f1c507ea6860d1cfd499ac54ad83359872) #1 0xffff8c8f56cc in QLayoutPrivate::createWidgetItem(QLayout const*, QWidget*) (/lib64/libQt6Widgets.so.6+0x1a56cc) (BuildId: 8b7b9e470f4d4cd920282a4f963abb01225814fa) #2 0xffff8c8d2f90 in QBoxLayout::insertWidget(int, QWidget*, int, QFlags<Qt::AlignmentFlag>) (/lib64/libQt6Widgets.so.6+0x182f90) (BuildId: 8b7b9e470f4d4cd920282a4f963abb01225814fa) #3 0xaaaab5fc7188 in SendCoinsDialog::addEntry() /root/bitcoin/build/src/qt/./qt/sendcoinsdialog.cpp:596:18 #4 0xaaaab5fc4eec in SendCoinsDialog::SendCoinsDialog(PlatformStyle const*, QWidget*) /root/bitcoin/build/src/qt/./qt/sendcoinsdialog.cpp:84:5 #5 0xaaaab5da67ac in (anonymous namespace)::MiniGUI::MiniGUI(interfaces::Node&, PlatformStyle const*) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:235:75 #6 0xaaaab5da2000 in (anonymous namespace)::TestGUI(interfaces::Node&, std::shared_ptr<wallet::CWallet> const&) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:270:13 #7 0xaaaab5d9ebc8 in (anonymous namespace)::TestGUI(interfaces::Node&) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:453:5 #8 0xaaaab5d9ebc8 in WalletTests::walletTests() /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:475:5 #9 0xffff8b1c5314 in QMetaMethodInvoker::invokeImpl(QMetaMethod, void*, Qt::ConnectionType, long long, void const* const*, char const* const*, QtPrivate::QMetaTypeInterface const* const*) (/lib64/libQt6Core.so.6+0x195314) (BuildId: eacb2d1228362560e5df1a1ce496c99ad61960e7) #10 0xffff8b1c5dc8 in QMetaMethod::invokeImpl(QMetaMethod, void*, Qt::ConnectionType, long long, void const* const*, char const* const*, QtPrivate::QMetaTypeInterface const* const*) (/lib64/libQt6Core.so.6+0x195dc8) (BuildId: eacb2d1228362560e5df1a1ce496c99ad61960e7) #11 0xffff8cf57c54 (/lib64/libQt6Test.so.6+0x27c54) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #12 0xffff8cf5fa18 (/lib64/libQt6Test.so.6+0x2fa18) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #13 0xffff8cf6067c (/lib64/libQt6Test.so.6+0x3067c) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #14 0xffff8cf610a4 (/lib64/libQt6Test.so.6+0x310a4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #15 0xffff8cf61aa4 in QTest::qRun() (/lib64/libQt6Test.so.6+0x31aa4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #16 0xffff8cf61eb4 in QTest::qExec(QObject*, int, char**) (/lib64/libQt6Test.so.6+0x31eb4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #17 0xaaaab5d7d77c in main /root/bitcoin/build/src/qt/test/./qt/test/test_main.cpp:95:30 #18 0xffff8aad6398 in __libc_start_call_main (/lib64/libc.so.6+0x26398) (BuildId: 627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) #19 0xffff8aad6478 in __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x26478) (BuildId: 
627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) #20 0xaaaab5c74cac in _start (/root/bitcoin/build/bin/test_bitcoin-qt+0x2b4cac) (BuildId: c0e038f1c507ea6860d1cfd499ac54ad83359872) ``` This happens when building using depends: ```bash Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0xaaaabdbe86f8 in malloc (/root/bitcoin/build/bin/test_bitcoin-qt+0x4386f8) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #1 0xfbff97f8c164 (<unknown module>) #2 0xaaaabf0cfaa4 in QDBusConnectionPrivate::QDBusConnectionPrivate() (/root/bitcoin/build/bin/test_bitcoin-qt+0x191faa4) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #3 0xaaaabf0c9e30 in QDBusConnectionManager::doConnectToStandardBus(QDBusConnection::BusType, QString const&, bool) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1919e30) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #4 0xaaaabf0cb0e4 in QtPrivate::QCallableObject<QDBusConnectionPrivate* (QDBusConnectionManager::*)(QDBusConnection::BusType, QString const&, bool), QtPrivate::List<QDBusConnection::BusType&, QString const&, bool&>, QDBusConnectionPrivate*>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x191b0e4) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #5 0xaaaabf5cbaf0 in QObject::event(QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1e1baf0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #6 0xaaaabf5a4ce0 in QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df4ce0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #7 0xaaaabf5a486c in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df486c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #8 0xaaaabf5a575c in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df575c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #9 0xaaaabf66b858 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1ebb858) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #10 0xaaaabf5a9e3c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df9e3c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #11 0xaaaabf632a44 in QThread::exec() (/root/bitcoin/build/bin/test_bitcoin-qt+0x1e82a44) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #12 0xaaaabf0c9bd0 in QDBusConnectionManager::run() (/root/bitcoin/build/bin/test_bitcoin-qt+0x1919bd0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #13 0xaaaabf669c30 in QThreadPrivate::start(void*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1eb9c30) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #14 0xaaaabdbe5f2c in asan_thread_start(void*) asan_interceptors.cpp.o #15 0xffff99538608 in thread_start (/lib64/libc.so.6+0xf8608) (BuildId: 627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) SUMMARY: AddressSanitizer: 3592 byte(s) leaked in 37 allocation(s). ``` ACKs for top commit: maflcko: lgtm ACK 5be31b2 Tree-SHA512: 0c33661c7ec83ea9b874c1ee4ee2de513131690287363e216a88560dfb31a59ef563a50af756c86a991583aa64a600a74e20fd5d6a104cf4c0a27532de8d2211
…xec in RunCommandJSON" faa1c3e Revert "Merge bitcoin#32343: common: Close non-std fds before exec in RunCommandJSON" (MarcoFalke) Pull request description: After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see [signal-safety(7)](https://www.man7.org/linux/man-pages/man7/signal-safety.7.html)) until such time as it calls execv. The standard library (`std` namespace) is not async-signal-safe. Also, `throw`, isn't. There was an alternative implementation using `readdir` (bitcoin#32529), but that isn't async-signal-safe either, and that implementation was still using `throw`. So temporarily revert this feature. A follow-up in the future can add it back, using only async-signal-safe functions, or by using a different approach. Fixes bitcoin#32524 Fixes bitcoin#33015 Fixes bitcoin#32855 For reference, a failure can manifest in the GCC debug mode: * While `fork`ing, a debug mode mutex is held (by any other thread). * The `fork`ed child tries to use the stdard libary before `execv` and deadlocks. This may look like the following: ``` (gdb) thread apply all bt Thread 1 (Thread 0xf58f4b40 (LWP 774911) "b-httpworker.2"): #0 0xf7f4f589 in __kernel_vsyscall () #1 0xf79e467e in ?? () from /lib32/libc.so.6 #2 0xf79eb582 in pthread_mutex_lock () from /lib32/libc.so.6 #3 0xf7d93bf2 in ?? () from /lib32/libstdc++.so.6 #4 0xf7d93f36 in __gnu_debug::_Safe_iterator_base::_M_attach(__gnu_debug::_Safe_sequence_base*, bool) () from /lib32/libstdc++.so.6 #5 0x5668810a in __gnu_debug::_Safe_iterator_base::_Safe_iterator_base (this=0xf58f13ac, __seq=0xf58f13f8, __constant=false) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_base.h:91 #6 0x56ddfb50 in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::forward_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:162 #7 0x56ddfacb in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::bidirectional_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:539 #8 0x56ddfa5b in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::random_access_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:687 #9 0x56ddd3f6 in std::__debug::vector<int, std::allocator<int> >::begin (this=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/vector:300 #10 0x57d83701 in subprocess::detail::Child::execute_child (this=0xf58f156c) at ./util/subprocess.h:1372 #11 0x57d80a7c in subprocess::Popen::execute_process (this=0xf58f1cd8) at ./util/subprocess.h:1231 #12 0x57d6d2b4 in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0xf58f1cd8, cmd_args="fake.py enumerate", args=..., args=..., args=..., args=...) 
at ./util/subprocess.h:964 #13 0x57d6b597 in RunCommandParseJSON (str_command="fake.py enumerate", str_std_in="") at ./common/run_command.cpp:27 #14 0x57a90547 in ExternalSigner::Enumerate (command="fake.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at ./external_signer.cpp:28 #15 0x56defdab in enumeratesigners()::$_0::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (this=0xf58f2ba0, self=..., request=...) at ./rpc/external_signer.cpp:51 ... (truncated, only one thread exists) ``` ACKs for top commit: fanquake: ACK faa1c3e darosior: ACK faa1c3e Tree-SHA512: 602da5f2eba08d7fe01ba19baf411e287ae27fe2d4b82f41734e05b7b1d938ce94cc0041e86ba677284fa92838e96ebee687023ff28047e2b036fd9a53567e0a
While we do want to avoid OOM-attacks through CVE-2019-25220, we can still serve the happy path by caching headers up to a reasonable chain height.
Cuts out time/bandwidth re-downloading same headers from same peer - especially good for receiver/sender nodes who are on generally slow/flaky connections. As the chain continues to grow, this becomes more important (even though we scale the timeouts).
Inspired by Optech podcast episode.
Queries the machine for free RAM and takes up to 10% of it for the cache (which is released when syncing ends).

TODO:
- `HeadersSyncState` peers more predictable by adding new ones based upon timeouts rather than new block announcements.
- `HeadersSyncState` memory overhead back down again.
- `std::variant`?