headerssync: Add headers cache #3
Conversation
Great initiative on optimizing the headers sync – definitely a worthwhile area, I'm very happy somebody's working on this, it was also on my list of things to tackle!
Welcome to the team Benchoors \:D/
The idea of caching headers to skip network redownload is solid; however, the current approach using an in-memory RAM cache with automatic sizing heuristics raises a few key concerns:
- Disk caching (raw file or even LevelDB) seems much more scalable, avoids significant RAM pressure during PRESYNC (which could be high), and might offer comparable performance gains given that network latency is the main bottleneck. While RAM is faster, disk avoids complex sizing logic and scales better for the future. Have you considered caching in batches to disk instead?
- If we keep RAM caching, the automatic sizing (GetFreeRAM, time estimates, various bounds) is complex and potentially fragile. Would a simpler approach, like a user-configurable `-headerscachesize=MB` (with a conservative default), be more predictable and robust? (See the sketch after this list.)
- Some of the logic seems ripe for extraction into a helper method, and some explanatory comments might be better suited for commit messages.
- Probably irrelevant, but I was wondering: if we're not redownloading, do we still care about "honest nodes" (as long as it's not obvious spam)?
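(For illustration, a minimal sketch of what such a user-configurable cap could look like, assuming Bitcoin Core's `ArgsManager::GetIntArg`; the option name comes from the suggestion above and the default value is a placeholder, not code from the PR.)

```cpp
// Sketch only: turn a user-supplied cache size (in MiB) into a byte count,
// falling back to a conservative default. Option name and default are hypothetical.
#include <common/args.h>
#include <cstddef>
#include <cstdint>

static constexpr int64_t DEFAULT_HEADERS_CACHE_MIB{32};

size_t HeadersCacheBytes(const ArgsManager& args)
{
    const int64_t mib{args.GetIntArg("-headerscachesize", DEFAULT_HEADERS_CACHE_MIB)};
    if (mib <= 0) return 0; // 0 (or a negative value) disables the cache in this sketch
    return static_cast<size_t>(mib) * 1024 * 1024;
}
```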
I will review this later in more detail, only had this much energy at the end of the weekend :)
src/common/system.cpp
Outdated
@@ -110,3 +120,52 @@ int64_t GetStartupTime()
{
    return nStartupTime;
}

std::optional<size_t> GetFreeRAM()
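(The excerpt above only shows the declaration. For context, here is a sketch of one way such a helper could be implemented on Linux via sysinfo(2); this is an assumption for illustration, not the PR's actual implementation, which isn't part of this excerpt.)

```cpp
// Sketch only: report free physical memory, or nullopt when unknown.
// A real version would need per-platform branches (macOS, Windows, BSDs).
#include <cstddef>
#include <optional>

#ifdef __linux__
#include <sys/sysinfo.h>
#endif

std::optional<size_t> GetFreeRAM()
{
#ifdef __linux__
    struct sysinfo info;
    if (sysinfo(&info) != 0) return std::nullopt;
    // mem_unit is the byte size of the units the other fields are expressed in.
    return static_cast<size_t>(info.freeram) * info.mem_unit;
#else
    return std::nullopt; // unknown platform in this sketch
#endif
}
```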
I'm not sure we can get away with this; in other cases the user can provide the size of the desired cache.
If we do keep it, I am very much against introducing dead code: the commit that adds functionality should also show its use - at least in a test, though I prefer real usage, to be able to decide whether it's the best solution to a given problem or not.
Each commit should be self-contained and its utility immediately demonstrable; the problem should come before the solution, but currently I see the solution in a commit without a description of the problem.
Yep, it is a bit of a departure (interesting potential direction for other settings as well IMO). We mostly do headers-sync on initial startup, after that the memory is freed. Maybe there is some condition where we've been running offline for a while and then reconnect that would re-trigger it too. It might be good to also provide an -arg to override the otherwise dynamic cache size though, will look into that.
I think having an independent commit "lays the breadcrumbs of the story", showing there are only dependencies in one direction (`HeadersSyncState` -> `GetFreeRAM()`). But I can see your point too. Merged it into where it is used.
Still not sure why we need this, now that there's a dedicated arg for it. We're not trying to guess `dbcache` size either - or whether we're on a HDD or SSD (though that would actually be a good idea :p).
We've relied on expert users instead of heuristics mostly, seems simpler to try that here as well - we can estimate it automatically in a follow-up, would go for the simpler solution first.
I think we probably should be "guessing" `dbcache` size as well. A minor part of this PR is bringing that up for discussion. But moved to a later commit for now, so it's easy to drop if there is more push-back.
I'd do both of them in a separate PR (basing memory defaults on available memory); I don't think you want to spend your time defending this part of the code in this PR, it seems like a different concern. You'd have to test this on a gazillion operating systems and compilers and versions...
src/headerssync.cpp
Outdated
        6 * 24 * 365 * 30,
        // We try to make sure not to use more than 10% of
        // available RAM for the headers cache.
        (GetFreeRAM().value_or(0) / 10) / sizeof(decltype(m_headers_cache)::value_type)))))
This may be a bit harsh, but why are we caching to memory, why not cache to disk directly? (via a small buffer, of course, like we do with blocks, see https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L114)
We can expect SSDs to be a lot faster by the time this is merged and used - but modern ones are already very close to RAM speed, basically same order of magnitude - and still several orders of magnitude faster than redownloading them:
NVMe has hit almost 4 GB/s in transfer speeds, which is really incredible. This is no where near modern DDR4 speeds… DDR4 is generally around 20 GB/s, with higher speeds hitting 25 GB/s.
RAM latency is lower, but we don't really need that here as far as I understood; we have a predictable read/write order (given some buffering) - very similar to the actual block download, which also saves and rereads the blocks when the window is full - and we could likely read/write header batches of 10,000 or whatever; we don't have to store them separately, if my understanding is correct.
Saving and prefetching can even be done on a separate thread if need be (if indeed the iteration order is mostly deterministic).
Predictable read could also be fast on a lousy HDD.
While (de)serialization would add some extra work, it's actually quite trivial, and the header sync would mimic the pattern used by block downloads (maybe even reuse some of the setup). It could be solved automatically if we saved these to LevelDB instead of separate files - might be slow, dunno, but I think it's worth considering, since I think we can assume that the headers should always fit on disk, but not always into memory.
It may result in simpler code (since we can just assume that everything is cached, no need for 30 year and 10% memory limits and redownload fallbacks), better scaling (we expect more headers in 30 years) and basically the same speed gain compared to redownloading?
My initial goal is using resources that are available on slightly beefier nodes.
Since adding logic for caching to disk adds extra complexity, I'd prefer either doing it in a follow-up PR (saving review-energy for the other aspects in this PR), or possibly in a separate commit at the end of the PR.
Probably would just use one contiguous file without LevelDB since we don't need to do lookups, just write and read in order from genesis/start to sufficient chainwork.
Might be good to have one file for every 100'000 headers in order to be able to delete them after they have been consumed (read), as I think we start downloading and storing blocks to disk in parallel with that process.
Is there some other reason for doing something closer to block storage?
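(To make the "one contiguous file" idea concrete, a rough sketch assuming Bitcoin Core's `AutoFile` and `fsbridge::fopen`; batching, file rotation and error handling are omitted, and none of this is code from the PR.)

```cpp
// Sketch only: append a batch of headers to a flat cache file and read them
// back in the same order. Each CBlockHeader serializes to 80 bytes.
#include <primitives/block.h>
#include <streams.h>
#include <util/fs.h>

#include <vector>

void AppendHeaderBatch(const fs::path& path, const std::vector<CBlockHeader>& batch)
{
    AutoFile file{fsbridge::fopen(path, "ab")};
    for (const CBlockHeader& header : batch) {
        file << header;
    }
}

std::vector<CBlockHeader> ReadHeaderBatch(const fs::path& path, size_t count)
{
    AutoFile file{fsbridge::fopen(path, "rb")};
    std::vector<CBlockHeader> out(count);
    for (CBlockHeader& header : out) {
        file >> header;
    }
    return out;
}
```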
> Is there some other reason for doing something closer to block storage?

We don't have to, of course, but I think we need a good reason to do it differently since it's a very similar problem.

> Since adding logic for caching to disk adds extra complexity, I'd prefer either doing it in a follow-up PR (saving review-energy for the other aspects in this PR), or possibly in a separate commit at the end of the PR.

I'm not sure that's the case, quite the opposite, I'd expect it to be simpler. But you've spent more time on it than I have...
Memory is finite, and I'm not sure it scales with the blockchain. But the disk kinda has to, and I'm not sure it would be measurably slower to do it on disk, whereas with memory we have to worry about OOM and other attacks all the time.
Do you still think storing to disk is a priority after our latest chats?
Of course, did anything change in that regard (I might have forgotten about it since)?
I still think it would simplify the situation if we could assume that we can store everything.
You don't?
I guess we'll still have the timeout logic before the disk fills:
bitcoin/src/net_processing.cpp
Lines 63 to 66 in d68a4ed
/** Headers download timeout.
 * Timeout = base + per_header * (expected number of headers) */
static constexpr auto HEADERS_DOWNLOAD_TIMEOUT_BASE = 15min;
static constexpr auto HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER = 1ms;
bitcoin/src/net_processing.cpp
Lines 5511 to 5517 in d68a4ed
peer->m_headers_sync_timeout = current_time + HEADERS_DOWNLOAD_TIMEOUT_BASE +
    (
     // Convert HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER to microseconds before scaling
     // to maintain precision
     std::chrono::microseconds{HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER} *
     Ticks<std::chrono::seconds>(NodeClock::now() - m_chainman.m_best_header->Time()) / consensusParams.nPowTargetSpacing
    );
bitcoin/src/net_processing.cpp
Lines 5821 to 5841 in d68a4ed
if (current_time > peer->m_headers_sync_timeout && nSyncStarted == 1 && (m_num_preferred_download_peers - state.fPreferredDownload >= 1)) {
    // Disconnect a peer (without NetPermissionFlags::NoBan permission) if it is our only sync peer,
    // and we have others we could be using instead.
    // Note: If all our peers are inbound, then we won't
    // disconnect our sync peer for stalling; we have bigger
    // problems if we can't get any outbound peers.
    if (!pto->HasPermission(NetPermissionFlags::NoBan)) {
        LogInfo("Timeout downloading headers, %s\n", pto->DisconnectMsg(fLogIPs));
        pto->fDisconnect = true;
        return true;
    } else {
        LogInfo("Timeout downloading headers from noban peer, not %s\n", pto->DisconnectMsg(fLogIPs));
        // Reset the headers sync state so that we have a
        // chance to try downloading from a different peer.
        // Note: this will also result in at least one more
        // getheaders message to be sent to
        // this peer (eventually).
        state.fSyncStarted = false;
        nSyncStarted--;
        peer->m_headers_sync_timeout = 0us;
    }
Hm... we probably should reset the `HeadersSyncState` when timing out to free memory, especially for noban peers. Will do in upcoming push.
Edit: I'll consider this some more. The cache is very ephemeral, but reducing the amount of limiting logic would be nice.
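(For a rough sense of scale of that timeout, using the constants quoted above - my arithmetic, so treat it as approximate: a peer whose best header is one year old expects about 31,536,000 s / 600 s ≈ 52,560 headers, so the timeout is 15 min + 52,560 × 1 ms ≈ 15 min 53 s; syncing all the way from genesis, roughly 16 years of headers, gives about 15 min + ~14 min ≈ 29 min before it fires.)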
hss.reset(new HeadersSyncState(0, Params().GetConsensus(), chain_start, chain_work));
(void)hss->ProcessNextHeaders(first_chain, true);
BOOST_CHECK(hss->GetState() == HeadersSyncState::State::REDOWNLOAD);
static void HappyPath(const std::vector<CBlockHeader>& first_chain, const CBlockIndex* chain_start)
Is the happy path even likely? What's the smallest thing that can ruin a happy path, a single invalid header that isn't part of the longest chain?
Won't that become a new attack vector - i.e. the lightest attack that makes the happy path infeasible, derailing caching (which isn't problematic now, but may be problematic in a few decades)? It's likely a lot better than previous DoS threats, but with disk caching we can likely even handle invalid attempts (e.g. 10% invalid or whatever), which would likely be more robust.
Already on master: during PRESYNC, just as in REDOWNLOAD, we check both `hashPrevBlock` and `PermittedDifficultyTransition` (`nBits`). We must be given headers that fulfill this in-order from our selected headers-sync peer.
The happy path is the most likely. Edge cases:
- Malicious node (`SneakyRedownload` test).
- Selecting another node that hasn't fully synced the chain either, giving us insufficient work (`TooLittleWork` test).
- Connection lost.

Don't see potential for a new attack, but maybe I'm missing something?
I still need to understand what the worst case scenario is before and after this change - will try to find out myself (e.g. what kind of attacks are possible before and after)
src/headerssync.cpp
Outdated
    // buffers.
    Assume(m_process_all_remaining_headers);
} else {
    // If we just switched to REDOWNLOAD and m_headers_cache was
seems to me saving to disk could solve this as well
Not in this case I don't think, even if we were to permit more headers being stored to disk (5min blocktimes?), we would still have a cap on the cache size eventually - so this would still need to be handled.
ef1b61b to 2de1ae5 Compare
Thank you for your initial feedback!

> I'm very happy somebody's working on this, it was also on my list of things to tackle!
> Welcome to the team Benchoors :D/

Nice!

> the current approach using an in-memory RAM cache with automatic sizing heuristics raises a few key concerns
> - Disk caching
> - ... automatic sizing (GetFreeRAM, time estimates, various bounds) ... Would a simpler approach, like a user-configurable `-headerscachesize=MB`

Please see inline comments.

> - Some of the logic seems ripe for extraction into a helper method, and some explanatory comments might be better suited for commit messages.

Extracted `ComputeHeadersCacheMax()`, any other specific suggestions?

> Probably irrelevant, but was wondering that if we're not redownloading, do we still care about "honest nodes" (as long as it's not obvious spam)?

We have a minimum proof-of-work threshold which is hardcoded for a given release. This threshold triggers the PRESYNC -> REDOWNLOAD state change. We could have a malicious node attempt to feed us an ultra-long but low-work chain. If our node has the cache but is running an old release far in the future (with an old threshold), it will fill the cache and then fall back to the behavior on master of verifying that headers are connected and conform to difficulty adjustments, but it will proceed to throw them away. So the old node is still not going to go into some pathological case unless we remove the cap on the headers cache.
A mid-level aspect I've been considering is doing things like adding an extra state PRESYNC -> +CONSUMECACHE+ -> REDOWNLOAD. But `HeadersSyncState` exposes state-specific methods publicly (`GetState()`, `GetPresyncHeight()`, `GetPresyncTime()`, `GetPresyncWork()`, `NextHeadersRequestLocator()`), so I've avoided it so far. Any ideas on that level?
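(For illustration, the extra-state idea would look roughly like this; CONSUMECACHE is a hypothetical name, not something in the PR.)

```cpp
// Sketch only: a dedicated phase for replaying cached headers, instead of
// folding that work into REDOWNLOAD. Only PRESYNC/REDOWNLOAD/FINAL exist today.
enum class State {
    PRESYNC,      // accumulate commitments until sufficient work is seen
    CONSUMECACHE, // hypothetical: replay headers already cached locally
    REDOWNLOAD,   // re-request from the peer whatever the cache couldn't serve
    FINAL,        // sync finished, successfully or not
};
```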
Changes in latest push:
- Extracted `ComputeHeadersCacheMax()`.
- Changed `HeadersSyncState` ctor to take bytes instead of header count.
- Renamed `HeadersSyncState::m_download_state` -> `m_state` as we aren't tracking any other state enums within that class.
- Extracted `ProcessPresync()` and `ProcessRedownload()` methods to reduce indentation.
- Added `-headerssynccache=MB` arg.
- Added reference to recent BitMEX blog entry in commit message.
- Adjusted comments and other minutia.
2de1ae5 to c0c990f Compare
Latest push:
        break;
    }
    switch (m_state) {
    case State::PRESYNC:
I will have to debug this properly, I don't understand any of this yet :/
I'll debug it properly this week, only did a lightweight sweep again to get you some early feedback. Sorry for the lack of depth, will do that shortly.
@@ -50,13 +50,11 @@ FUZZ_TARGET(headers_sync_state, .init = initialize_headers_sync_state_fuzz)
{
    SeedRandomStateForTest(SeedRand::ZEROS);
    FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    auto mock_time{ConsumeTime(fuzzed_data_provider)};
Needs more explanation, I don't have enough context.
"lower" than what? What happens when we're "returning"?
Attempted to improve commit message:
- fuzz: Specify lower limit of mock-time instead of returning
-
- Should reduce wasted runs, through less code complexity.
+ fuzz(headerssync): Specify minimum for mock-time
+
+ That way we can always proceed instead of just exiting the fuzz target, wasting cycles. Also reduces code complexity.
Any idea why it was done like that before?
auto age = rand();
if (age < 120) do shit
else return false

seems indeed simpler to do

auto age = rand(0, 120)
(you could likely commit this in a separate PR as well)
My guess is either the author didn't know of the `min` parameter or it was added afterwards. I think it's small and non-controversial enough to sneak into this PR for now.
You can do both
It seems it was already available at the time of the pr
My guess if I open a PR just for this is that maflcko will say something like "the fuzz engine should figure out the branch quickly, so it's not an issue".
Probably, but I'd risk it instead of receiving the same comment here :)
@@ -272,7 +272,7 @@ class HeadersSyncState {
    bool m_process_all_remaining_headers{false};

    /** Current state of our headers sync. */
    State m_download_state{State::PRESYNC};
    State m_state{State::PRESYNC};
yeah, this is what I mean: X x = X::x; // sets X x to x (X)
Yeah, seems kind of naked if removing the comment completely though. Could change it to "Central tracker of state machine's state", but not sure it's enough of an improvement.
It's not just the comment; `state` is too general, everything is a state.
Can we find a better name - or if not, can we refactor? Not being able to find a proper name is a strong code smell.
Having a `State m_state` field in a C++ state machine is certainly not unheard of. Would `m_current_state` be nicer in your eyes?
No, it doesn't help me in understanding the behavior better. Again, it's like a variable called "thing" ... it indicates a bigger problem
Recap: We discussed on a call whether it would be achievable to have different state classes, with each one producing the next as an output, instead of having this enum. I'm still a bit apprehensive about the public accessors of `HeadersSyncState`:

Lines 123 to 133 in f1fdc84

/** Return the current state of our download */
State GetState() const { return m_state; }
/** Return the height reached during the PRESYNC phase */
int64_t GetPresyncHeight() const { return m_presync_height; }
/** Return the block timestamp of the last header received during the PRESYNC phase. */
uint32_t GetPresyncTime() const { return m_presync_last_header_received.nTime; }
/** Return the amount of work in the chain received during the PRESYNC phase. */
arith_uint256 GetPresyncWork() const { return m_presync_chain_work; }

Could maybe lean on `std::variant` to some extent as it contains an "implicit enum".
Another consideration is that it seems simpler for tests to be able to check the current state enum.
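(A rough sketch of the `std::variant` direction, purely illustrative; the struct and accessor names are made up and the per-phase fields are elided.)

```cpp
// Sketch only: phase-specific data lives in per-phase structs, and the
// "implicit enum" is the variant's active alternative.
#include <variant>

struct Presync { /* presync-only fields */ };
struct Redownload { /* redownload-only fields */ };
struct Final {};

class HeadersSyncSketch
{
public:
    // Tests (and GetState()-style callers) can still ask which phase we are in.
    bool InPresync() const { return std::holds_alternative<Presync>(m_phase); }
    bool InRedownload() const { return std::holds_alternative<Redownload>(m_phase); }
    bool IsFinal() const { return std::holds_alternative<Final>(m_phase); }

private:
    std::variant<Presync, Redownload, Final> m_phase{Presync{}};
};
```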
Would you feel this to be an improvement in the header over the `presync_`/`redownload_` field prefixes?
struct { // PRESYNC
/** Work that we've seen so far on the peer's chain */
arith_uint256 chain_work;
/** Store the latest header received while in PRESYNC (initialized to m_chain_start) */
CBlockHeader last_header_received;
/** Height of m_presync_last_header_received */
int64_t height{0};
} m_presync;
struct { // REDOWNLOAD
/** During phase 2 (REDOWNLOAD), we buffer redownloaded headers in memory
* until enough commitments have been verified; those are stored in
* m_redownloaded_headers */
std::deque<CompressedHeader> headers;
/** Height of last header in m_redownloaded_headers */
int64_t buffer_last_height{0};
/** Hash of last header in m_redownloaded_headers (initialized to
* m_chain_start). We have to cache it because we don't have hashPrevBlock
* available in a CompressedHeader.
*/
uint256 buffer_last_hash;
/** The hashPrevBlock entry for the first header in m_redownloaded_headers
* We need this to reconstruct the full header when it's time for
* processing.
*/
uint256 buffer_first_prev_hash;
/** The accumulated work on the redownloaded chain. */
arith_uint256 chain_work;
/** Set this to true once we encounter the target blockheader during phase
* 2 (REDOWNLOAD). At this point, we can process and store all remaining
* headers still in m_redownloaded_headers.
*/
bool process_all_remaining_headers{false};
} m_redownload;
My feelings don't matter here :)
If this untangles the two scenarios and I don't have to fully understand one when reviewing the other, it's arguably better, isn't it?
Well, I prefer having the indentation/grouping in the header file, which makes it read as more orderly. But in itself doesn't change behavior. Good to hear that you're not vehemently opposed. :)
> indentation/grouping in the header file
yes, even different files to avoid the mess we have e.g. in validation.cpp
src/headerssync.cpp
Outdated
        ret = ProcessRedownload(received_headers, full_headers_message);
        break;
    case State::FINAL:
        Assume(false); // Should never be called again after entering FINAL.
can we rather avoid this instead? Or at least print something instead of a comment? I hate comments, it's a sign that we've given up :(
Changed assume condition to: `Assume(m_state != State::FINAL)`.
Would love proper support for error strings in asserts/assume.
This is worse than before, `m_state != State::FINAL` is always true
Inside `case State::FINAL`, `m_state != State::FINAL` is always false, triggering failure. This is a common idiom for documenting what condition has failed.
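(Generic illustration of that idiom, using plain assert so the snippet stands alone; the PR itself uses Assume from util/check.h.)

```cpp
#include <cassert>

enum class State { PRESYNC, REDOWNLOAD, FINAL };

void Process(State state)
{
    switch (state) {
    case State::FINAL:
        // Unreachable by contract: repeating the condition means the failure
        // message names what was violated, unlike a bare assert(false).
        assert(state != State::FINAL);
        break;
    default:
        break;
    }
}
```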
Yeah, but ideally we shouldn't be able to get to an illegal state in the first place - though it's not your code that is at fault here, but maybe we can reduce the invalid possibilities
src/common/system.cpp
Outdated
40e9538
This is a huge commit, can you break it up meaningfully?
Broke out the RAM-querying to later commit.
I thought about making the `cache_bytes` parameter to the `HeadersSyncState` ctor default to `std::nullopt` or `0` and then introducing all the setting of that value in subsequent commits, but it felt too contrived, not leaving that commit in an acceptable state. The files with call-sites sending in that value are more than a handful, but they are quite simple.
Let's discuss in person why you think that's an important part of this change
Reached agreement that it would be best for now to share RAM usage with dbcache, and to enable automatic sizing of dbcache/others based upon available RAM, in a separate PR.
I've been sifting through bitcoin#25717 and bitcoin#25720. The behavior we currently have on master is:
- Start a `HeadersSyncState` for a given peer.
- If we receive a block inv for a new block, fetch headers and in response trigger an additional `HeadersSyncState` to be created for that other peer.
- Add new `HeadersSyncState`s for each new block reported.

There are thoughts about instead adding another `HeadersSyncState` every 5-10 minutes, rather than whenever a new block is mined & propagated, which would result in more stability. https://github.com/bitcoin/bitcoin/pull/25720/files#r936707629

I think before sharing memory with dbcache, it might be good to make a separate PR that implements the 5-10 minute interval for spawning a `HeadersSyncState` peer, maybe capping simultaneous instances at 2-3. That way memory usage would be more predictable.
> share RAM usage with dbcache
Rather to steal its usage since we shouldn't be using it yet.
I can't follow the rest of the comments yet, can only do a lightweight review now.
That way we can always proceed instead of just exiting the fuzz target, wasting cycles. Also reduces code complexity.
* Makes tests logically separated through extraction into functions. Also enables using function-scope instances of HeadersSyncState instead of reset()-ing unique_ptr.
* Also adds missing comment for part 4.
c0c990f to f1fdc84 Compare
Pushing my first impressions while I'm reviewing
@@ -183,6 +183,11 @@ class HeadersSyncState {
    const unsigned m_commit_offset;

private:
    ProcessingResult ProcessPresync(const std::vector<CBlockHeader>&
maybe this could come after the test breakup (or maybe even the same commit), it seems to follow the same logic
f1fdc84 to 67cd87b Compare
Minor code update. Mostly updating comment threads.
* Use BOOST_REQUIRE_EQUAL for HeadersSyncState::State - Nicer failure output and prevents continuing the test.
* Verify HeadersSyncState::State directly after ProcessNextHeaders().
* CHECK for more, like Locator and result.pow_validated_headers.
* Extract and capitalize constants.
* Extract genesis block variable.

Grouped fields specific to PRESYNC/REDOWNLOAD into anonymous-type structs. Pre-existing usage such as GetPresyncWork(), GetPresyncTime(), GetPresyncHeight() make this separation logical.
* m_download_state -> m_state: See GetState(). We don't have other state-enums in the class, so drop the "download_" qualifier which could mislead one to associate it with the REDOWNLOAD state.
State processing will increase in complexity in upcoming commit.
While we do want to avoid OOM-attacks through [CVE-2024-52916](https://bitcoincore.org/en/2024/07/03/disclose-header-spam/), we can still serve the happy path by caching headers up to a reasonable chain height. The reasonable chain height is calculated by taking the difference between the starting block and our current time, and assuming slightly faster than 10 minute block time averages[^1].

Cuts out time/bandwidth re-downloading the same headers from the same peer - good for both local and remote node, especially over slow/flaky connections. Even more helpful when the local node has a flaky connection to all other nodes.

The cache is entirely RAM-based since it is ephemeral by nature, taking less than a minute to fill on a really good connection (at contemporary block heights). Also due to the fact that we throw it away upon completion, or if we end up with a peer that has too-low PoW or feeds us a malicious chain.

The headers cache size is set by -headerssynccache=<MiB>. When unset, we default to storing REDOWNLOAD_BUFFER_SIZE number of headers (= +0.68 MiB) (smaller for shorter chains). This makes us able to return a chunk of headers earlier, after having switched into the REDOWNLOAD phase.

[^1]: "the Bitcoin target interval time isn't really 10 minutes, it's actually 10 minutes and 0.3 seconds. Not that this bug is significant, actually since Bitcoin launched, the average interval has been 9 minutes and 36 seconds, significantly less than 10 minutes. This is because on average since 2009 the hashrate has been increasing." https://blog.bitmex.com/the-timewarp-attack/ (31 Mar 2025)
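(A back-of-the-envelope check of the sizes cited above. The constants are assumed - roughly 14827 headers for REDOWNLOAD_BUFFER_SIZE and 48 bytes per CompressedHeader - so treat the numbers as approximate rather than quoted from the PR.)

```cpp
// Rough size check: the default cache (~REDOWNLOAD_BUFFER_SIZE headers) lands
// near the 0.68 MiB mentioned above, and the 6*24*365*30 cap near 72 MiB.
#include <cstdio>

int main()
{
    constexpr double header_bytes{48.0};        // assumed sizeof(CompressedHeader)
    constexpr double default_headers{14827.0};  // assumed REDOWNLOAD_BUFFER_SIZE
    constexpr double thirty_year_headers{6.0 * 24 * 365 * 30};

    std::printf("default cache: %.2f MiB\n", default_headers * header_bytes / (1024 * 1024));
    std::printf("30-year cap:   %.1f MiB\n", thirty_year_headers * header_bytes / (1024 * 1024));
    return 0;
}
```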
… in test. Would be good to have TARGET_BLOCKS = REDOWNLOAD_BUFFER_SIZE + 50 or something. So should move REDOWNLOAD_BUFFER_SIZE to header. Need to update references to REDOWNLOAD_BUFFER_SIZE and modifying headerssync.cpp. This should be its own PR.
67cd87b to d68a4ed Compare
Updated PR description with correct CVE and added reference to Optech podcast episode.
Interesting, this might intersect with some issues I've been seeing with slow connections. Will try to review soon.
@jonatack thanks for checking this out! Actually been curious whether it would help your setup, would be nice to have that suspicion validated. I'm trying to smuggle more refined versions of the test refactorings in via bitcoin#32579, which is more ready for approach-level review in case that interests you as well.
5be31b2 lsan: add more Qt suppressions (fanquake) Pull request description: Using Clang clang version 20.1.6 (Fedora 20.1.6-9.fc43) and: ```bash export CC=clang export CXX=clang++ cmake -B build -DBUILD_GUI=ON -DSANITIZERS=address cmake --build build export LSAN_OPTIONS="suppressions=/root/bitcoin/test/sanitizer_suppressions/lsan" ctest --test-dir build ``` ```bash Totals: 3 passed, 0 failed, 0 skipped, 0 blacklisted, 1589ms ********* Finished testing of AddressBookTests ********* ================================================================= ==21869==ERROR: LeakSanitizer: detected memory leaks Direct leak of 88 byte(s) in 1 object(s) allocated from: #0 0xaaaab5d5af40 in operator new(unsigned long) (/root/bitcoin/build/bin/test_bitcoin-qt+0x39af40) (BuildId: c0e038f1c507ea6860d1cfd499ac54ad83359872) #1 0xffff8c8f56cc in QLayoutPrivate::createWidgetItem(QLayout const*, QWidget*) (/lib64/libQt6Widgets.so.6+0x1a56cc) (BuildId: 8b7b9e470f4d4cd920282a4f963abb01225814fa) #2 0xffff8c8d2f90 in QBoxLayout::insertWidget(int, QWidget*, int, QFlags<Qt::AlignmentFlag>) (/lib64/libQt6Widgets.so.6+0x182f90) (BuildId: 8b7b9e470f4d4cd920282a4f963abb01225814fa) #3 0xaaaab5fc7188 in SendCoinsDialog::addEntry() /root/bitcoin/build/src/qt/./qt/sendcoinsdialog.cpp:596:18 #4 0xaaaab5fc4eec in SendCoinsDialog::SendCoinsDialog(PlatformStyle const*, QWidget*) /root/bitcoin/build/src/qt/./qt/sendcoinsdialog.cpp:84:5 #5 0xaaaab5da67ac in (anonymous namespace)::MiniGUI::MiniGUI(interfaces::Node&, PlatformStyle const*) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:235:75 #6 0xaaaab5da2000 in (anonymous namespace)::TestGUI(interfaces::Node&, std::shared_ptr<wallet::CWallet> const&) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:270:13 #7 0xaaaab5d9ebc8 in (anonymous namespace)::TestGUI(interfaces::Node&) /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:453:5 #8 0xaaaab5d9ebc8 in WalletTests::walletTests() /root/bitcoin/build/src/qt/test/./qt/test/wallettests.cpp:475:5 #9 0xffff8b1c5314 in QMetaMethodInvoker::invokeImpl(QMetaMethod, void*, Qt::ConnectionType, long long, void const* const*, char const* const*, QtPrivate::QMetaTypeInterface const* const*) (/lib64/libQt6Core.so.6+0x195314) (BuildId: eacb2d1228362560e5df1a1ce496c99ad61960e7) #10 0xffff8b1c5dc8 in QMetaMethod::invokeImpl(QMetaMethod, void*, Qt::ConnectionType, long long, void const* const*, char const* const*, QtPrivate::QMetaTypeInterface const* const*) (/lib64/libQt6Core.so.6+0x195dc8) (BuildId: eacb2d1228362560e5df1a1ce496c99ad61960e7) #11 0xffff8cf57c54 (/lib64/libQt6Test.so.6+0x27c54) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #12 0xffff8cf5fa18 (/lib64/libQt6Test.so.6+0x2fa18) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #13 0xffff8cf6067c (/lib64/libQt6Test.so.6+0x3067c) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #14 0xffff8cf610a4 (/lib64/libQt6Test.so.6+0x310a4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #15 0xffff8cf61aa4 in QTest::qRun() (/lib64/libQt6Test.so.6+0x31aa4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #16 0xffff8cf61eb4 in QTest::qExec(QObject*, int, char**) (/lib64/libQt6Test.so.6+0x31eb4) (BuildId: 96bb1cdeead53af0ced36d7970cf9cd79c4c4ccd) #17 0xaaaab5d7d77c in main /root/bitcoin/build/src/qt/test/./qt/test/test_main.cpp:95:30 #18 0xffff8aad6398 in __libc_start_call_main (/lib64/libc.so.6+0x26398) (BuildId: 627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) #19 0xffff8aad6478 in __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x26478) (BuildId: 
627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) #20 0xaaaab5c74cac in _start (/root/bitcoin/build/bin/test_bitcoin-qt+0x2b4cac) (BuildId: c0e038f1c507ea6860d1cfd499ac54ad83359872) ``` This happens when building using depends: ```bash Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0xaaaabdbe86f8 in malloc (/root/bitcoin/build/bin/test_bitcoin-qt+0x4386f8) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #1 0xfbff97f8c164 (<unknown module>) #2 0xaaaabf0cfaa4 in QDBusConnectionPrivate::QDBusConnectionPrivate() (/root/bitcoin/build/bin/test_bitcoin-qt+0x191faa4) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #3 0xaaaabf0c9e30 in QDBusConnectionManager::doConnectToStandardBus(QDBusConnection::BusType, QString const&, bool) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1919e30) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #4 0xaaaabf0cb0e4 in QtPrivate::QCallableObject<QDBusConnectionPrivate* (QDBusConnectionManager::*)(QDBusConnection::BusType, QString const&, bool), QtPrivate::List<QDBusConnection::BusType&, QString const&, bool&>, QDBusConnectionPrivate*>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x191b0e4) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #5 0xaaaabf5cbaf0 in QObject::event(QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1e1baf0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #6 0xaaaabf5a4ce0 in QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df4ce0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #7 0xaaaabf5a486c in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df486c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #8 0xaaaabf5a575c in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df575c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #9 0xaaaabf66b858 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1ebb858) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #10 0xaaaabf5a9e3c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1df9e3c) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #11 0xaaaabf632a44 in QThread::exec() (/root/bitcoin/build/bin/test_bitcoin-qt+0x1e82a44) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #12 0xaaaabf0c9bd0 in QDBusConnectionManager::run() (/root/bitcoin/build/bin/test_bitcoin-qt+0x1919bd0) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #13 0xaaaabf669c30 in QThreadPrivate::start(void*) (/root/bitcoin/build/bin/test_bitcoin-qt+0x1eb9c30) (BuildId: dd54811dc11325890f7bac3e3a49d38f5a7ffef5) #14 0xaaaabdbe5f2c in asan_thread_start(void*) asan_interceptors.cpp.o #15 0xffff99538608 in thread_start (/lib64/libc.so.6+0xf8608) (BuildId: 627f878dd454ee3cc1dfdbd347bb565f1ffb53e7) SUMMARY: AddressSanitizer: 3592 byte(s) leaked in 37 allocation(s). ``` ACKs for top commit: maflcko: lgtm ACK 5be31b2 Tree-SHA512: 0c33661c7ec83ea9b874c1ee4ee2de513131690287363e216a88560dfb31a59ef563a50af756c86a991583aa64a600a74e20fd5d6a104cf4c0a27532de8d2211
…xec in RunCommandJSON" faa1c3e Revert "Merge bitcoin#32343: common: Close non-std fds before exec in RunCommandJSON" (MarcoFalke) Pull request description: After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see [signal-safety(7)](https://www.man7.org/linux/man-pages/man7/signal-safety.7.html)) until such time as it calls execv. The standard library (`std` namespace) is not async-signal-safe. Also, `throw`, isn't. There was an alternative implementation using `readdir` (bitcoin#32529), but that isn't async-signal-safe either, and that implementation was still using `throw`. So temporarily revert this feature. A follow-up in the future can add it back, using only async-signal-safe functions, or by using a different approach. Fixes bitcoin#32524 Fixes bitcoin#33015 Fixes bitcoin#32855 For reference, a failure can manifest in the GCC debug mode: * While `fork`ing, a debug mode mutex is held (by any other thread). * The `fork`ed child tries to use the stdard libary before `execv` and deadlocks. This may look like the following: ``` (gdb) thread apply all bt Thread 1 (Thread 0xf58f4b40 (LWP 774911) "b-httpworker.2"): #0 0xf7f4f589 in __kernel_vsyscall () #1 0xf79e467e in ?? () from /lib32/libc.so.6 #2 0xf79eb582 in pthread_mutex_lock () from /lib32/libc.so.6 #3 0xf7d93bf2 in ?? () from /lib32/libstdc++.so.6 #4 0xf7d93f36 in __gnu_debug::_Safe_iterator_base::_M_attach(__gnu_debug::_Safe_sequence_base*, bool) () from /lib32/libstdc++.so.6 #5 0x5668810a in __gnu_debug::_Safe_iterator_base::_Safe_iterator_base (this=0xf58f13ac, __seq=0xf58f13f8, __constant=false) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_base.h:91 #6 0x56ddfb50 in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::forward_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:162 #7 0x56ddfacb in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::bidirectional_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:539 #8 0x56ddfa5b in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> >, std::random_access_iterator_tag>::_Safe_iterator (this=0xf58f13a8, __i=3, __seq=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/safe_iterator.h:687 #9 0x56ddd3f6 in std::__debug::vector<int, std::allocator<int> >::begin (this=0xf58f13f8) at /bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/debug/vector:300 #10 0x57d83701 in subprocess::detail::Child::execute_child (this=0xf58f156c) at ./util/subprocess.h:1372 #11 0x57d80a7c in subprocess::Popen::execute_process (this=0xf58f1cd8) at ./util/subprocess.h:1231 #12 0x57d6d2b4 in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0xf58f1cd8, cmd_args="fake.py enumerate", args=..., args=..., args=..., args=...) 
at ./util/subprocess.h:964 #13 0x57d6b597 in RunCommandParseJSON (str_command="fake.py enumerate", str_std_in="") at ./common/run_command.cpp:27 #14 0x57a90547 in ExternalSigner::Enumerate (command="fake.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at ./external_signer.cpp:28 #15 0x56defdab in enumeratesigners()::$_0::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (this=0xf58f2ba0, self=..., request=...) at ./rpc/external_signer.cpp:51 ... (truncated, only one thread exists) ``` ACKs for top commit: fanquake: ACK faa1c3e darosior: ACK faa1c3e Tree-SHA512: 602da5f2eba08d7fe01ba19baf411e287ae27fe2d4b82f41734e05b7b1d938ce94cc0041e86ba677284fa92838e96ebee687023ff28047e2b036fd9a53567e0a
While we do want to avoid OOM-attacks through CVE-2019-25220, we can still serve the happy path by caching headers up to a reasonable chain height.
Cuts out time/bandwidth re-downloading same headers from same peer - especially good for receiver/sender nodes who are on generally slow/flaky connections. As the chain continues to grow, this becomes more important (even though we scale the timeouts).
Inspired by Optech podcast episode.
Queries the machine for free RAM and takes up to 10% of it for the cache (which is released when syncing ends).

TODO:
- `HeadersSyncState` peers more predictable by adding new ones based upon timeouts rather than new block announcements.
- `HeadersSyncState` memory overhead back down again.
- `std::variant`?