-
Notifications
You must be signed in to change notification settings - Fork 37.7k
index: Check all necessary block data is available before starting to sync #29770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29770. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
@stickies-v Thanks for the feedback, I will leave this unaddressed for now until #29668 has been merged. Then I will get back to it when I take this out of draft mode. |
809fffd
to
34fdefa
Compare
34fdefa
to
245c09b
Compare
src/index/base.h
Outdated
@@ -160,6 +161,9 @@ class BaseIndex : public CValidationInterface | |||
|
|||
/// Get a summary of the index and its state. | |||
IndexSummary GetSummary() const; | |||
|
|||
/// Data needed in blocks in order for the index to be able to sync | |||
virtual uint32_t RequiredBlockStatusMask() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: I wonder if at some point it could make sense to use a named type for the block status mask.
Something like using BlockStatusMask = std::underlying_type_t<BlockStatus>;
245c09b
to
0354df3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to implement 81638f5 differently, without adding the <chain.h>
dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.
It seems we all implemented the same "index X requires block data" and "index Y requires block undo data" in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.
My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.
Should not be too complicated to get rid of the dependency. I will draft something, maybe I'll just give @maflcko s suggestion a try.
I'm wouldn't call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:
It may be more readable in the short term but if we build more complicated stuff beyond just checking undo data I think this will be complicated. And we already use the mask everywhere so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways, that would be also lost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to implement 81638f5 differently, without adding the
<chain.h>
dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.Should not be too complicated to get rid of the dependency. I will draft something, maybe I'll just give @maflcko s suggestion a try.
maflcko's suggestion wouldn't remove the dependency. Unless you place the new enum somewhere else?. Which I don't think it worth it.
It seems we all implemented the same "index X requires block data" and "index Y requires block undo data" in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.
My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.I'm wouldn't call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:
virtual bool RequiresBlockUndoData() const { return false; }
It may be more readable in the short term but if we build more complicated stuff beyond just checking undo data I think this will be complicated. And we already use the mask everywhere so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways, that would be also lost.
If we agree that the final goal of the indexes is to run in isolation, in a standalone process, and be the first consumers of the kernel library, then linking them to chain internal fields like the CBlockIndex
status mask isn't the way to go. These objects will live only within the kernel process.
Also, I don't think the grepping argument is valid in most circumstances. It can be used anywhere to break layer distinctions. For example, what if the GUI needs to know if a certain block is available on disk in the future. Would you add the status masks enum dependency to a widget class? Doing so would break the current structure: views -> model/interfaces -> kernel (this is also done this way to allow the GUI to run in a standalone process as well).
The RequiresBlockUndoData()
implementation is from my PR, and I'm not fan of it anymore (I don't dislike it, just prefer a different approach).
I think #24230 approach is better as it introduces a new base class method to override called CustomOptions()
that returns a struct containing index's specific information. I think that building upon this would be cleaner and more flexible, as we would only need to add fields to a struct instead of changing the base class interface with each new option. - would probably change the struct name, which is currently called NotifyOptions
-.
Yes, I was thinking of putting it in a file for types which we have been doing in the code base in several places. And @ryanofsky suggests introducing a
Yes, I hope they can run in isolation in the future. But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the the kernel gives it. So I don't think it can be avoided that the index has knowledge of the block status so this would need to be shared. That doesn't mean that they can't run isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails or we basically reimplement the block status as a list of bools in the index which will be much harder to reason about and maintain. I don't see how an options object prevents that. |
0354df3
to
d855c59
Compare
just rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the the kernel gives it. So I don't think it can be avoided that the index has knowledge of the block status so this would need to be shared. That doesn't mean that they can't run isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails or we basically reimplement the block status as a list of bools in the index which will be much harder to reason about and maintain. I don't see how an options object prevents that.
We both agree that indexes need to share their sync data requirements in some way. Perhaps we are not understanding each other because your rationale is based on the current sync mechanism, while I am considering a future version that is no longer at the index base class.
An index running in a standalone process would only sync through kernel signals. It will register with the kernel, providing its last known block hash and the data and events it wants to receive and listen to. Then, it will only react to them. It will no longer request data directly from the kernel as it does currently.
The kernel, running in a different process, will emit the signals required to sync the registered index, just as it does for other listeners like the wallet (no difference between them). It will provide the BlockInfo
struct, which may or may not contain specific data, such as block undo data during connection/disconnection and other information if requested during registration.
This is why using an options struct instead of creating a method for each index sync requirement is more flexible to me. The index will provide this struct during registration to the kernel running in a different process and then forget about it. Then, if any event arrives without the requested data, the index will abort its execution.
Moreover, even if you prefer the multi-overridden-methods approach instead of the options struct (which is how I implemented this in #26966), I don't think accessing the uint32_t
block index status bit flags field helps much in terms of reasoning about the code or maintenance. People working on upper layers like the indexes, the wallet, or the GUI should focus on receiving the data they expect and should not have to worry/learn about how the block verification status is mapped in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a quick sanity check, I rebuilt -coinstatsindex
and checked gettxoutsetinfo
still gives me the same stats at block 840,000.
@@ -132,6 +154,29 @@ def run_test(self): | |||
for i, msg in enumerate([filter_msg, stats_msg, filter_msg]): | |||
self.nodes[i].assert_start_raises_init_error(extra_args=self.extra_args[i], expected_msg=msg+end_msg) | |||
|
|||
self.log.info("fetching the missing blocks with getblockfrompeer doesn't work for block filter index and coinstatsindex") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adabfbc: it would be useful to have an example where getblockfrompeer
does work, i.e. an index that doesn't need undo data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice but I'm not sure how to do it. The one candidate index we have for that is txindex
but we wholly disable that in the context of pruning because what it is used for is incompatible with pruning.
@fjahr can you rebase this, I'm still interested in this change. |
1584a79
to
8fb2b77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
Can you run clang-format-diff
on the hunks where it makes sense? I think requiring the indexes to declare if they will use undo data is sensible, but I'd also not be opposed to just making all of them require undo data.
src/init.cpp
Outdated
if (summary.synced) continue; | ||
|
||
// Get the last common block between the index best block and the active chain | ||
const CBlockIndex* pindex = WITH_LOCK(::cs_main, return chainman.m_blockman.LookupBlockIndex(summary.best_block_hash)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be ok to scope the lock to just the necessary call to LookupBlockIndex
, but I am also having a hard time saying for certain, that there might not be some edge case race condition here with the later call to Contains
and FindFork
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I am keeping the lock for the later calls as well now.
src/init.cpp
Outdated
older_index_name = summary.name; | ||
if (!pindex) break; // Starting from genesis so no need to look for earlier block. | ||
// Verify all blocks needed to sync to current tip are present unless we already checked all of them above. | ||
if (block_start && !(undo_start && undo_start.value() <= block_start.value())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are just comparing pointers here, but actually want to compare their height. This block of code is also kind of hard to test against from a functional test. Maybe we can add a unit test for StartIndexBackgroundSync
that manipulates some of the internal block index state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good catch on the pointers! Fixed.
I can try to add such a unit test, do you have specific scenarios in mind that you would like to see covered? If we just want to make sure this block is covered I think I could also add a functional test that basically executes the edge case that @furszy described above: #29770 (comment) and I think that should be fairly easy to do.
@@ -9,15 +9,18 @@ | |||
#include <core_io.h> | |||
#include <streams.h> | |||
#include <sync.h> | |||
#include <threadsafety.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These include fixes seem a bit random. Why not apply all suggested fixes from the CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just addressing a specific comment from #29668: #29668 (comment) I'm not sure which CI you mean? But anyway I think I don't want to extend the scope by adding more include fixes, I can drop these if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tidy job includes a complete run of include what you use suggestions for all source files.
8fb2b77
to
449687f
Compare
449687f
to
469a7c2
Compare
Rebased and adapted the changes that #32694 introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test doesn’t appear to exercise the issue. I cherry-picked 155803b onto master, and it passes without failure. Was expecting some crash (or a not-common failure) due to the lack of undo data and the missing check for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could implement this with a single chain traversal and without too many changes if you make CheckBlockDataAvailability
aware of the lack of undo data in the genesis block. See furszy@927c13d and furszy@be23f17
469a7c2
to
1f1d9f2
Compare
It was a bit hard for me to relate the changes to the changes here but it seems to me that the major reason this is simpler is because there is no real distinction between indexes that need undo data and indexes that don't need it. This is similar to the suggestion by @mzumsande here previously: #29770 (review). I think the handling of it isn't ideal because it's not consistent. Image two indices are present, one (A) is caught up close to the tip and requires undo data, the other (B) was just started and hasn't synced at all but doesn't need undo data. If A is before B in the list of indexes, then the full chain will be checked for undo data because A latched the undo check to true and then B required the full chain to be checked. If B is before A in the list of indexes no blocks are checked for undo data, not even the few needed for A, because the undo check never latches to true. Does that seem accurate to you? I didn't test this, just looked at your code. Please also note that I try to do a single traversal here in the code too, it is just broken into two different steps. To stay with the example above, my code here should check the few needed undo blocks for A in the first traversal, and then the second picks up where undo traversal ended and checks all the rest of the blocks for B without undo data. It's still a single traversal as in no block should be checked twice. EDIT: No, that last part isn't true anymore. I had that implemented but it was making this even more complicated so I simplified it to that we don't check for block data if we have already checked all the necessary for block data + undo data. I can add this back though. It seems fine to me to move the genesis handling from
Good catch! There was a race condition in the test which I fixed in the latest push. With that the test fails consistently on current master for me. |
I think I'd prefer this move too.
I think I agree, but maybe I am still missing something from furszy's patch. In any case the current approach seems fine to me. |
1f1d9f2
to
041214d
Compare
Ok, I have picked the commit from @furszy moving the genesis handling from |
I think I understand the rationale behind your code, but the example wasn’t very helpful. Mainly because it seems easily solvable by simply checking whether undo data is required before skipping the index. Some simple code so we are sync in that regards: start_block = tip;
requires_undo
for index in indexes:
requires_undo |= index.RequiresUndo()
// set lowest starting height.
status_flags = check_undo ? undo_flag | data_flag : data_flag
if (!checkBlockAvailability(tip, start_block, status_flags))
// fail That being said, as far as I understand it now, the rationale behind splitting the undo data check from the block data check is that their existence on disk might not be synchronized in the future (even though that’s not currently the case). |
// Skip checking data availability if we have not synced any blocks yet | ||
if (current_height > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple way of misusing this would be to invalidate the chain starting from block 1 with the index off.
So when the node starts with the index enabled, it would have an active chain height of 0 and the index at the previous tip; so this code would skip the block availability checks.
I think we should just drop this line and perform the availability checks all time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the whole chain is invalidated there are no blocks to check so I don't see harm in skipping the checks. But since the checks aren't doing much either I don't see much harm in this and can remove it in the next push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the whole chain is invalidated there are no blocks to check so I don't see harm in skipping the checks.
We would still need to reorg the index if this happens. From the invalidated tip to the new one. And for that we need to disconnect all invalid blocks. So checking the blocks/undo availability seems correct to me.
Thanks! But sorry, I don't see how the pseudo-code fixes it. I stared at it for a while but it seems to have the exact same behavior as the code in furszy@be23f17 afaict. Can you point me to the difference and ideally maybe show the patch to the code in the commit. Doesn't have to compile, just so I see where the difference is in the code because I just don't see it.
Ok, I can add more documentation, you probably mean within Adding a bit more context as I reflected on this: for me, I think, when I what playing around with different approaches in |
Hmm, maybe easier to do it offline, but I could try. I'm not even suggesting implementing my code at this point. I ended up understanding why you split the check. I'm ok with it. But essentially, what my pseudo-code does is extend the current checks to cover undo data if any active index requires it, regardless of its sync height. The idea was to take advantage of the fact that block and undo data are synchronized on disk, and that the bit flag check has no extra cost (checking In other words, going back to your example — you mentioned something like: Tip at 500 And you said the problem arises when indexB appears first in the loop, right? Because no undo data is checked. But that’s not the case if the flag is just updated at the beginning of the loop.
At least this is what I understood from your example.
Yes. Probably a small: "verify block + undo data existence from tip down to the lowest undo-requiring height, and if no index requires undo data or any block-only index goes lower than the undo index, verify block data existence down to the lowest block-only height". |
// Simulate that the first available block is missing undo data and | ||
// detect this by using a status mask. | ||
first_available_block->nStatus &= ~BLOCK_HAVE_UNDO; | ||
BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *first_available_block, BlockStatus{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the change to CheckBlockDataAvailability
has a problem:
If I add a check here:
BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *genesis, BlockStatus{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO}));
it fails because of the genesis special case code. To fix this, we should make sure that first_block
's predecessor is genesis (e.g. first_block->nHeight == 1
in the genesis special case condition), so that CheckBlockDataAvailability
returns false if genesis is the lower block of the range but a different block of the range is missing undo data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here furszy@14d3db2. @fjahr all yours (can directly exchange the first commit for this one).
Currently, we check that
BLOCK_HAVE_DATA
is available for all blocks an index needs to sync during startup. However, forcoinstatsindex
andblockfilterindex
we also need the undo data for these blocks. If that data is missing in the blocks, we are currently still starting to sync each of these indices and then crash later when we encounter the missing data.This PR adds explicit knowledge of which block data is needed for each index and then checks its availability during startup before initializing the sync process on them.
This also addresses a few open comments from #29668 in the last commit.