-
Notifications
You must be signed in to change notification settings - Fork 37.7k
kernel: De-globalize validation caches #30141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. Nice.
79c9c55
to
4a1df97
Compare
Updated 79c9c55 -> 4a1df97 (noGlobalScriptCache_0 -> noGlobalScriptCache_1, compare) |
4a1df97
to
5b1576b
Compare
Updated 4a1df97 -> 5b1576b (noGlobalScriptCache_1 -> noGlobalScriptCache_2, compare) |
5b1576b
to
63923c8
Compare
Rebased 5b1576b -> 63923c8 (noGlobalScriptCache_2 -> noGlobalScriptCache_3, compare)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 63923c8
63923c8
to
6ad4aa8
Compare
Thank you for the review @stickies-v, 63923c8 -> 6ad4aa8 (noGlobalScriptCache_3 -> noGlobalScriptCache_4, compare)
|
6ad4aa8
to
fa660a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK c1d6e52. Nice that this PR is +202 −261 lines and cleans a number of things up in addition to removing the globals.
I noticed #10754 while reviewing this, which suggests the idea of using a shared cache and could be something to follow up on. It also provides more background information about the caches.
re: #30141 (comment)
This was a regression I introduced after the first round of review
That's interesting, I guess it shows the dangers of reviewing with range-diff, because it's hard to know if a review suggestion was fully implemented or there may be some old code left behind.
I don't think this is related to |
I'll often ack a PR and make a suggestion like you can "replace foo/2 with bar". Then if the PR is updated, I will check the range diff and re-ack with "the only thing that changed since last review is replacing foo/2 with bar" without looking at the complete diff again. This is potentially dangerous because it doesn't the verify suggestion was fully implemented, which depending on the suggestion might introduce bugs. In this case there was a pretty obvious bug that early reviewers who looked at multiple versions of the PR missed, and a later reviewer who was reviewing at the PR for the first time pointed out. So relying too much on range-diff might have caused this, and it's a good reminder (to myself at least) to be careful when using it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c1d6e52
@@ -2145,10 +2147,10 @@ bool CheckInputScripts(const CTransaction& tx, TxValidationState& state, | |||
// properly commits to the scriptPubKey in the inputs view of that | |||
// transaction). | |||
uint256 hashCacheEntry; | |||
CSHA256 hasher = g_scriptExecutionCacheHasher; | |||
CSHA256 hasher = validation_cache.ScriptExecutionCacheHasher(); | |||
hasher.Write(UCharCast(tx.GetWitnessHash().begin()), 32).Write((unsigned char*)&flags, sizeof(flags)).Finalize(hashCacheEntry.begin()); | |||
AssertLockHeld(cs_main); //TODO: Remove this requirement by making CuckooCache not require external locks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question tangential to this PR: This seems to mean the script execution cache needs something to synchronize access, and it is currently implicitly using cs_main
? I don't see any threads other than message handler that access it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the script execution cache could be wrapped just like the signature cache, which has an internal mutex.
I wonder if there is a DrahtBot parsing bug? It seems to be parsing #30141 (review) as a concept ack instead of a plain ack, and it requested another review from me despite the ack. EDIT: This happened because drahtbot was not parsing the linked comment. I had written another comment with the word A-C-K after my review, and drahtbot seems to looks at the latest comment with the word A-C-K, ignoring earlier comments, as described maflcko/DrahtBot#33 |
DrahtBot is open-source, so pull requests and bug reports are welcome. But I am not sure if the additional code is worth it for the rare case where someone write a comment containing
|
Change LogInstance() function to no longer allocate (and leak) a BCLog::Logger instance. Instead allow kernel applications to initialize their own logging instances that can be returned by LogInstance(). This change is somewhat useful by itself, but more useful in combination with bitcoin#30342. By itself, it gives kernel applications control over when the Logger is created and destroyed and allows new instances to replace old ones. In combination with bitcoin#30342 it allows multiple log instances to exist at the same time, and different output to be sent to different instances. This commit is built on top of bitcoin#30141 since it simplifies the implementation somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c1d6e52 . Left a few nits but would suggest not to address unless force push is necessary, although of course I'll quickly re-review if you do.
I guess it shows the dangers of reviewing with range-diff
FYI In my case, too, this wasn't a review failure because of range-diff, as I've done multiple full re-reviews that included the affected line. I think I was blind to it because of the similarity to the -maxsigcache
halving that is expected.
Thanks a lot for your keen observation and catching this mistake @ryanofsky.
src/validation.cpp
Outdated
g_scriptExecutionCacheHasher.Write(nonce.begin(), 32); | ||
g_scriptExecutionCacheHasher.Write(nonce.begin(), 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in 06e87c2: modifying this global inside a constructor seems quite footgunny. The footgun is removed in the next commit 0d41630, but I think the robust thing to do would be to squash the next commit to avoid cherry-pick accidents. I don't practically see this leading to issues, so I'm fine keeping it as is too to minimize the range-diff, so it's probably more of a review note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmh, the footgun here seems less bad than calling InitScriptExecutionCache
twice, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps slightly so, because here at least setup_bytes()
isn't called again.
My main point was that with InitScriptExecutionCache
it's clearly an initialization function that's affecting global state. On the other hand, one shouldn't have to expect that simply constructing an object is going to invalidate all other objects of the same type, that's unintuitive and rather opaque I think.
But we can leave this as is, it's resolved in the next commit after all.
Instead clamp it to uint32::max if it exceeds it. Co-authored-by: Anthony Towns <aj@erisian.com.au>
Move its ownership to the ChainstateManager class. Next to simplifying usage of the kernel library by no longer requiring manual setup of the cache prior to using validation code, it also slims down the amount of memory allocated by BasicTestingSetup.
Move it to the ChainstateManager class.
This is done in preparation for the following commit. Also rename it to SignatureCache.
Move its ownership to the ChainstateManager class. Next to simplifying usage of the kernel library by no longer requiring manual setup of the cache prior to using validation code, it also slims down the amount of memory allocated by BasicTestingSetup. Use this opportunity to make SignatureCache RAII styled Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
c1d6e52
to
606a7ab
Compare
Updated c1d6e52 -> 606a7ab (noGlobalScriptCache_13 -> noGlobalScriptCache_14, compare)
|
@@ -36,7 +39,7 @@ FUZZ_TARGET(script_sigcache, .init = initialize_script_sigcache) | |||
const CAmount amount = ConsumeMoney(fuzzed_data_provider); | |||
const bool store = fuzzed_data_provider.ConsumeBool(); | |||
PrecomputedTransactionData tx_data; | |||
CachingTransactionSignatureChecker caching_transaction_signature_checker{mutable_transaction ? &tx : nullptr, n_in, amount, store, tx_data}; | |||
CachingTransactionSignatureChecker caching_transaction_signature_checker{mutable_transaction ? &tx : nullptr, n_in, amount, store, *g_signature_cache, tx_data}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating the sig cache every iteration (instead of the global) would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we ideally check both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean test a global and local cache? We shouldn't fuzz globals (unless we can reset their state).
The reason I commented is that globals make fuzz tests non-deterministic. Let's say the harness is called (in-process) with input A, B and then C. It crashes on input C (which will be the input that the fuzz engine reports to you) but it only crashed because of the data stored in the global cache from input A and B. Giving the harness just input C won't make it crash, because it depends on the state from the previous iterations.
The way fuzz engines gather coverage information is also thrown off by non-determinism like this. They are designed under the assumption that the body of the fuzz harness is a pure function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 606a7ab
// DoS prevention: limit cache size to 32MiB (over 1000000 entries on 64-bit | ||
// systems). Due to how we count cache size, actual memory usage is slightly | ||
// more (~32.25 MiB) | ||
static constexpr size_t DEFAULT_MAX_SIG_CACHE_BYTES{32 << 20}; | ||
static constexpr size_t DEFAULT_VALIDATION_CACHE_BYTES{32 << 20}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: having DEFAULT_VALIDATION_CACHE_BYTES
and DEFAULT_SCRIPT_EXECUTION_CACHE_BYTES
defined in sigcache.h
is not ideal. One alternative is to rename script/sigcache.h
to script/cache.h
and move ValidationCache
in it, but that touches quite a few lines so I'm not convinced that's the best thing to do for this PR to make progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK 606a7ab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 606a7ab. Just small formatting, include, and static_assert changes since last review.
I think it would be great to follow up on dergoegge's comment about fuzzing
#30141 (comment). It seems like it could make fuzzing output much more useful. I don't think it is critical to do it as part of this PR though, since the fuzz test currently relies on global state and this PR isn't changing that.
Ported to the CMake-based build system in hebasto#264. |
The validation caches are currently setup independently from where the rest of the validation code is initialized. This makes their ownership semantics unclear. There is also no clear enforcement on when and in what order they need to be initialized. The caches are always initialized in the
BasicTestingSetup
although a number of tests don't actually need them.Solve this by moving the caches from global scope into the
ChainstateManager
class. This simplifies the usage of the kernel library by no longer requiring manual setup of the caches prior to using theChainstateManager
. Tests that need to access the caches can instantiate them independently.This pull request is part of the libbitcoinkernel project.