Skip to content

Conversation

TheCharlatan
Copy link
Contributor

The validation caches are currently setup independently from where the rest of the validation code is initialized. This makes their ownership semantics unclear. There is also no clear enforcement on when and in what order they need to be initialized. The caches are always initialized in the BasicTestingSetup although a number of tests don't actually need them.

Solve this by moving the caches from global scope into the ChainstateManager class. This simplifies the usage of the kernel library by no longer requiring manual setup of the caches prior to using the ChainstateManager. Tests that need to access the caches can instantiate them independently.


This pull request is part of the libbitcoinkernel project.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 19, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK stickies-v, glozow, ryanofsky
Concept ACK ajtowns
Stale ACK maflcko, theuni

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30157 (Fee Estimation via Fee rate Forecasters by ismaelsadeeq)
  • #29745 (bench: Adds a benchmark for CheckInputScripts by kevkevinpal)
  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. Nice.

@TheCharlatan TheCharlatan force-pushed the noGlobalScriptCache branch from 79c9c55 to 4a1df97 Compare May 21, 2024 08:36
@TheCharlatan
Copy link
Contributor Author

Updated 79c9c55 -> 4a1df97 (noGlobalScriptCache_0 -> noGlobalScriptCache_1, compare)

  • Addressed @theuni's comment, made CSignatureCache more RAII styled.
  • Addressed @ajtowns's comment, applied his suggestion of dropping the error handling if the cache size exceeds the uint32_t maximum value.

@TheCharlatan TheCharlatan force-pushed the noGlobalScriptCache branch from 4a1df97 to 5b1576b Compare May 21, 2024 21:20
@TheCharlatan
Copy link
Contributor Author

Updated 4a1df97 -> 5b1576b (noGlobalScriptCache_1 -> noGlobalScriptCache_2, compare)

@TheCharlatan
Copy link
Contributor Author

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK 63923c8

@TheCharlatan
Copy link
Contributor Author

Thank you for the review @stickies-v,

63923c8 -> 6ad4aa8 (noGlobalScriptCache_3 -> noGlobalScriptCache_4, compare)

  • Addressed @stickies-v's comment, deleting copy and move constructor of the newly introduced ValidationCache class.
  • Addressed @stickies-v's comment, applying the suggestion of making the pre-initialized hasher private and adding a getter for it.

@TheCharlatan TheCharlatan force-pushed the noGlobalScriptCache branch from 6ad4aa8 to fa660a2 Compare June 19, 2024 15:15
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK c1d6e52. Nice that this PR is +202 −261 lines and cleans a number of things up in addition to removing the globals.

I noticed #10754 while reviewing this, which suggests the idea of using a shared cache and could be something to follow up on. It also provides more background information about the caches.

re: #30141 (comment)

This was a regression I introduced after the first round of review

That's interesting, I guess it shows the dangers of reviewing with range-diff, because it's hard to know if a review suggestion was fully implemented or there may be some old code left behind.

@maflcko
Copy link
Member

maflcko commented Jul 3, 2024

That's interesting, I guess it shows the dangers of reviewing with range-diff, because it's hard to know if a review suggestion was fully implemented or there may be some old code left behind.

I don't think this is related to range-diff. It is simply a typo, which can be missed by a human reviewer regardless of how the code or diff is displayed. (I didn't use range-diff and missed it, fwiw)

@ryanofsky
Copy link
Contributor

ryanofsky commented Jul 3, 2024

I don't think this is related to range-diff. It is simply a typo, which can be missed by a human reviewer regardless of how the code or diff is displayed. (I didn't use range-diff and missed it, fwiw)

I'll often ack a PR and make a suggestion like you can "replace foo/2 with bar". Then if the PR is updated, I will check the range diff and re-ack with "the only thing that changed since last review is replacing foo/2 with bar" without looking at the complete diff again. This is potentially dangerous because it doesn't the verify suggestion was fully implemented, which depending on the suggestion might introduce bugs. In this case there was a pretty obvious bug that early reviewers who looked at multiple versions of the PR missed, and a later reviewer who was reviewing at the PR for the first time pointed out. So relying too much on range-diff might have caused this, and it's a good reminder (to myself at least) to be careful when using it.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c1d6e52

@@ -2145,10 +2147,10 @@ bool CheckInputScripts(const CTransaction& tx, TxValidationState& state,
// properly commits to the scriptPubKey in the inputs view of that
// transaction).
uint256 hashCacheEntry;
CSHA256 hasher = g_scriptExecutionCacheHasher;
CSHA256 hasher = validation_cache.ScriptExecutionCacheHasher();
hasher.Write(UCharCast(tx.GetWitnessHash().begin()), 32).Write((unsigned char*)&flags, sizeof(flags)).Finalize(hashCacheEntry.begin());
AssertLockHeld(cs_main); //TODO: Remove this requirement by making CuckooCache not require external locks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question tangential to this PR: This seems to mean the script execution cache needs something to synchronize access, and it is currently implicitly using cs_main? I don't see any threads other than message handler that access it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the script execution cache could be wrapped just like the signature cache, which has an internal mutex.

@DrahtBot DrahtBot requested a review from ryanofsky July 3, 2024 11:40
@ryanofsky
Copy link
Contributor

ryanofsky commented Jul 3, 2024

I wonder if there is a DrahtBot parsing bug? It seems to be parsing #30141 (review) as a concept ack instead of a plain ack, and it requested another review from me despite the ack.

EDIT: This happened because drahtbot was not parsing the linked comment. I had written another comment with the word A-C-K after my review, and drahtbot seems to looks at the latest comment with the word A-C-K, ignoring earlier comments, as described maflcko/DrahtBot#33

@maflcko
Copy link
Member

maflcko commented Jul 3, 2024

DrahtBot is open-source, so pull requests and bug reports are welcome. But I am not sure if the additional code is worth it for the rare case where someone write a comment containing A CK after they sent a proper review A CK commit_hash. If you care about the summary comment being correct, you can:

  • Edit your comments after your review to not contain A CK, or
  • Resubmit your review

ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 3, 2024
Change LogInstance() function to no longer allocate (and leak) a BCLog::Logger
instance. Instead allow kernel applications to initialize their own logging
instances that can be returned by LogInstance().

This change is somewhat useful by itself, but more useful in combination with
bitcoin#30342. By itself, it gives kernel applications control over when the Logger is
created and destroyed and allows new instances to replace old ones. In
combination with bitcoin#30342 it allows multiple log instances to exist at the same
time, and different output to be sent to different instances.

This commit is built on top of bitcoin#30141 since it simplifies the implementation
somewhat.
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c1d6e52 . Left a few nits but would suggest not to address unless force push is necessary, although of course I'll quickly re-review if you do.

I guess it shows the dangers of reviewing with range-diff

FYI In my case, too, this wasn't a review failure because of range-diff, as I've done multiple full re-reviews that included the affected line. I think I was blind to it because of the similarity to the -maxsigcache halving that is expected.

Thanks a lot for your keen observation and catching this mistake @ryanofsky.

Comment on lines 2106 to 2107
g_scriptExecutionCacheHasher.Write(nonce.begin(), 32);
g_scriptExecutionCacheHasher.Write(nonce.begin(), 32);
Copy link
Contributor

@stickies-v stickies-v Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 06e87c2: modifying this global inside a constructor seems quite footgunny. The footgun is removed in the next commit 0d41630, but I think the robust thing to do would be to squash the next commit to avoid cherry-pick accidents. I don't practically see this leading to issues, so I'm fine keeping it as is too to minimize the range-diff, so it's probably more of a review note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh, the footgun here seems less bad than calling InitScriptExecutionCache twice, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps slightly so, because here at least setup_bytes() isn't called again.

My main point was that with InitScriptExecutionCache it's clearly an initialization function that's affecting global state. On the other hand, one shouldn't have to expect that simply constructing an object is going to invalidate all other objects of the same type, that's unintuitive and rather opaque I think.

But we can leave this as is, it's resolved in the next commit after all.

TheCharlatan and others added 5 commits July 4, 2024 22:35
Instead clamp it to uint32::max if it exceeds it.

Co-authored-by: Anthony Towns <aj@erisian.com.au>
Move its ownership to the ChainstateManager class.

Next to simplifying usage of the kernel library by no longer requiring
manual setup of the cache prior to using validation code, it also slims
down the amount of memory allocated by BasicTestingSetup.
Move it to the ChainstateManager class.
This is done in preparation for the following commit. Also rename it to
SignatureCache.
Move its ownership to the ChainstateManager class.

Next to simplifying usage of the kernel library by no longer requiring
manual setup of the cache prior to using validation code, it also slims
down the amount of memory allocated by BasicTestingSetup.

Use this opportunity to make SignatureCache RAII styled

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
@TheCharlatan TheCharlatan force-pushed the noGlobalScriptCache branch from c1d6e52 to 606a7ab Compare July 5, 2024 07:10
@TheCharlatan
Copy link
Contributor Author

Updated c1d6e52 -> 606a7ab (noGlobalScriptCache_13 -> noGlobalScriptCache_14, compare)

@@ -36,7 +39,7 @@ FUZZ_TARGET(script_sigcache, .init = initialize_script_sigcache)
const CAmount amount = ConsumeMoney(fuzzed_data_provider);
const bool store = fuzzed_data_provider.ConsumeBool();
PrecomputedTransactionData tx_data;
CachingTransactionSignatureChecker caching_transaction_signature_checker{mutable_transaction ? &tx : nullptr, n_in, amount, store, tx_data};
CachingTransactionSignatureChecker caching_transaction_signature_checker{mutable_transaction ? &tx : nullptr, n_in, amount, store, *g_signature_cache, tx_data};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the sig cache every iteration (instead of the global) would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we ideally check both?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean test a global and local cache? We shouldn't fuzz globals (unless we can reset their state).

The reason I commented is that globals make fuzz tests non-deterministic. Let's say the harness is called (in-process) with input A, B and then C. It crashes on input C (which will be the input that the fuzz engine reports to you) but it only crashed because of the data stored in the global cache from input A and B. Giving the harness just input C won't make it crash, because it depends on the state from the previous iterations.

The way fuzz engines gather coverage information is also thrown off by non-determinism like this. They are designed under the assumption that the body of the fuzz harness is a pure function.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 606a7ab

// DoS prevention: limit cache size to 32MiB (over 1000000 entries on 64-bit
// systems). Due to how we count cache size, actual memory usage is slightly
// more (~32.25 MiB)
static constexpr size_t DEFAULT_MAX_SIG_CACHE_BYTES{32 << 20};
static constexpr size_t DEFAULT_VALIDATION_CACHE_BYTES{32 << 20};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: having DEFAULT_VALIDATION_CACHE_BYTES and DEFAULT_SCRIPT_EXECUTION_CACHE_BYTES defined in sigcache.h is not ideal. One alternative is to rename script/sigcache.h to script/cache.h and move ValidationCache in it, but that touches quite a few lines so I'm not convinced that's the best thing to do for this PR to make progress.

@DrahtBot DrahtBot requested a review from glozow July 5, 2024 17:52
Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 606a7ab

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 606a7ab. Just small formatting, include, and static_assert changes since last review.

I think it would be great to follow up on dergoegge's comment about fuzzing
#30141 (comment). It seems like it could make fuzzing output much more useful. I don't think it is critical to do it as part of this PR though, since the fuzz test currently relies on global state and this PR isn't changing that.

@hebasto
Copy link
Member

hebasto commented Jul 14, 2024

Ported to the CMake-based build system in hebasto#264.

@bitcoin bitcoin locked and limited conversation to collaborators Jul 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants