indexes: Add compact block filter headers cache #18960
Conversation
This takes a different approach from the caching in #16442. I prefer it because it isolates all of the caching logic inside the indexer rather than in net processing. I'll put the other approach up as a different branch for comparison.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts: Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
I'm almost ready to ACK this PR, I just have some minor suggestion/discussion points:

Would it make sense to only do the cache lookup for checkpoint blocks, i.e. when the block height is a multiple of CFCHECKPT_INTERVAL? Of course, the drawback of this approach is that the condition would need to be removed if the cache is ever used for filter headers other than those at the checkpoints. I'm also not sure whether it's really worth it from a performance point of view, but in 999 out of 1000 cases the lookup in the map could be skipped.
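For illustration only, a minimal sketch of the guard being discussed, reusing names that appear in the snippets quoted later in this thread (CFCHECKPT_INTERVAL, m_headers_cache, m_cs_headers_cache); the exact placement inside the header-lookup function and the header_out parameter are assumptions, not the PR's code:

```cpp
// Sketch: only consult the cache for checkpoint heights, since only headers
// at multiples of CFCHECKPT_INTERVAL are ever stored in it. For the other
// 999 out of 1000 heights the map lookup is skipped entirely.
if (block_index->nHeight % CFCHECKPT_INTERVAL == 0) {
    LOCK(m_cs_headers_cache);
    auto it = m_headers_cache.find(block_index->GetBlockHash());
    if (it != m_headers_cache.end()) {
        header_out = it->second;  // cache hit: no disk read needed
        return true;
    }
}
// Otherwise fall through to the existing disk lookup.
```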
Agree with previous comments: I prefer this to the previous approach. I think it would be good to have some benchmarks. Otherwise, code looks very good to me.
Concept ACK
Force-pushed from 23be13d to 7ad83ed.
Done
Changed.

I'm still working on adding a microbench, although it's not clear to me that it'll show any improvement, since the disk reads may be cached by the OS for the microbench.
Concept ACK.
Kudos on the simplicity of this approach!
Using jnewbery's nice node-shell module (a modification of test-shell that allows connecting the functional test framework to a running full node), I crafted a quick-and-dirty test script (see branch https://github.com/theStack/bitcoin/tree/pr18960_test) that repeatedly sends getcfcheckpt requests and measures the response times.

Freshly started bitcoind from the master branch (without the checkpoint headers cache implementation), after mempool imports are done:

Freshly started bitcoind from the PR branch (with the checkpoint headers cache implementation), after mempool imports are done:

There doesn't seem to be a noticeable difference, though I don't know if my methodology is flawed. Maybe all the surrounding processing takes so much more time that the actual part of fetching the header (from the cache or from disk) is negligible. If anyone has an idea how to improve the benchmark script, I'm happy to hear suggestions!

EDIT: As jnewbery found out in the course of today's PR review club meeting, there is a good explanation for why the measured values are always close to 50ms:
Approach ACK
@theStack I believe that in between the requests you can drop the OS cache to simulate eviction (which may occasionally happen during normal operation). See https://unix.stackexchange.com/questions/17936/setting-proc-sys-vm-drop-caches-to-clear-cache
@narula suggested in today's PR Review Club meeting that it'd be less cognitive overhead to simply take the lock once.
@theStack - thank you for doing some benchmarking of this. Your approach inspired me to do something similar myself, using debug logs to find the exact time to construct a cfcheckpt response. I added a debug log around the getcfcheckpt handling and then ran the following node-shell script:

no cache:

There does seem to be a very slight performance improvement after the first request, maybe around 5-10% from the mean, but variance is quite high, so it might be noise.

cache (taking locks twice):

The initial request is slower (presumably because the cache needs to be warmed), but subsequent requests are on average >50% faster than with no cache.

cache (taking lock once):

The first request is slightly faster than when taking locks twice (although that might be noise). Subsequent requests are the same speed as the 'taking locks twice' branch, which is what we'd expect, because locks are only taken once in that branch if the header is found in the cache. A sketch of the 'lock once' shape follows below.

@MarcoFalke - I think these full-node manual benches are more meaningful than microbenches, so I'm not planning to add any automated benches.
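To make the "taking lock once" variant concrete, here is a minimal sketch under stated assumptions: DBVal, LookupOne() and m_db stand in for the index's existing on-disk lookup path and may not match the real names or signatures; the other identifiers are the ones visible in the snippets quoted in this thread, and the function shape is a sketch rather than the exact PR code.

```cpp
// "Take the lock once": the cache mutex is held across the cache lookup, the
// fallback disk read, and the cache insert, instead of being acquired once
// for the find and a second time for the emplace.
bool BlockFilterIndex::LookupFilterHeader(const CBlockIndex* block_index, uint256& header_out)
{
    LOCK(m_cs_headers_cache);  // single lock acquisition for the whole call

    const bool is_checkpoint{block_index->nHeight % CFCHECKPT_INTERVAL == 0};

    if (is_checkpoint) {
        // Fast path: checkpoint header already cached, no disk read needed.
        auto it = m_headers_cache.find(block_index->GetBlockHash());
        if (it != m_headers_cache.end()) {
            header_out = it->second;
            return true;
        }
    }

    // Cache miss (or non-checkpoint height): read the entry from the index DB.
    DBVal entry;
    if (!LookupOne(*m_db, block_index, entry)) return false;

    // Only checkpoint headers are worth caching, and the cache size is bounded.
    if (is_checkpoint && m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
        m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
    }

    header_out = entry.header;
    return true;
}
```

The trade-off is that the mutex is held across the disk read; the benchmark above suggests this doesn't hurt subsequent-request latency in practice.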
I'll make this change tomorrow.
In yesterday's review club meeting on this PR, it was suggested to put LogPrintf() calls directly into the network processing code to measure the execution time of getcfcheckpt requests. (My previous approach of measuring the request response time from the test framework was not precise enough, as the time also included Python overhead and the time for the message handler to wake up.) I did some testing with this refined approach and can roughly confirm the results from jnewbery (see commit theStack@48cff1b):

without cache:

with cache:

One can see that there's a speedup of at least 2x with the cache. Note that my approach to get these numbers was slightly different, as I directly measured the time in bitcoind via

```diff
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 1df1fab..6e620d9 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -3380,7 +3380,10 @@ bool ProcessMessage(CNode* pfrom, const std::string& msg_type, CDataStream& vRecv
     }
 
     if (msg_type == NetMsgType::GETCFCHECKPT) {
+        int64_t start_time = GetTimeMicros();
         ProcessGetCFCheckPt(pfrom, vRecv, chainparams, connman);
+        int64_t processing_duration = GetTimeMicros() - start_time;
+        LogPrintf("getcfcheckpt processing took %lldus\n", processing_duration);
         return true;
     }
 
```

but the idea is the same. Now I'm very curious whether the suggested change brings a further measurable improvement.

@MarcoFalke: Nice idea to drop the cache between the requests. Unfortunately, on my current development box I don't have the permissions (root rights) to do this :/ If anyone has the possibility to try it, the results would be very interesting.
Force-pushed from 7ad83ed to 4018953.
Done.
src/index/blockfilterindex.cpp (Outdated)

```cpp
@@ -31,6 +31,8 @@ constexpr char DB_FILTER_POS = 'P';
 constexpr unsigned int MAX_FLTR_FILE_SIZE = 0x1000000; // 16 MiB
 /** The pre-allocation chunk size for fltr?????.dat files */
 constexpr unsigned int FLTR_FILE_CHUNK_SIZE = 0x100000; // 1 MiB
 /** Maximum size of the cfheaders cache */
```
Nit: give justification in comments for magic numbers so future readers have some idea how this was chosen and if/when it will need to be adjusted.
done
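As an illustration of the kind of justification being asked for in the nit above: the constant name comes from the diff quoted in this thread, the value 2000 is inferred from a later review comment, and the comment wording is hypothetical rather than what the PR actually committed.

```cpp
/** Maximum size of the cfheaders cache.
 *  Illustrative justification only: one header is cached per CFCHECKPT_INTERVAL (1000)
 *  blocks, so 2000 entries are enough for a 2,000,000-block chain, while capping the
 *  cache at roughly 2000 * 64 bytes of block-hash/filter-header pairs. */
constexpr size_t CF_HEADERS_CACHE_MAX_SZ{2000};
```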
src/index/blockfilterindex.cpp (Outdated)

```cpp
// Add to the headers cache
if (m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
    m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
}
```
You'll still want to condition this on block_index->nHeight % CFCHECKPT_INTERVAL == 0, otherwise the most recent 2000 requested headers will be cached rather than the desired ones.
yikes. Thanks
done
utACK
I've force-pushed a branch that changes this to use an unordered_map and only lock once. Comparing the unordered_map implementation to map:

unordered_map log:

map log:

So from this single test, it appears that an unordered_map is indeed faster. That's what we'd expect: a single lookup is O(1), and the number of lookups required for a single getcfcheckpt request is O(n), so the total is O(n). For a map, a single lookup is O(log n), so the total is O(n log n). The original implementation is faster still, since it stores the checkpoints in a single vector, but the aim of this PR is to avoid disk reads for getcfcheckpt messages, which the simple unordered_map implementation does. Further performance improvements could be done as a follow-up if required.
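A side note on the container choice: since the keys are block hashes, which are already uniformly distributed, the unordered_map can get away with a very cheap custom hasher. The FilterHeaderHasher referenced in the class declaration quoted later in this thread could plausibly be as simple as the following sketch (the real definition may differ):

```cpp
#include <crypto/common.h>  // ReadLE64
#include <uint256.h>

// Block hashes are effectively random already, so reusing 8 of their bytes as
// the bucket hash is sufficient; no salted/keyed hashing is needed for a cache
// keyed by locally indexed block hashes.
struct FilterHeaderHasher
{
    size_t operator()(const uint256& hash) const { return ReadLE64(hash.begin()); }
};
```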
Force-pushed from 4018953 to b6eba9e.
ACK b6eba9e. Ran all tests.
utACK b6eba9e
src/index/blockfilterindex.cpp (Outdated)

```cpp
{
    LOCK(m_cs_headers_cache);
    auto header = m_headers_cache.find(block_index->GetBlockHash());
```
Before the last change, the cache lookup only happened for checkpoint blocks (i.e. block number modulo 1000 is zero). Is this removal of the condition intended?
No, not intentional. I just messed up when splitting out the LOCK(m_cs_headers_cache). Thanks for catching this.
Force-pushed from b6eba9e to c63e934.
Cache block filter headers at heights of multiples of 1000 in memory. Block filter headers at height 1000x are checkpointed, and will be the most frequently requested. Cache them in memory to avoid costly disk reads.
Force-pushed from c63e934 to 0187d4c.
ACK 0187d4c
ACK 0187d4c 🎉

By the way, I'm neutral on the question of whether to use unordered_map or map: in my tests the average subsequent lookup times for the two variants (3,346ms vs. 3,078ms) were close enough that the choice doesn't seem to matter much.
re-utACK 0187d4c (compared to my last review).
Code Review ACK 0187d4c.
I prefer this approach of caching inside the index rather than in net_processing, compared to #16442; it reduces cognitive load. I also agree with ignoring reorgs, as argued: the probability of the event is rare enough, and the overhead cheap enough, to justify having some junk in the cache.
```cpp
@@ -30,6 +38,9 @@ class BlockFilterIndex final : public BaseIndex
     bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const;
     size_t WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter);

     Mutex m_cs_headers_cache;
     std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);
```
If you have to change the PR again, you could add a comment noting that the key is the block hash and the mapped value is the filter header; since both have the same type, it can be confusing at first glance.
Good point. I'll add a comment if I need to retouch the branch.
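For example, the declaration above could carry a comment along these lines (illustrative wording only, not a comment that was necessarily committed):

```cpp
/** Cache of recent filter headers, to avoid disk reads for getcfcheckpt requests.
 *  Key: block hash; value: the corresponding filter header. Both are uint256,
 *  hence this comment to keep the mapping unambiguous. */
std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);
```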
code review ACK 0187d4c
0187d4c [indexes] Add compact block filter headers cache (John Newbery)

Pull request description:

  Cache block filter headers at heights of multiples of 1000 in memory.

  Block filter headers at height 1000x are checkpointed, and will be the most frequently requested. Cache them in memory to avoid costly disk reads.

ACKs for top commit:
  jkczyz: ACK 0187d4c
  theStack: ACK 0187d4c 🎉
  fjahr: re-utACK 0187d4c
  laanwj: code review ACK 0187d4c
  ariard: Code Review ACK 0187d4c.

Tree-SHA512: 2075ae36901ebcdc4a217eae5203ebc8582181a0831fb7a53a119f031c46bca960a610a38a3d0636a9a405f713efcf4200c85f10c8559fd80139036d89473c56
Summary:

  Cache block filter headers at heights of multiples of 1000 in memory.

  Block filter headers at height 1000x are checkpointed, and will be the most frequently requested. Cache them in memory to avoid costly disk reads.

Backport of core [[bitcoin/bitcoin#18960 | PR18960]]. Depends on D8463.

Test Plan: ninja all check-all

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D8464
Cache block filter headers at heights of multiples of 1000 in memory.
Block filter headers at height 1000x are checkpointed, and will be the most frequently requested. Cache them in memory to avoid costly disk reads.