-
Notifications
You must be signed in to change notification settings - Fork 37.7k
rpc/rest/zmq: reduce LOCK(cs_main) scope: ~6 times as many requests per second #26308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ad30672
to
77de778
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 77de778
Master results
hey -n 1000 -c 16 "http://127.0.0.1:8332/rest/block/0000000000000000000592a974b1b9f087cb77628bb4a097d5c2c11b3476a58e.bin"
Summary:
Total: 16.1675 secs
Slowest: 0.5310 secs
Fastest: 0.0372 secs
Average: 0.2569 secs
Requests/sec: 61.3578
Total data: 1285175680 bytes
Size/request: 1295540 bytes
Response time histogram:
0.037 [1] |
0.087 [6] |
0.136 [6] |
0.185 [7] |
0.235 [168] |■■■■■■■■■■
0.284 [685] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.333 [92] |■■■■■
0.383 [3] |
0.432 [10] |■
0.482 [13] |■
0.531 [1] |
Latency distribution:
10% in 0.2290 secs
25% in 0.2390 secs
50% in 0.2520 secs
75% in 0.2671 secs
90% in 0.2880 secs
95% in 0.3055 secs
99% in 0.4610 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0372 secs, 0.5310 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0003 secs
resp wait: 0.2172 secs, 0.0197 secs, 0.4897 secs
resp read: 0.0396 secs, 0.0005 secs, 0.0429 secs
Status code distribution:
[200] 992 responses
PR results
hey -n 1000 -c 16 "http://127.0.0.1:8332/rest/block/0000000000000000000592a974b1b9f087cb77628bb4a097d5c2c11b3476a58e.bin"
Summary:
Total: 5.5089 secs
Slowest: 0.3319 secs
Fastest: 0.0221 secs
Average: 0.0856 secs
Requests/sec: 180.0725
Total data: 1285175680 bytes
Size/request: 1295540 bytes
Response time histogram:
0.022 [1] |
0.053 [109] |■■■■■■■■■■
0.084 [397] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.115 [420] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.146 [41] |■■■■
0.177 [3] |
0.208 [4] |
0.239 [1] |
0.270 [7] |■
0.301 [3] |
0.332 [6] |■
Latency distribution:
10% in 0.0508 secs
25% in 0.0690 secs
50% in 0.0831 secs
75% in 0.0979 secs
90% in 0.1090 secs
95% in 0.1220 secs
99% in 0.2890 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0221 secs, 0.3319 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0004 secs
resp wait: 0.0486 secs, 0.0126 secs, 0.2870 secs
resp read: 0.0369 secs, 0.0004 secs, 0.0731 secs
Status code distribution:
[200] 992 responses
The results are consistent. The speed improvement seems to be seen only with large blocks, which is great for mainnet. I didn't notice any gains on testnet since blocks are almost empty.
After reading this comment #17161 (comment) it appears it is unsafe to call I proposed a fix in #26316. That or another fix should be merged first before this would be safe for use with pruning I believe. |
0f882ec
to
bfa9663
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
bfa9663
to
d7f358b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK d7f358b, and I think I disagree with (or don't understand) the comment that this change should require #26316. In all of the cases here, the ReadBlock or UndoRead return value is checked so if reading data fails on windows, the RPC and rest requests should still do the right thing.
Or is the concern not that this change would cause RPC and rest calls to do the wrong thing, but that this change would cause unlink calls to fail on windows and result in a wasted disk space? If that is the bug, and I think a more minimal and elegant fix might just be to make the pruning code resilient to files temporarily being unlinkable because they are in-use, and to just retry and delete them later on the next round of pruning.
@ryanofsky yes, thank you for the suggestion. I've implemented that in #26533. I think that's a better/less invasive solution than what I have proposed in #26316. |
d7f358b
to
2cd94da
Compare
Rebased |
Happy to re-ACK for the nit review ACK 2cd94da 🔗 Show signatureSignature:
|
2cd94da
to
d7f61e7
Compare
@MarcoFalke addressed the nit. |
re-ACK d7f61e7 💫 Show signatureSignature:
|
Concept ACK. |
} | ||
|
||
if (!ReadBlockFromDisk(block, pblockindex, Params().GetConsensus())) { | ||
// Block not found on disk. This could be because we have the block | ||
// header in our index but not yet have the block or did not accept the | ||
// block. | ||
// block. Or if the block was pruned right after we released the lock above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test for such a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it it possible to write a test for this deletion race, unless you want to modify the source code as well?
It might be easier to write a functional test for lack of deletion (on Windows) in #26533 by holding a open file handle to the block file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I could see a test for this case would be to have a long sleep above the if
statement, then call PruneBlockFilesManual
in the test and it should fail here, but that would require modifying the source as pointed out by MarcoFalke. Is there another way to introduce an arbitrary delay or block the thread in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK d7f61e7, I have reviewed the code and it looks OK. Did not make benchmarking though.
const bool is_not_pruned{WITH_LOCK(::cs_main, return !blockman.IsBlockPruned(blockindex))}; | ||
const bool have_undo{is_not_pruned && UndoReadFromDisk(blockUndo, blockindex)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Less negations:
const bool is_not_pruned{WITH_LOCK(::cs_main, return !blockman.IsBlockPruned(blockindex))}; | |
const bool have_undo{is_not_pruned && UndoReadFromDisk(blockUndo, blockindex)}; | |
const bool is_pruned{WITH_LOCK(::cs_main, return blockman.IsBlockPruned(blockindex))}; | |
const bool have_undo{!is_pruned && UndoReadFromDisk(blockUndo, blockindex)}; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this less negations? You're swapping a negation on one line, for a negation on a different line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this less negations?
line 184: is_not_pruned
line 184: !blockman.IsBlockPruned(blockindex)
line 185: is_not_pruned
total = 3 cases
--
in suggestion
line 185: !is_pruned
total =1 case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't consider variable names describing a state "negations", and I'm not sure why there is a need to avoid them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also keep everything in one line as-is, and reduce the diff to a one-char diff.
const bool have_undo{WITH_LOCK(::cs_main, return !blockman.IsBlockPruned(blockindex)) && UndoReadFromDisk(blockUndo, blockindex)};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't consider variable names describing a state "negations", and I'm not sure why there is a need to avoid them.
nm, I agree with the current code.
Summary: > After commit ccd8ef6 (D2979) it is no longer required to hold cs_main when calling ReadBlockFromDisk. > > The first commit moves ReadBlockFromDisk outside the lock scope in rest_block, where we can see a huge performance improvement when fetching blocks with multiple threads. This is a partial backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@c75e3d2 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Subscribers: sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13052
Summary: > The second commit is from a suggestion on the original PR by jonatack to remove the unnecessary LOCK(cs_main) in the zmq notifier's NotifyBlock. This is a partial backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@7d253c9 Depends on D13052 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13053
Summary: This is a partial backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@f00808e Depends on D13053 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13054
Summary: This is a partial backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@efd82ae Depends on D13054 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13055
Summary: This is a partial backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@4d92b5a Depends on D13055 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13056
Summary: This concludes backport of [[bitcoin/bitcoin#26308 | core#26308]] bitcoin/bitcoin@d7f61e7 Depends on D13055 Test Plan: With Clang and DEBUG: `ninja all check-all` Reviewers: #bitcoin_abc, sdulfari Reviewed By: #bitcoin_abc, sdulfari Differential Revision: https://reviews.bitcoinabc.org/D13057
Picking up from #21006.
After commit ccd8ef6 it is no longer required to hold
cs_main
when callingReadBlockFromDisk
. This can be verified inmaster
at https://github.com/bitcoin/bitcoin/blob/master/src/node/blockstorage.cpp#L755. Same can be seen forUndoReadFromDisk
https://github.com/bitcoin/bitcoin/blob/master/src/node/blockstorage.cpp#L485.The first commit moves
ReadBlockFromDisk
outside the lock scope inrest_block
, where we can see a huge performance improvement when fetching blocks with multiple threads.My test setup, on an Intel i7 with 8 cores (16 threads):
bitcoin.conf
:Time per request (mean)
183 ms on master
30 ms this branch
So this can process 6.1 times as many requests, and saturates all the cores instead of keeping them partly idle waiting in the lock. With 8 threads the mean times were 90 ms on master and 19 ms on this branch, a speedup of 4.7x.
Big thanks to martinus for finding this and the original PR.
The second commit is from a suggestion on the original PR by jonatack to remove the unnecessary
LOCK(cs_main)
in the zmq notifier'sNotifyBlock
.I also found that this approach could be applied to rpcs
getblock
(includingverbosity=3
),getblockstats
, andgettxoutproof
with similar very good results. The above benchmarks steps need to be modified slightly for RPC. Run the following ApacheBench command with different request data in a file nameddata.json
:For
getblock
, use the following indata.json
:master - 184 ms mean request time
branch - 28 ms mean request time
For
getblock
with verbosity level 3, use the following indata.json
:This verbosity level fetches an undo file from disk, so it benefits from this approach as well. However, a lot of time is spent serializing to JSON so the performance gain is not as severe.
master - 818 ms mean request time
branch - 505 ms mean request time
For
getblockstats
, use the following indata.json
:This request used a lock on reading both a block and undo file, so the results are very good.
master - 244 ms mean request time
branch - 28 ms mean request time