-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[WIP] "Lockfree" Checkqueue Implementation #8464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] "Lockfree" Checkqueue Implementation #8464
Conversation
src/checkqueue.h
Outdated
@@ -26,7 +26,7 @@ class CCheckQueueControl; | |||
* the master is done adding work, it temporarily joins the worker pool | |||
* as an N'th worker, until all jobs are done. | |||
*/ | |||
template <typename T> | |||
template <typename T, size_t J size_t W> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments explaining what J and W are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're viewing an outdated diff, I just did this for a clean interface change then implementation change set of commits. They are unused here.
src/checkqueue.h
Outdated
#include <sstream> | ||
#include <iostream> | ||
// This should be ignored eventually, but needs testing to ensure this works on more platforms | ||
static_assert(ATOMIC_BOOL_LOCK_FREE, "shared_status not lock free"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These error messages can indicate the failed type (bool/long/llong)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit which changes them from static_asserts to warnings. I'm unsure if the warning is properly x-platform; seems to not be a consistent thing. In any case, not being lockfree only effects performance, not correctness.
Pull tester failure seems to be semi-related to #8425. |
@JeremyRubin I don't think so. |
@sipa So when I run tests one by one they seem to not fail; when I run them using rpc_tests.py they do fail. Let me know if you have any ideas why this might be the case. |
@JeremyRubin You can try |
@sipa I'm flattered that you tagged me (always an honor to be tagged by |
Thanks @MarcoFalke. Running with
There's nothing in the tmpdir. Rerunning rawtransactions alone (many times) it didn't fail so I think it's random failure as seen in #8425. |
58d0a45
to
139f8c1
Compare
8a2af6e
to
cfe0e22
Compare
… the minimum of various consensus parameters
…ated and breaks the build
…ich can cause tests to hang otherwise. (Maybe the one time setup is something that should make it into the SetupCCheckQueue code itself)
…to cause segfaults
…ous round This fixes a bug where a deallocated check could be called.
…ement, switching to chrono over sys/time
…uts. Someone with more knowledge here should fix this
…acing it with it's definition
464701e
to
72e4296
Compare
} else if (!check()) { | ||
if (emplacer) { | ||
emplacer(CScriptCheck(*coins, tx, i, flags, cacheStore)); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to prefer else over continue, but it is understandable is just to avoid the re-indent and minimize disruption.
I would the continue be gone "for free" when/if CheckInputs needs a reindent (btw, unrelated, we've been owing one to Consensus::CheckTxInputs for a while...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was a little bit crowded there.
Randomly noting this won't interfere with #8498 at all (well, probably in obvious-to-resolve ways). I heard %10 to 20% benchmark improvement...happy to benchmark if someone provides dumbed down instructions or they exist already and I missed them. ping @TheBlueMatt |
@jtimon in an hour or less i should have a refactored version which has benchmarks added as commit one so you can test benching on all versions. |
@laanwj I'd like to get #8632 and some variant of #8633 merged if possible before I PR the new version, otherwise when I push the cleaned up version I won't pass tests due to timeouts. You can see the restructured version here if anyone wants to review right away: master...JeremyRubin:lockfree-checkqueue-restructured |
Some benchmarks from the new version BTW: Before
After:
CCheckQueueSpeed is a trivial fake job (benchmark slightly unfair, as the emplacement vs resize call probably costs the time). New code takes 70% of the time. CCheckQueueSpeedPrevectorJob is a job that contains a prevector that (deterministically randomly) is either direct or indirect. My code takes 50% of the time. edit: just noting, there was a small performance bug (using wrong form of emplace_back) and with that the vector test is at 20% :) |
@@ -19,6 +19,7 @@ static const int64_t MAX_BLOCK_SIGOPS_COST = 80000; | |||
/** Coinbase transaction outputs can only be spent after this number of new blocks (network rule) */ | |||
static const int COINBASE_MATURITY = 100; | |||
|
|||
static const unsigned long long MAX_SCRIPTCHECKS = static_cast<unsigned long long>(MAX_BLOCK_SERIALIZED_SIZE)/41; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is 41 safe? Does this consider that invalid signatures/keys could be smaller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs rebase |
@JeremyRubin What's the status of this? |
@fanquake I think I have a rebased version of this laying around somewhere. Have been doing experimentation to see if there's something simpler (read: more reviewable) that accomplishes much of the same. I'll try to see if there's a reasonable version around. If I recall correctly, most of the benefit of the faster CheckQueue was not observable without the cuckoocache, so I focused on that first (lock free queue doesn't matter much if all jobs lock). |
This commit introduces a new Checkqueue that should have much better performance than the previous one. Lockfree in scare quotes because the whole checkqueue doesn't meet the strict academic definition of lockfree, but it does have lockfree operations.
The PR includes 1 new consensus rule, some refactoring of the checkqueue interface, reimplementation of the checkqueue, and tests, and one optional optimization (could be PR'd separately).
Please see https://gist.github.com/JeremyRubin/6eec0427edec6e68755ae274b27baae5 for more information.
edit: forgot WIP label, needs more testing and review. Opening PR for that purpose.
edit edit: removed WIP label, terminology confusion, it is indeed ready for testing.
Todo