Skip to content

Conversation

JeremyRubin
Copy link
Contributor

@JeremyRubin JeremyRubin commented Aug 5, 2016

This commit introduces a new Checkqueue that should have much better performance than the previous one. Lockfree in scare quotes because the whole checkqueue doesn't meet the strict academic definition of lockfree, but it does have lockfree operations.

The PR includes 1 new consensus rule, some refactoring of the checkqueue interface, reimplementation of the checkqueue, and tests, and one optional optimization (could be PR'd separately).

Please see https://gist.github.com/JeremyRubin/6eec0427edec6e68755ae274b27baae5 for more information.

edit: forgot WIP label, needs more testing and review. Opening PR for that purpose.
edit edit: removed WIP label, terminology confusion, it is indeed ready for testing.

Todo

  • Stronger Testing Suite
  • Figure out better wakeup mechanism
  • Add Benchmarking to the old checkqueue compatible with the new one.
  • cross platform testing (ARM, 32 bit,...)
  • remove static asserts for lock-free ness

@JeremyRubin JeremyRubin changed the title "Lockfree" Checkqueue Implementation [WIP] "Lockfree" Checkqueue Implementation Aug 5, 2016
src/checkqueue.h Outdated
@@ -26,7 +26,7 @@ class CCheckQueueControl;
* the master is done adding work, it temporarily joins the worker pool
* as an N'th worker, until all jobs are done.
*/
template <typename T>
template <typename T, size_t J size_t W>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments explaining what J and W are?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're viewing an outdated diff, I just did this for a clean interface change then implementation change set of commits. They are unused here.

@JeremyRubin JeremyRubin changed the title [WIP] "Lockfree" Checkqueue Implementation "Lockfree" Checkqueue Implementation Aug 5, 2016
src/checkqueue.h Outdated
#include <sstream>
#include <iostream>
// This should be ignored eventually, but needs testing to ensure this works on more platforms
static_assert(ATOMIC_BOOL_LOCK_FREE, "shared_status not lock free");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error messages can indicate the failed type (bool/long/llong)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit which changes them from static_asserts to warnings. I'm unsure if the warning is properly x-platform; seems to not be a consistent thing. In any case, not being lockfree only effects performance, not correctness.

@JeremyRubin
Copy link
Contributor Author

Pull tester failure seems to be semi-related to #8425.

@sipa
Copy link
Member

sipa commented Aug 7, 2016

@JeremyRubin I don't think so. bitcoind segfaults in Travis with this PR (and all rpc tests fail). In #8425 there is just a failure in one test.

@JeremyRubin
Copy link
Contributor Author

@sipa So when I run tests one by one they seem to not fail; when I run them using rpc_tests.py they do fail. Let me know if you have any ideas why this might be the case.

@maflcko
Copy link
Member

maflcko commented Aug 7, 2016

@JeremyRubin You can try qa/pull-tester/rpc-tests.py -parallel=1 to mimic running them by one by one.

@JeremyRand
Copy link
Contributor

@sipa I'm flattered that you tagged me (always an honor to be tagged by
you or any other big name in Bitcoin dev), but this isn't my pull
request. :)

@JeremyRubin
Copy link
Contributor Author

Thanks @MarcoFalke.

Running with -parallel=1 all tests pass, except for rawtransactions which failed with

rawtransactions.py:
Initializing test directory /tmp/testricvjc8u/29
Stopping nodes
Cleaning up
Tests successful

stderr:
 Error: Unable to start HTTP server. See debug log for details.

Pass: False, Duration: 21 s

There's nothing in the tmpdir.

Rerunning rawtransactions alone (many times) it didn't fail so I think it's random failure as seen in #8425.

@JeremyRubin JeremyRubin closed this Aug 8, 2016
@JeremyRubin JeremyRubin reopened this Aug 8, 2016
@laanwj laanwj changed the title "Lockfree" Checkqueue Implementation [WIP] "Lockfree" Checkqueue Implementation Aug 13, 2016
@sipa sipa mentioned this pull request Aug 16, 2016
@JeremyRubin JeremyRubin force-pushed the lockfree-checkqueue-squashed branch 2 times, most recently from 58d0a45 to 139f8c1 Compare August 18, 2016 22:13
@JeremyRubin JeremyRubin force-pushed the lockfree-checkqueue-squashed branch 2 times, most recently from 8a2af6e to cfe0e22 Compare August 29, 2016 02:17
} else if (!check()) {
if (emplacer) {
emplacer(CScriptCheck(*coins, tx, i, flags, cacheStore));
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer else over continue, but it is understandable is just to avoid the re-indent and minimize disruption.
I would the continue be gone "for free" when/if CheckInputs needs a reindent (btw, unrelated, we've been owing one to Consensus::CheckTxInputs for a while...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it was a little bit crowded there.

@jtimon
Copy link
Contributor

jtimon commented Sep 1, 2016

Randomly noting this won't interfere with #8498 at all (well, probably in obvious-to-resolve ways).
utACK main.o is all I can give for now.

I heard %10 to 20% benchmark improvement...happy to benchmark if someone provides dumbed down instructions or they exist already and I missed them. ping @TheBlueMatt

@JeremyRubin
Copy link
Contributor Author

@jtimon in an hour or less i should have a refactored version which has benchmarks added as commit one so you can test benching on all versions.

@JeremyRubin
Copy link
Contributor Author

@laanwj I'd like to get #8632 and some variant of #8633 merged if possible before I PR the new version, otherwise when I push the cleaned up version I won't pass tests due to timeouts.

You can see the restructured version here if anyone wants to review right away: master...JeremyRubin:lockfree-checkqueue-restructured

@JeremyRubin
Copy link
Contributor Author

JeremyRubin commented Sep 2, 2016

Some benchmarks from the new version BTW:

Before

#Benchmark,count,min,max,average
CCheckQueueSpeed,6144,0.000019055791199,0.000413492321968,0.000275931825551
CCheckQueueSpeedPrevectorJob,416,0.001668065786362,0.004345431923866,0.002541266954862

After:

#Benchmark,count,min,max,average
CCheckQueueSpeed,6144,0.000050746137276,0.000491065904498,0.000192628746542
CCheckQueueSpeedPrevectorJob,768,0.000427763909101,0.003052964806557,0.001347809719543

CCheckQueueSpeed is a trivial fake job (benchmark slightly unfair, as the emplacement vs resize call probably costs the time). New code takes 70% of the time.

CCheckQueueSpeedPrevectorJob is a job that contains a prevector that (deterministically randomly) is either direct or indirect. My code takes 50% of the time.

edit: just noting, there was a small performance bug (using wrong form of emplace_back) and with that the vector test is at 20% :)

@@ -19,6 +19,7 @@ static const int64_t MAX_BLOCK_SIGOPS_COST = 80000;
/** Coinbase transaction outputs can only be spent after this number of new blocks (network rule) */
static const int COINBASE_MATURITY = 100;

static const unsigned long long MAX_SCRIPTCHECKS = static_cast<unsigned long long>(MAX_BLOCK_SERIALIZED_SIZE)/41;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 41 safe? Does this consider that invalid signatures/keys could be smaller?

Copy link
Member

@sipa sipa Sep 10, 2016 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtimon
Copy link
Contributor

jtimon commented Oct 20, 2016

Needs rebase

@fanquake
Copy link
Member

@JeremyRubin What's the status of this?

@JeremyRubin
Copy link
Contributor Author

@fanquake I think I have a rebased version of this laying around somewhere. Have been doing experimentation to see if there's something simpler (read: more reviewable) that accomplishes much of the same. I'll try to see if there's a reasonable version around. If I recall correctly, most of the benefit of the faster CheckQueue was not observable without the cuckoocache, so I focused on that first (lock free queue doesn't matter much if all jobs lock).

@JeremyRubin JeremyRubin closed this Mar 7, 2017
@JeremyRubin JeremyRubin mentioned this pull request Mar 7, 2017
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants