Better SigCache Implementation #8895
Conversation
Compiling on OS X, with Xcode 8
@fanquake should be fixed, wasn't included for some reason.

Concept ACK and code review ACK. I still need to read up on the security/performance guarantees to understand better, but the implementation looks sane.

Concept ACK, if nothing else the increase in entries-per-byte should be a big win for memory usage.
I just pushed up 4 commits which further improve the cuckoo cache. These commits add epochs/generations for the cache entries, which improves the hit rate. Below is a brief description of the changes; for more detail, see the code & code documentation.

The first 2 commits are changes to the tests. The first commit simplifies what is checked in the existing tests (it was over-specific to that cache's design). The second adds a new test which fails under the existing code because the code does not prioritize newer entries.

The third patch adds generations to the cache, which work as follows: a bit vector is used to track whether an element belongs to the current or prior generation. Once the number of the current generation's non-deleted elements exceeds ~45%, the generation is retired. The previously retired generation is marked deleted. Because the cache deletes lazily, these entries are still present but are more likely to be overwritten than elements in the currently retired generation. Generations are scanned to see if they should be retired using a heuristic of how many inserts would need to occur since the last scan to exceed the generation size.

The fourth commit increases the number of hashes used per element from 2 to 8. The benefit of this change is that it permits using a larger generation size (e.g., 45%) compared to 2 hashes (30%). A larger generation size is the "effective size" of the cache, i.e., new entries stay in the current generation for longer. Performance is markedly better, especially under attack scenarios. @morcos has run several simulations which back up this claim.

There is a slight (1 bit per entry) memory overhead to this approach. This does not negatively impact this cache because the trade-off is worth it for intelligently deleting older entries. Only one bit is used for simplicity, as we get three represented states (deleted, old, current). Using more generations (let's say, 1 more bit) would allow more granular freeing of only the oldest quarter, rather than half, per scan. This has advantages, but also an additional memory cost, and two generations was sufficient for this use case.

On a slightly separate note, another side benefit of this cache that I failed to mention when I opened the PR (before these added commits) is that inserts are guaranteed to terminate (in the sigcache currently in master, insert may run forever if GetRand is "unlucky").
Force-pushed from 1e86d21 to 1cfb7d1 (compare)
I've squashed all the commits (the unsquashed version is still available here if you've already looked at this: master...JeremyRubin:cuckoocache-pull-request-not-squashed). I've also edited the PR message. The current Travis failed build seems to be related to the ongoing Dyn DDoS attack, as Travis is telling me variants of
Force-pushed from 1cfb7d1 to 3ba2bf4 (compare)
*/
inline std::array<uint32_t, 8> compute_hashes(const Element& e) const
{
    return {hash_function.template operator()<0>(e) % size,
Instead of size, store the log2 of the size, and use a bitshift or mask here. Modulus operations are very slow.

Yes, this is a rather slow operation, but I don't quite fully understand your suggestion... doesn't that only work if you're at a power-of-two size?
*/
inline std::array<uint32_t, 8> compute_hashes(const Element& e) const
{
    return {hash_function.template operator()<0>(e) % size,
Yes.
Unfortunately the size currently isn't restricted to be a power of two, and by default is not a power of two. Are you suggesting we limit it to be a power of two?
*/
inline std::array<uint32_t, 8> compute_hashes(const Element& e) const
{
    return {hash_function.template operator()<0>(e) % size,
Yes :) Sorry if that wasn't obvious. We could benchmark whether it matters compared to other operations of course, but restricting to a power of two means you at most get something that's a factor sqrt(2) off of your desired size, which seems acceptable for a cache.

Apologies, long rambly response below:

Overall, I think it's a reasonable idea to do this. Modulo is slow. It's not the bottleneck, but it's a low-cost way to make this code faster.

There are a couple of weird things, though. You currently have about 2 bits of bookkeeping overhead per entry (one bit for erasure, one bit for epoch). If you're restricting to a power-of-two size for the main cache memory, you may as well also do some of the following if the user specifies more space: more generations for finer-grained generations; additional fee tracking to preferentially evict low-fee items; some kind of mempool pointer/index map to evict signatures on mempool eviction. (Preface: I don't like this idea, but for the sake of discussion) you could also keep a large and a small cache and look up from both, letting you target size = 2^m + 2^n.

Also, power of two works reasonably well low in the series (1, 2, 4, 8, 16, 32, 64, 128), but it seems kind of uncomfortable that at, say, a 1GB cache you have to choose between 1 and 2 GB if you want to increase a little bit. Yes, sqrt(2), but people have fixed memory sizes so it does kind of matter.

Maybe a better in-between would be to do as (roughly) follows (should work on GCC, clang, and MSVC):
DEFAULT_MAX_SIGCACHE_SIZE = 32;
#ifdef _MSC_VER
#include <intrin.h>
#define __builtin_popcount __popcnt
#endif
bool fast_mod = __builtin_popcount(size) == 1;
compute_hash(E e) {
    if (fast_mod)
        // compute hash with masks
    else
        // compute hash with mod
}
and by default not count the bookkeeping bits in the size computation. That way, if you pass in a power of two megabytes as the parameter, you get this magic boost in performance, and if you pass in a non-power-of-two you get what you asked for. The branch shouldn't hurt because it should be easily predicted.
Tested ACK either 3ba2bf4 or 121dfcd. I gave some nits about the comments offline. I have extensively tested the performance of this patch. Lock contention on the sig cache is a serious bottleneck to ConnectBlock performance with 8 or more cores. This patch appears to all but eliminate that contention and leads to a 40% improvement in ConnectBlock time for 16 cores. It also allows for further performance improvements that would not see any benefit until this bottleneck was removed. I have also tested the hit-rate performance and it is a significant improvement over the existing sigcache in the event of a spam attack, for a long-running node, or in the event of a reorg. In addition to reviewing the included tests, I've also run my own tests and done code review to be sure that there are no false positives.

ACK 453aef4

Testing this (OSX 10.12, XCode 8.1):

Errors trying to compile the tests:

@fanquake I think I can fix the warnings, but I'm curious as to how much that's a goal given that that flag should be deprecated & there are lots of build errors. Happy to push a squashme if you think so. The build errors should be fixable, must be a different include order or something, I need to add a

@JeremyRubin The warning isn't so much of an issue, was just making a note of it.

@fanquake please confirm fix when you have a moment.

@JeremyRubin Everything is compiling fine now. Can you suggest tests/benchmarks that reviewers can run to test the performance increase here, or is that going to be hard for anyone without a lot of cores?
* - The name "allow_erase" is used because the real discard happens later.
* - The name "please_keep" is used because elements may be erased anyways on insert.
*
* @tparam Element should be a POD type that is 32-alignable
Where does the requirement for POD come from?

I suppose the way it is currently written (using swap and move) it should be safe to remove both requirements (POD and 32-alignable). Prior versions of the code, I think, were unsafe for that use case.

(I can do a squashme with just that unless you have other small feedback.)
*
* User Must Guarantee:
*
* 1) Write Requires synchronized access (e.g., a lock)
Any reason why these locks aren't integrated into the cache class? It isn't obvious to me how to correctly synchronize the erase method with the rest.

Yes, there is a reason.

The locks aren't actually needed at all in how the cache is currently used. All that needs to happen, per thread, is: at the beginning of block processing, a memory acquire; at the end of block processing, a memory release. Additionally, one more lock must be acquired before block processing by the master to ensure that there are no concurrent writers (from addtomempool), and that master must not release until all slaves release as well. This is much better than the repeated acquire/release of locking/unlocking on every single signature operation.

I have written a version of the code which has these semantics, but some feedback from others was that it felt fragile & prone to someone else breaking that behavior down the line (bad for maintainability). In the interest of keeping that optimization available and not introducing a maintenance hazard, I kept the locking where it was, in the layer up.
@fanquake yeah, it'll be tough; there's not much speed improvement from losing the erase locks until ~8 cores. What you could test is better memory efficiency, by allocating smaller caches (2, 4, 8, 16...) and comparing performance to a node running with that size and the old cache. I'd recommend running one node and peering your test nodes exclusively through it to ensure nodes are getting the same messages for a given time. But if you're hardware limited that will be hard.
A few minor nits on the code, but I'd be fine with it as-is... more of my comments were about your comments 👍.
    std::swap(mem, d.mem);
}

/** bit_set sets an entry as discardable.
Nit: You have some trailing whitespace on 3 lines in this file, including this one.

👍
* that is only thread unsafe on calls to setup. This class bit-packs collection
* flags for memory efficiency.
*
* All operations are std::memory_order_relaxed so external mechanisms must
Why not use release/acquire - it's identical instructions on x86 and won't require a full flush anywhere else to be correct (indeed, relaxed is maybe better the way it's implemented now, but if we want the read/write locks in the sigcache to be more efficient we might not want to have a full flush there).

You're right that it's the same on x86, but it isn't for ARM. We want the operation to be relaxed, so release or acquire are over-constrained.

If later changes want to change the memory model, they should do so then. It's hard to say if a full flush is better or worse than lots of release/acquires; that would require benchmarking.

Atomic instruction mappings: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
* table, the entry attempted to be inserted is evicted.
*
*/
inline void insert(Element e)
It seems to be a strange API to require that a copy be made as the parameter in the function, and then rely on std::move to make insertion efficient (which we can't do for uint256, since it doesn't have any non-POD memory). It seems just as good (and much more common) to pass in a const reference and then just let operator=() handle the copy.

We also do a std::swap on e, so that's why we have a mutable copy.

std::swap is just as slow for uint256, though. Since there is no dynamically-allocated memory we can't really speed it up.
// case behavior (no intermittent erases) would exceed epoch size,
// with a reasonable minimum scan size.
epoch_heuristic_counter = std::max(1u, std::max(epoch_size / 16,
        epoch_size - std::min(epoch_size, epoch_unused_count)));
The std::min should be a no-op; we're in an else branch where epoch_size > epoch_unused_count.

👍
/** cache implements a cache with properties similar to a cuckoo-set
*
* The cache is able to hold up to (~(uint32_t) 1) elements.
nit: technically it can hold up to (~(uint32_t) 1) - 1, because invalid() uses ~(uint32_t) 1, though I think you meant to use 0.

I thought there was a reason I used ~(uint32_t) 1 rather than ~(uint32_t) 0, but for the life of me I can't recall it. I will change it to be that, but would appreciate your reviewing that there was no reason.
* 1) Write Requires synchronized access (e.g., a lock)
* 2) Read Requires no concurrent Write, synchronized with the last insert.
* 3) Erase requires no concurrent Write, synchronized with last insert.
* 4) An Eraser must release all Erases before allowing a new Writer.
I believe this also refers to an old/different version of this patch.

I think it's correct? I'll try to clarify.

Oh, indeed, I believe I was mistaken.
* @tparam Element should be a POD type that is 32-alignable
* @tparam Hash should be a function/callable which takes a template parameter
*   hash_select and an Element and extracts a hash from it. Should return
*   high-entropy hashes for `Hash h; h<0>(e) and h<1>(e)`.
Also now out-of-date: you're using 8 hashes.

👍
/** epoch_heuristic_counter is used to determine when a epoch
* might be aged & an expensive scan should be done.
* epoch_heuristic_counter is incremented on insert and reset to the
s/incremented/decremented/

👍
*/
void epoch_check()
{
    if ((--epoch_heuristic_counter) != 0)
nit: technically this is an off-by-one: because you check this prior to actually doing the insert, you'll always do an "expensive" scan twice for the first (and all) epochs. Practically, you might just want to set epoch_heuristic_counter to something like epoch_size + 10 by default, though I suppose it doesn't matter all that much.

👍 -- I fixed it, I think, by decrementing inside the if block (I didn't want to do a postfix decrement, to stop the counter from underflowing).
/** insert loops at most depth_limit times trying to insert a hash
* at various locations in the table via a variant of the Cuckoo Algorithm
* with two hash locations.
s/two/eight/

👍
@JeremyRubin Care to have a look at @TheBlueMatt's comments above?

Should be all addressed.

LGTM at 88b58d3

reACK 88b58d3
Code review ACK, will test. A few comment nits/questions below.

Also, it would be nice if we had a unit test for the sigcache itself...
*
* 1) On first iter, always false so defaults to locs[0]
* 2) Second iter, last_loc == locs[0] so will go to locs[1]
*
I think this comment needs to be updated now that we have more than 2 hashes, right? If I understand right, the algorithm now is: swap with the element one past the last one looked at. Example:

1) On first iter, always false so defaults to locs[0].
2) Second iter, last_loc == locs[k] for some k, so will go to locs[k+1] (wrapping back to 0 if necessary).
*/
inline bool contains(const Element& e, const bool erase) const
{

nit: random newline
*
* This may exhibit platform endian dependent behavior but because these are
* nonced hashes (random) and this state is only ever used locally it is safe.
* All that matter is local consistency.
nit: "matter" -> "matters"
}
};

// Initialized outside of VerifySignature to avoid atomic operation per call
nit: Can you clarify this comment -- what is the atomic operation you're referring to?

Also, it seems like this comment is here to explain why you moved the code from its old place to the new place (which is helpful for me to understand!), but it's perhaps not very helpful in its current form to future code readers who haven't seen the previous implementation.

I believe that a function-local static uses an atomic (perhaps even a locking!) operation under the hood to initialize, for the case where concurrent callers enter the function initially (C++11 guarantees this to be safe).

I also believe that this is not the case for a global static (initialized before concurrency is allowed).

I'm not 100% certain the spec forces conformity on this point, but I think it is at least the case in most implementations.
@sdaftuar, d4294ad should fix your documentation concerns. RE: sigcache unit tests, I could whip something up, but I think it's a little bit hard to test meaningfully (i.e., beyond what other tests using the

Tested ACK 88b58d3, and the comments in d4294ad look good too, thanks. Re: sigcache tests, I don't know the best way to address this. If I could come up with a test suite that exercised the logic sufficiently to make changes like this safer, then I'd happily suggest that, but it seems like a hard problem. Still, after reviewing, I'm comfortable enough with this PR to proceed without additional sigcache tests (and it's great that we have tests for the cuckoocache at least).

utACK. Please squash? :)
SQUASHME: Change cuckoocache to only work for powers of two, to avoid mod operator
SQUASHME: Update Documentation and simplify logarithm logic
SQUASHME: OSX Build Errors
SQUASHME: minor Feedback from sipa + bluematt
SQUASHME: DOCONLY: Clarify a few comments.
SQUASHME: Update Tests for other SQUASHMEs
Force-pushed from d4294ad to 67dac4e (compare)
@sipa Squashed down to just cache and testing commits. Unsquashed version preserved at https://github.com/JeremyRubin/bitcoin/tree/cuckoocache-pull-request-unsquashed
cuckoocache reference: bitcoin/bitcoin#8895 Signed-off-by: k155 <216k155@luxcore.io>
2749d0a build: Include cuckoocache header in Makefile (furszy)
791a51f Deduplicate SignatureCacheHasher (Jeremy Rubin)
7a5918a Decrease testcase sizes in cuckoocache tests (Jeremy Rubin)
8ef3cc9 Trivial: fix comments referencing AppInit2 (Marko Bencun)
f21b122 Ensure `-maxsigcachesize` is in valid range (John Newbery)
464b922 Add unit tests for the CuckooCache (Jeremy Rubin)
77e3b1e Add CuckooCache implementation and replace the sigcache map_type with it (Jeremy Rubin)

Pull request description:

Built on top of #1668. Part of the back port series to get us closer to upstream's sigcache current state. The main ones are:
* bitcoin#8895
* bitcoin#9480
* bitcoin#9393

ACKs for top commit:
random-zebra: Really nice backport. Tested ACK 2749d0a
Fuzzbawls: utACK 2749d0a

Tree-SHA512: 22f760edc928790f29a7bbd6e907a561b7ad266b6cc5e85f207206e0f39ae3e4328ec57c0be772c96036a15559b13f6956bea5762cace10f3a560090aeb6758b
edited: This PR has been squashed with a few revisions from the initial PR. The unsquashed version is still available here master...JeremyRubin:cuckoocache-pull-request-not-squashed. The PR message has also been updated to reflect the current squashed version more closely.
Summary
This PR replaces the boost::unordered_set in sigcache with CuckooCache::cache.
Benefits
Design
The CuckooCache uses a design similar to a cuckoo hash table. Each inserted key has eight candidate locations, given by hash functions 0 through 7 (h0..h7).
More information about the synchronization specification is in the code documentation.
Generation Heuristic
A bit vector is used to track if the element belongs to the current or prior generation. Once the number of the current generation's non-deleted elements exceeds ~45%, the generation is retired. The previously retired generation is marked deleted. Because the cache deletes lazily, these entries are still present but are more likely to be overwritten than elements in the currently retired generation. Generations are scanned to see if they should be retired using a heuristic of how many inserts would need to occur since the last scan to exceed the generation size.
There is a slight (1 bit per entry) memory overhead to this approach. This does not negatively impact this cache because the trade-off is worth it for intelligently deleting older entries. Only one bit is used for simplicity, as we get three represented states (deleted, old, current). Using more generations (let's say, 1 more bit) would allow more granular freeing of only the oldest quarter, rather than half, per scan. This has advantages, but also an additional memory cost, and two generations was sufficient for this use case.
Simulation
These simulation results are slightly outdated; @morcos will chime in below with some updated figures. They should be the same as or better than below.
This can overall be seen as a 40% improvement in validation times over master on a 16-core machine, simulated over a month. The simulation methodology sends relayed transactions and blocks to the node in the order they were received historically. On 4- and 1-core simulations, the improvement is not significant (but no worse). Our tests didn't really hit hard saturation on the cache, which suggests that the megabytes allocated to the cache could be reduced.

Testing
The PR includes a test suite which checks a few key properties:
Future Work
There are future optimizations which may be able to bring some of the advantages seen with 16 cores to 4 and 1 cores, but this PR is focused on a minimal complexity patch with a demonstrable improvement over main as a base for future work in this direction.
There are also a few cleanup related items (such as cache initialization) which are not done in this patch to make it a minimal code change. These can be fixed in a later PR if this is accepted.
Acknowledgements
Thanks to @morcos, @sdaftuar, and @TheBlueMatt for their assistance and review on this patch.