Use SipHash for node eviction #8086

sipa · 2016-05-22T09:14:09Z

No description provided.

gmaxwell · 2016-05-22T10:04:18Z

utACK

TheBlueMatt · 2016-05-23T00:25:45Z

src/hash.h

    int count;

 public:
    CSipHasher(uint64_t k0, uint64_t k1);
    CSipHasher& Write(uint64_t data);
+    CSipHasher& Write(const unsigned char* data, size_t size);


Should probably note somewhere that the two Write methods are partially-mutually-exclusive per-object.

TheBlueMatt · 2016-05-23T02:37:19Z

This is one of those things where having a cryptographic hash function probably isn't /critical/, but is really preferable. Is SHA256 really slow enough to matter here after you have accepted a new TCP connection (with all the associated OS overhead of doing so)?

pstratem · 2016-05-23T04:11:26Z

nACK

This really does need to be a cryptographic hash function.

The performance overhead of opening the connection is almost certainly many many times the cost of this comparison.

gmaxwell · 2016-05-23T05:19:31Z

Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722

It is generally suitable anywhere a cryptographic hash would be used, as long as the small output size isn't an issue. This is a fine example of such a location. It is similar to hash tables, where the table's smallness makes a larger hash output irrelevant: with only a few hundred peers, a 64-bit hash will reliably distinguish them.
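To put a rough number on "reliably distinguish", here is a minimal sketch of the usual birthday (union) bound. It assumes the hash behaves like an ideal random function, and it takes 500 distinct inputs as a stand-in for "a few hundred"; the function name `CollisionBound` is made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the probability that any two of n distinct inputs collide
// under an ideal hash with `bits` output bits: (n choose 2) / 2^bits.
// Assumption: the hash is modelled as a uniformly random function.
double CollisionBound(unsigned n, int bits) {
    double pairs = 0.5 * n * (n - 1.0);  // number of unordered input pairs
    return std::ldexp(pairs, -bits);     // pairs * 2^(-bits)
}
```

Under those assumptions, `CollisionBound(500, 64)` comes out below 1e-14, so a collision between two netgroups is not a practical concern at this scale.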

TheBlueMatt · 2016-05-23T06:13:27Z

In principle, yes, SipHash was designed to be effectively a cryptographic hash with small output size, but I don't want to fall into the trap of calling something a cryptographic hash when it has only had one or two cryptanalyses published. If we care about the performance difference here, I'd say it's fine, but I'm not sure that we do?

gmaxwell · 2016-05-23T06:45:17Z

Keep in mind that the input is also very small, and the keyspace is very large. It's likely that this function is nearly a keyed permutation for inputs this small.

As for performance: with 125 peers and SHA256, running the hashes here takes about 2,632,000 cycles. That is about 2 ms of CPU per inbound connection. It's not negligible.

(Though sipa should perhaps change it to run the hashing in a pre-processing step instead of in the comparison, for a 2*log2(n) speedup. Or better, store it in CNode and compute it on connect, to make it O(1) in the number of operations per disconnect.)
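The difference between hashing inside the comparison and hashing in a pre-processing step can be sketched as follows. `CountingHash` is a hypothetical, non-cryptographic stand-in for the keyed hash; it exists only to count invocations. Both function names are also made up for this illustration and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive keyed hash. NOT cryptographic:
// it is a cheap bijective mix whose only job is to count its own calls.
struct CountingHash {
    uint64_t key;
    mutable size_t calls;
    explicit CountingHash(uint64_t k) : key(k), calls(0) {}
    uint64_t operator()(uint32_t netgroup) const {
        ++calls;
        uint64_t x = (netgroup ^ key) * 0x9E3779B97F4A7C15ULL;
        return x ^ (x >> 32);
    }
};

// Variant 1: hash inside the comparator, i.e. two hash calls per comparison.
std::vector<uint32_t> SortHashingInComparator(std::vector<uint32_t> groups,
                                              const CountingHash& h) {
    std::sort(groups.begin(), groups.end(),
              [&h](uint32_t a, uint32_t b) { return h(a) < h(b); });
    return groups;
}

// Variant 2: hash every element once up front, then sort (hash, group) pairs.
std::vector<uint32_t> SortWithPrecomputedHashes(const std::vector<uint32_t>& groups,
                                                const CountingHash& h) {
    std::vector<std::pair<uint64_t, uint32_t>> keyed;
    keyed.reserve(groups.size());
    for (uint32_t g : groups) keyed.emplace_back(h(g), g);
    std::sort(keyed.begin(), keyed.end());
    std::vector<uint32_t> out;
    out.reserve(keyed.size());
    for (const auto& kv : keyed) out.push_back(kv.second);
    return out;
}
```

Both variants produce the same order; the second calls the hash exactly once per element, while the first calls it twice per comparison made by `std::sort`.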

pstratem · 2016-05-23T07:20:56Z

@gmaxwell I'll do a fix that still uses SHA256

gmaxwell · 2016-05-23T09:18:04Z

Please walk me through the attack you're imagining here. There are 16 bits of network groups for IPv4, 32 bits for IPv6. In both cases many of the possible values aren't routable on the internet. Each node has its own 128-bit secret random seed. The node prefers to keep four hosts based on how their netgroup is ordered by a salted hash. The attacker does X and achieves Y. Help me out with the blanks here. :)

pstratem · 2016-05-23T10:05:55Z

@gmaxwell I don't know but with #8088 the performance difference is going to be basically not measurable.

It's now reduced to the difference between a single SHA256 vs SipHash and comparison between 256 bits vs 64 bits.

gmaxwell · 2016-05-24T18:18:28Z

Come on guys, you can't throw up a bunch of stop signs for security and then fail to actually walk me through your security analysis. We want security, but not security cargo cult.

Beyond performance there is a consistency point here: we should use the same cryptographic hash function in all the cases where we need some non-normative small hash, unless there is a good reason not to. If there is a reason it's not suitable here, then it's likely not suitable in other cases either.

pstratem · 2016-05-24T18:56:50Z

@gmaxwell If this fails it potentially harms the partition resistance of the network.

If the other uses of SipHash fail they at worst are a denial of service attack (which existed before SipHash).

Given the extremely small performance difference compared to the enormous cost of accepting a new connection I just don't see the (admittedly small) risk being justified.

TheBlueMatt · 2016-05-24T19:11:55Z

If there is no performance difference, why change it? Might as well use a stronger hash anywhere we can if it's free.

gmaxwell · 2016-05-24T19:40:31Z

Again, please actually specify the attack. I gave you a template, fill it out.

> If the other uses of SipHash fail they at worst are a denial of service attack

The usage in addrman is functionally identical to this.

> If there is no performance difference

In this PR it's a pretty substantial performance difference (about 2 ms per connection), though the same performance can be achieved in the other ways that I pointed out.

> Might as well use a stronger hash

On what basis do you believe that another construction would be stronger?

As an aside: right now the behavior of this is kind of busted. It orders peers by their netgroup and protects four, but it makes no effort to make the four be from different netgroups, which means there is an obvious attack strategy of running four hosts per netgroup in lots of different netgroups. To avoid that, it should (e.g.) sort by netgroup hash, lastblocktime>0, and connection uptime, and skip over protecting peers that have the same netgroup as peers that were already protected.

It's more than a little disappointing to see furious hand-waving about the hash function when the basic functionality isn't getting anywhere near that amount of attention.
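The suggested fix (sort, then skip peers whose netgroup is already protected) could look roughly like the sketch below. `PeerSketch`, its fields, and `ProtectDistinctNetGroups` are hypothetical names, not the PR's code. Preferring block relayers and longer uptime within a netgroup is one possible reading of the ordering proposed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a connected peer.
struct PeerSketch {
    int id;
    uint64_t keyedNetGroup;    // salted hash of the peer's netgroup
    bool hasRelayedBlock;      // stands in for "lastblocktime > 0"
    int64_t connectedSeconds;  // connection uptime
};

// Protect up to `limit` peers, at most one per keyed netgroup.
std::vector<int> ProtectDistinctNetGroups(std::vector<PeerSketch> peers, size_t limit) {
    std::sort(peers.begin(), peers.end(), [](const PeerSketch& a, const PeerSketch& b) {
        if (a.keyedNetGroup != b.keyedNetGroup) return a.keyedNetGroup < b.keyedNetGroup;
        if (a.hasRelayedBlock != b.hasRelayedBlock) return a.hasRelayedBlock;  // relayers first
        return a.connectedSeconds > b.connectedSeconds;  // assumption: longer uptime first
    });
    std::vector<int> protectedIds;
    bool haveLast = false;
    uint64_t lastGroup = 0;
    for (const PeerSketch& p : peers) {
        if (protectedIds.size() == limit) break;
        if (haveLast && p.keyedNetGroup == lastGroup) continue;  // group already protected
        protectedIds.push_back(p.id);
        lastGroup = p.keyedNetGroup;
        haveLast = true;
    }
    return protectedIds;
}
```

With this shape, an attacker running four hosts in one netgroup gets at most one of them protected; the remaining slots go to peers from other netgroups.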

sipa · 2016-05-25T13:46:53Z

I think you're all exaggerating:

@pstratem @TheBlueMatt SipHash is more than sufficient in this case (hell, multiplying the vchGroup (interpreted as a number) by a random odd 64-bit integer likely already results in a sufficiently unpredictable permutation).
@gmaxwell After caching at the CNode level, performance of the hash function is irrelevant. I think SipHash is more appropriate, but SHA256 is certainly not inappropriate.
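The "multiply by a random odd 64-bit integer" remark relies on the fact that multiplication by an odd number is a bijection modulo 2^64. A minimal sketch of that property follows; the function names are made up for this illustration. It demonstrates only invertibility, i.e. that no two groups map to the same value, and says nothing about how unpredictable the result is to an attacker.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd number modulo 2^64, via Newton iteration.
uint64_t InverseMod2Pow64(uint64_t odd) {
    assert(odd & 1);
    uint64_t x = odd;            // correct to 3 bits, since odd * odd == 1 (mod 8)
    for (int i = 0; i < 5; ++i)  // each step doubles the number of correct bits
        x *= 2 - odd * x;
    return x;
}

// Map a netgroup (interpreted as a number) through the odd multiplier.
uint64_t Permute(uint64_t group, uint64_t oddMultiplier) {
    return group * oddMultiplier;  // unsigned arithmetic wraps modulo 2^64
}

// Undo Permute; the existence of this inverse is what makes it a permutation.
uint64_t Unpermute(uint64_t value, uint64_t oddMultiplier) {
    return value * InverseMod2Pow64(oddMultiplier);
}
```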

So let's just combine the approaches.

I've included #8088 in this PR, and made a few more changes (= make the whole eviction logic work using nKeyedNetGroup, rather than only in the comparison). I've also made CNode::addr const (to make sure the precomputed keyed netgroup doesn't get invalidated) and moved the precomputation to the .cpp file.

@theuni For combining with #8085, the CNode::CalculateKeyedNetGroup()'s k0 and k1 belong inside ConnMan, and perhaps the CNode::nKeyedNetGroup values do too (to prevent storing ConnMan-dependent information inside CNode).

@TheBlueMatt I've expanded the comments in CSipHasher.

theuni · 2016-05-26T16:20:59Z

@sipa Roger. Seems CalculateKeyedNetGroup isn't being called, though. I assume it's meant to be called from CNode's ctor?

sipa · 2016-05-26T16:57:56Z

@theuni Nice catch, it got lost in code movement. I've turned nKeyedNetGroup into a const as well, initialized in the ctor.
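A minimal sketch of the "const member initialized in the ctor" pattern, assuming a simplified node type. `NodeSketch`, `KeyedHashSketch` and `g_hashCalls` are hypothetical; the mixing function is a non-cryptographic placeholder for the real keyed hash.

```cpp
#include <cassert>
#include <cstdint>

int g_hashCalls = 0;  // instrumentation for this sketch only

// Placeholder for the keyed hash (NOT SipHash, NOT cryptographic).
uint64_t KeyedHashSketch(uint32_t netGroup, uint64_t key) {
    ++g_hashCalls;
    uint64_t x = (netGroup ^ key) * 0x9E3779B97F4A7C15ULL;
    return x ^ (x >> 32);
}

class NodeSketch {
public:
    // Declared first, so it is initialized first. Being const, it cannot
    // change later and silently invalidate the cached value below.
    const uint32_t netGroup;
    // Computed exactly once, in the constructor.
    const uint64_t nKeyedNetGroup;

    NodeSketch(uint32_t group, uint64_t key)
        : netGroup(group), nKeyedNetGroup(KeyedHashSketch(group, key)) {}
};
```

Eviction logic can then compare `nKeyedNetGroup` values directly, with no hashing at comparison time.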

theuni · 2016-05-27T15:18:27Z

src/net.h

@@ -9,6 +9,8 @@
 #include "amount.h"
 #include "bloom.h"
 #include "compat.h"
+#include "crypto/common.h"


includes still needed?

theuni · 2016-05-27T16:16:17Z

utACK (excluding the siphash impl itself, which I'm not qualified to review) either way. I don't see the harm in siphash (see previous disclaimer, though), since the input is at most 32 bits anyway. But I agree with @sipa that the hash type shouldn't make much difference anyway once cached.

sipa · 2016-05-28T23:35:12Z

@theuni Even if you don't think you're the right person to review the SipHash code, you're certainly able to review its tests (the values in the unit test come from another implementation).

laanwj · 2016-06-07T13:16:25Z

src/net.cpp

+
+/* static */ uint64_t CNode::CalculateKeyedNetGroup(const CAddress& ad)
+{
+    static uint64_t k0 = 0, k1 = 0;


This code is not thread-safe - does it matter?

Otherwise maybe make this a static instance of a structure with the initialization code in the constructor, and C++11 semantics will make sure it will only get initialized once in a thread-safe way.

sipa · 2016-06-07

It doesn't matter (the old code wasn't thread safe either), but better use good practices, and using static initializers is trivial. Fixed.
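The static-initializer approach suggested in the review can be sketched like this. `KeyPairSketch` and `GetEvictionKeys` are hypothetical names, and `std::random_device` is only a stand-in for whatever random source the node actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Holds the two 64-bit SipHash keys; the constructor draws them at random.
struct KeyPairSketch {
    uint64_t k0;
    uint64_t k1;
    KeyPairSketch() {
        std::random_device rd;  // assumption: placeholder random source
        k0 = (static_cast<uint64_t>(rd()) << 32) | rd();
        k1 = (static_cast<uint64_t>(rd()) << 32) | rd();
    }
};

// C++11 guarantees that a function-local static is initialized exactly once,
// even if several threads make the first call concurrently.
const KeyPairSketch& GetEvictionKeys() {
    static const KeyPairSketch keys;
    return keys;
}
```

Compared with a hand-rolled check such as `static uint64_t k0 = 0, k1 = 0;` followed by a test for zero, this leaves the once-only logic to the language.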

laanwj · 2016-06-08T13:09:34Z

I've ported the SipHash test to Python using https://github.com/majek/pysiphash; the source is here: https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

However I get mismatching results:

OK b''
OK b'00'
Mismatch for b'01020304050607': 15dd418547d24915 versus 93f5f5799a932462
Mismatch for b'08090a0b0c0d0e0f': 242272d800a348b4 versus 3f2acc7f57c29bdb
Mismatch for b'1011': 6307967a77964b0c versus 4bc1b3f0968dd39c
Mismatch for b'12131415161718191a': c5b1fd856729544f versus 2f2e6163076bcfad
Mismatch for b'1b1c1d1e1f': 7a789bd84a5c633b versus 7127512f72f27cce
Mismatch for b'2021222324252627': ad2b02a542ea7faa versus 0e3ea96b5304a7d0
Mismatch for b'28292a2b2c2d2e2f': 3c24a11813c26e21 versus e612a3cb9ecba951
OK b'000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f'

As some of the results do match, I'm not sure whether I made a mistake in my conversion, or whether there's a bug in either bitcoin core's or that python implementation.

laanwj · 2016-06-08T13:12:07Z

Oh, found the issue: I assumed .Finalize would finalize and reset the hasher. It just returns the current hash value, allowing further calls to continue. When I pass through the previous value, it works fine. Updated https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

utACK 8884830
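The `Finalize` behaviour described above, and the earlier remark that the two `Write` overloads are partially mutually exclusive, can both be seen in a minimal SipHash-2-4 sketch. This is an illustration written for this thread, not the code under review; `SipHasherSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class SipHasherSketch {
    uint64_t v[4];
    uint64_t tmp;  // bytes of the current, not yet complete, 8-byte word
    int count;     // total number of bytes written so far

    static uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

    static void Round(uint64_t s[4]) {
        s[0] += s[1]; s[1] = Rotl(s[1], 13); s[1] ^= s[0];
        s[0] = Rotl(s[0], 32);
        s[2] += s[3]; s[3] = Rotl(s[3], 16); s[3] ^= s[2];
        s[0] += s[3]; s[3] = Rotl(s[3], 21); s[3] ^= s[0];
        s[2] += s[1]; s[1] = Rotl(s[1], 17); s[1] ^= s[2];
        s[2] = Rotl(s[2], 32);
    }

    // Absorb one 64-bit word with two rounds (the "2" in SipHash-2-4).
    static void Compress(uint64_t s[4], uint64_t word) {
        s[3] ^= word;
        Round(s);
        Round(s);
        s[0] ^= word;
    }

public:
    SipHasherSketch(uint64_t k0, uint64_t k1) : tmp(0), count(0) {
        v[0] = 0x736f6d6570736575ULL ^ k0;
        v[1] = 0x646f72616e646f6dULL ^ k1;
        v[2] = 0x6c7967656e657261ULL ^ k0;
        v[3] = 0x7465646279746573ULL ^ k1;
    }

    // Whole-word write: only valid while the byte count is a multiple of 8.
    SipHasherSketch& Write(uint64_t data) {
        assert(count % 8 == 0);
        Compress(v, data);
        count += 8;
        return *this;
    }

    // Byte-wise write: buffers bytes little-endian until a word is complete.
    SipHasherSketch& Write(const unsigned char* data, size_t size) {
        while (size--) {
            tmp |= static_cast<uint64_t>(*data++) << (8 * (count % 8));
            ++count;
            if (count % 8 == 0) {
                Compress(v, tmp);
                tmp = 0;
            }
        }
        return *this;
    }

    // Const: works on a copy of the state, so it does not reset the hasher
    // and further writes continue the same message.
    uint64_t Finalize() const {
        uint64_t s[4] = {v[0], v[1], v[2], v[3]};
        Compress(s, tmp | (static_cast<uint64_t>(count) << 56));
        s[2] ^= 0xFF;
        for (int i = 0; i < 4; ++i) Round(s);
        return s[0] ^ s[1] ^ s[2] ^ s[3];
    }
};
```

Because `Finalize` leaves the state untouched, each call returns the hash of everything written so far. A port that expects a reset after each call has to feed the accumulated input instead, which matches the second value on each "Mismatch" line above.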

sipa · 2016-06-08T13:55:05Z

Superseded by #8173.

TheBlueMatt reviewed May 23, 2016

pstratem mentioned this pull request May 23, 2016

Avoid recalculating vchKeyedNetGroup in eviction logic. #8088

Closed

jonasschnelli added the P2P label May 23, 2016

sipa force-pushed the moresiphash branch from 2e09933 to 7e5e313 Compare May 25, 2016 13:40

sipa force-pushed the moresiphash branch from 7e5e313 to fbed73e Compare May 26, 2016 16:56

sipa force-pushed the moresiphash branch from fbed73e to 280e979 Compare May 26, 2016 18:16

sipa mentioned this pull request May 26, 2016

Addrman offline attempts #8065

Merged

theuni reviewed May 27, 2016

sipa force-pushed the moresiphash branch from 280e979 to 9d1da31 Compare May 28, 2016 23:33

laanwj reviewed Jun 7, 2016

pstratem and others added 2 commits June 7, 2016 16:20

Avoid recalculating vchKeyedNetGroup in eviction logic.

053930f

Lazy calculate vchKeyedNetGroup in CNode::GetKeyedNetGroup.

Support SipHash with arbitrary byte writes

9bf156b

sipa added 2 commits June 7, 2016 16:20

Use 64-bit SipHash of netgroups in eviction

c31b24f

Use C++11 thread-safe static initializers

8884830

sipa force-pushed the moresiphash branch from 9d1da31 to 8884830 Compare June 7, 2016 14:32

laanwj added a commit to laanwj/bitcoin that referenced this pull request Jun 8, 2016

test: Add more test vectors for siphash

eebc232

Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.

laanwj mentioned this pull request Jun 8, 2016

Use SipHash for node eviction (cont'd) #8173

Merged

sipa closed this Jun 8, 2016

rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Dec 7, 2016

test: Add more test vectors for siphash

202b149

Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.

lateminer pushed a commit to lateminer/bitcoin that referenced this pull request Oct 18, 2018

test: Add more test vectors for siphash

0c6f5bd

Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
