Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented May 22, 2016

No description provided.

@gmaxwell
Copy link
Contributor

utACK

int count;

public:
CSipHasher(uint64_t k0, uint64_t k1);
CSipHasher& Write(uint64_t data);
CSipHasher& Write(const unsigned char* data, size_t size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably note somewhere that the two Write methods are partially-mutually-exclusive per-object.

@TheBlueMatt
Copy link
Contributor

This is one of those things where having a cryptographic hash function probably isnt /critical/, but is really preferable. Is the speed of SHA256 really slow enough to matter here afer you accepted a new TCP connection (and all associated OS overhead of doing so)?

@pstratem
Copy link
Contributor

nACK

This really does need to be a cryptographic hash function.

The performance overhead of opening the connection is almost certainly many many times the cost of this comparison.

@gmaxwell
Copy link
Contributor

Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722

It is generally suitable any place a cryptographic hash would be used where the small output size wouldn't be an issue. This is a fine example of such a location, similar to hash tables where the table's smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them.

@TheBlueMatt
Copy link
Contributor

In principal, yes, SipHash was designed to be effectively a cryptographic hash with small output size, but I don't want to fall into the trap of calling something a cryptographic hash when it has only had one or two cryptanalyses published. If we care about the performance difference here, I'd say it's fine, but I'm not sure that we do?

On May 22, 2016 10:19:47 PM PDT, Gregory Maxwell notifications@github.com wrote:

Siphash is a cryptographic function with all the properties we would
desire here: https://eprint.iacr.org/2014/722

It is generally suitable any place a cryptographic hash would be used
where the small output size wouldn't be an issue. This is a fine
example of such a location, similar to hash tables where the table's
smallness makes a larger hash output irrelevant, with only a few
hundred peers a 64 bit hash will reliably distinguish them.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#8086 (comment)

@gmaxwell
Copy link
Contributor

Keep in mind that the input is also very small, and the keyspace is very large. It's likely that this function is nearly a keyed permutation for inputs this small.

As far as performance, with 125 peers and sha256 here it will take about 2,632,000 cycles to run the hashes here. That is about 2ms cpu per inbound connection. It's not negligible.

(Though sipa should perhaps change it to run the hashing in the pre-processing step instead of in the comparison for a 2*log2(n) speedup-- or better, store it in cnode and compute it on connect to make it O(1) in the number of operations per disconnect.)

@pstratem
Copy link
Contributor

@gmaxwell I'll do a fix that still uses SHA256

@gmaxwell
Copy link
Contributor

gmaxwell commented May 23, 2016

Please walk me through the attack you're imagining here. There are 16 bits of network groups for IPv4, 32 bits for IPv6. In both cases many of the possible values aren't routable on the internet.. Each node has it's own 128-bit secret random seed. The node prefers to keep four hosts based on how their netgroup is ordered by a salted hash. The attacker does X and achieves Y. Help me out with the blanks here. :)

@pstratem
Copy link
Contributor

@gmaxwell I don't know but with #8088 the performance difference is going to be basically not measurable.

It's now reduced to the difference between a single SHA256 vs SipHash and comparison between 256 bits vs 64 bits.

@gmaxwell
Copy link
Contributor

Come one guys, you can't throw up a bunch of stop signs for security then fail to actually walk me through your security analysis. We want security but not security cargo cult.

Beyond performance there is a consistency point here, we should use the same cryptographic hash-function in all the cases where we need some non-normative small hash, unless there is a good reason not to. If there is a reason it's not suitable here, then its likely not suitable in other cases either.

@pstratem
Copy link
Contributor

@gmaxwell If this fails it potentially harms the partition resistance of the network.

If the other uses of SipHash fail they at worst are a denial of service attack (which existed before SipHash).

Given the extremely small performance difference compared to the enormous cost of accepting a new connection I just don't see the (admittedly small) risk being justified.

@TheBlueMatt
Copy link
Contributor

If there is no performance difference, why change it? Might as well use a stronger hash anywhere we can if its free.

@gmaxwell
Copy link
Contributor

Again, please actually specify the attack. I gave you a template, fill it out.

If the other uses of SipHash fail they at worst are a denial of service attack

The usage in addrman is functionally identical to this.

if there is no performance difference

in this PR it's a pretty substantial performance difference (it's about 2ms per connection), though performance can be achieved in other ways that I pointed out.

Might as well use a stronger hash

On what basis do you believe that another construction would be stronger?

As an aside-- right now the behavior of this is kind of busted. It orders peers by their netgroup and protects four, but it makes no effort to make the four be from different netgroups, which means there is an obvious attack strategy of running four hosts per netgroup in a lots of different netgroups. To avoid that, it should (e.g.) sort by netgroup hash, lastblocktime>0, connection uptime and skip over protecting peers that have the same netgroup as peers that were already protected.

It's more than a little disappointing to see furious hand-waving about the hash function when the basic functionality isn't getting anywhere near that amount of attention.

@sipa
Copy link
Member Author

sipa commented May 25, 2016

I think you're all exaggerating:

  • @pstratem @TheBlueMatt SipHash is more than sufficient in this case (hell, multiplying the vchGroup (interpreted as a number) by a random odd 64-bit integer likely already results in a sufficiently unpredictable permutation).
  • @gmaxwell After caching at the CNode level, performance of the hash function is irrelevant. I think SipHash is more appropriate, but SHA256 is certainly not inappropriate.

So let's just combine the approaches.

I've included #8088 in this PR, and made a few more changes (= make the whole eviction logic work using nKeyedNetGroup, rather than only in the comparison). I've also made CNode::addr const (to make sure the precomputed keyed netgroup doesn't get invalidated) and moved the precomputation to the .cpp file.

@theuni For combining with #8085, the CNode::CalculateKeyedNetGroup()'s k0 and k1 belong inside ConnMan, and perhaps the CNode::nKeyedNetGroup values do too (to prevent storing ConnMan-dependent information inside CNode).

@TheBlueMatt I've expanded the comments in CSipHasher.

@theuni
Copy link
Member

theuni commented May 26, 2016

@sipa Roger. Seems CalculateKeyedNetGroup isn't being called, though. I assume it's meant to be called from CNode's ctor?

@sipa
Copy link
Member Author

sipa commented May 26, 2016

@theuni Nice catch, it got lost in code movement. I've turned nKeyedNetGroup into a const as well, initialized in the ctor.

@@ -9,6 +9,8 @@
#include "amount.h"
#include "bloom.h"
#include "compat.h"
#include "crypto/common.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

includes still needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@theuni
Copy link
Member

theuni commented May 27, 2016

utACK (excluding the siphash impl itself, which i'm not qualified to review) either way. I don't see the harm in siphash (see previous disclaimer, though), since the input is at most 32bits anyway. But I agree with @sipa that the hash type shouldn't make much difference anyway once cached.

@sipa
Copy link
Member Author

sipa commented May 28, 2016

@theuni Even if you don't think you're the right person to review the SipHash code, you're certainly able to review its tests (the values in the unit test come from another implementation).


/* static */ uint64_t CNode::CalculateKeyedNetGroup(const CAddress& ad)
{
static uint64_t k0 = 0, k1 = 0;
Copy link
Member

@laanwj laanwj Jun 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not thread-safe - does it matter?

Otherwise maybe make this a static instance of a structure with the initialization code in the constructor, and C++11 semantics will make sure it will only get initialized once in a thread-safe way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter (the old code wasn't thread safe either), but better use good practices, and using static initializers is trivial. Fixed.

pstratem and others added 2 commits June 7, 2016 16:20
@laanwj
Copy link
Member

laanwj commented Jun 8, 2016

I've ported the SIPhash test to python using https://github.com/majek/pysiphash , source is here: https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

However I get mismatching results:

OK b''
OK b'00'
Mismatch for b'01020304050607': 15dd418547d24915 versus 93f5f5799a932462
Mismatch for b'08090a0b0c0d0e0f': 242272d800a348b4 versus 3f2acc7f57c29bdb
Mismatch for b'1011': 6307967a77964b0c versus 4bc1b3f0968dd39c
Mismatch for b'12131415161718191a': c5b1fd856729544f versus 2f2e6163076bcfad
Mismatch for b'1b1c1d1e1f': 7a789bd84a5c633b versus 7127512f72f27cce
Mismatch for b'2021222324252627': ad2b02a542ea7faa versus 0e3ea96b5304a7d0
Mismatch for b'28292a2b2c2d2e2f': 3c24a11813c26e21 versus e612a3cb9ecba951
OK b'000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f'

As some of the results do match, I'm not sure whether I made a mistake in my conversion, or whether there's a bug in either bitcoin core's or that python implementation.

@laanwj
Copy link
Member

laanwj commented Jun 8, 2016

Oh, found the issue: I assumed .Finalize would finalize and reset the hasher. It just returns the current hash value, allowing further calls to continue. When I pass through the previous value, it works fine. Updated https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

utACK 8884830

laanwj added a commit to laanwj/bitcoin that referenced this pull request Jun 8, 2016
Add full test vectors from spec, test per byte and per 8 bytes.

Builds on bitcoin#8086.
@sipa
Copy link
Member Author

sipa commented Jun 8, 2016

Superseded by #8173.

@sipa sipa closed this Jun 8, 2016
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Dec 7, 2016
Add full test vectors from spec, test per byte and per 8 bytes.

Builds on bitcoin#8086.
lateminer pushed a commit to lateminer/bitcoin that referenced this pull request Oct 18, 2018
Add full test vectors from spec, test per byte and per 8 bytes.

Builds on bitcoin#8086.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants