-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Use SipHash for node eviction #8086
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
utACK |
int count; | ||
|
||
public: | ||
CSipHasher(uint64_t k0, uint64_t k1); | ||
CSipHasher& Write(uint64_t data); | ||
CSipHasher& Write(const unsigned char* data, size_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably note somewhere that the two Write methods are partially-mutually-exclusive per-object.
This is one of those things where having a cryptographic hash function probably isnt /critical/, but is really preferable. Is the speed of SHA256 really slow enough to matter here afer you accepted a new TCP connection (and all associated OS overhead of doing so)? |
nACK This really does need to be a cryptographic hash function. The performance overhead of opening the connection is almost certainly many many times the cost of this comparison. |
Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722 It is generally suitable any place a cryptographic hash would be used where the small output size wouldn't be an issue. This is a fine example of such a location, similar to hash tables where the table's smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them. |
In principal, yes, SipHash was designed to be effectively a cryptographic hash with small output size, but I don't want to fall into the trap of calling something a cryptographic hash when it has only had one or two cryptanalyses published. If we care about the performance difference here, I'd say it's fine, but I'm not sure that we do? On May 22, 2016 10:19:47 PM PDT, Gregory Maxwell notifications@github.com wrote:
|
Keep in mind that the input is also very small, and the keyspace is very large. It's likely that this function is nearly a keyed permutation for inputs this small. As far as performance, with 125 peers and sha256 here it will take about 2,632,000 cycles to run the hashes here. That is about 2ms cpu per inbound connection. It's not negligible. (Though sipa should perhaps change it to run the hashing in the pre-processing step instead of in the comparison for a 2*log2(n) speedup-- or better, store it in cnode and compute it on connect to make it O(1) in the number of operations per disconnect.) |
@gmaxwell I'll do a fix that still uses SHA256 |
Please walk me through the attack you're imagining here. There are 16 bits of network groups for IPv4, 32 bits for IPv6. In both cases many of the possible values aren't routable on the internet.. Each node has it's own 128-bit secret random seed. The node prefers to keep four hosts based on how their netgroup is ordered by a salted hash. The attacker does X and achieves Y. Help me out with the blanks here. :) |
Come one guys, you can't throw up a bunch of stop signs for security then fail to actually walk me through your security analysis. We want security but not security cargo cult. Beyond performance there is a consistency point here, we should use the same cryptographic hash-function in all the cases where we need some non-normative small hash, unless there is a good reason not to. If there is a reason it's not suitable here, then its likely not suitable in other cases either. |
@gmaxwell If this fails it potentially harms the partition resistance of the network. If the other uses of SipHash fail they at worst are a denial of service attack (which existed before SipHash). Given the extremely small performance difference compared to the enormous cost of accepting a new connection I just don't see the (admittedly small) risk being justified. |
If there is no performance difference, why change it? Might as well use a stronger hash anywhere we can if its free. |
Again, please actually specify the attack. I gave you a template, fill it out.
The usage in addrman is functionally identical to this.
in this PR it's a pretty substantial performance difference (it's about 2ms per connection), though performance can be achieved in other ways that I pointed out.
On what basis do you believe that another construction would be stronger? As an aside-- right now the behavior of this is kind of busted. It orders peers by their netgroup and protects four, but it makes no effort to make the four be from different netgroups, which means there is an obvious attack strategy of running four hosts per netgroup in a lots of different netgroups. To avoid that, it should (e.g.) sort by netgroup hash, lastblocktime>0, connection uptime and skip over protecting peers that have the same netgroup as peers that were already protected. It's more than a little disappointing to see furious hand-waving about the hash function when the basic functionality isn't getting anywhere near that amount of attention. |
I think you're all exaggerating:
So let's just combine the approaches. I've included #8088 in this PR, and made a few more changes (= make the whole eviction logic work using nKeyedNetGroup, rather than only in the comparison). I've also made CNode::addr const (to make sure the precomputed keyed netgroup doesn't get invalidated) and moved the precomputation to the .cpp file. @theuni For combining with #8085, the CNode::CalculateKeyedNetGroup()'s k0 and k1 belong inside ConnMan, and perhaps the CNode::nKeyedNetGroup values do too (to prevent storing ConnMan-dependent information inside CNode). @TheBlueMatt I've expanded the comments in CSipHasher. |
@sipa Roger. Seems CalculateKeyedNetGroup isn't being called, though. I assume it's meant to be called from CNode's ctor? |
@theuni Nice catch, it got lost in code movement. I've turned nKeyedNetGroup into a const as well, initialized in the ctor. |
@@ -9,6 +9,8 @@ | |||
#include "amount.h" | |||
#include "bloom.h" | |||
#include "compat.h" | |||
#include "crypto/common.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
includes still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
utACK (excluding the siphash impl itself, which i'm not qualified to review) either way. I don't see the harm in siphash (see previous disclaimer, though), since the input is at most 32bits anyway. But I agree with @sipa that the hash type shouldn't make much difference anyway once cached. |
@theuni Even if you don't think you're the right person to review the SipHash code, you're certainly able to review its tests (the values in the unit test come from another implementation). |
|
||
/* static */ uint64_t CNode::CalculateKeyedNetGroup(const CAddress& ad) | ||
{ | ||
static uint64_t k0 = 0, k1 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is not thread-safe - does it matter?
Otherwise maybe make this a static instance of a structure with the initialization code in the constructor, and C++11 semantics will make sure it will only get initialized once in a thread-safe way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter (the old code wasn't thread safe either), but better use good practices, and using static initializers is trivial. Fixed.
Lazy calculate vchKeyedNetGroup in CNode::GetKeyedNetGroup.
I've ported the SIPhash test to python using https://github.com/majek/pysiphash , source is here: https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366 However I get mismatching results:
As some of the results do match, I'm not sure whether I made a mistake in my conversion, or whether there's a bug in either bitcoin core's or that python implementation. |
Oh, found the issue: I assumed .Finalize would finalize and reset the hasher. It just returns the current hash value, allowing further calls to continue. When I pass through the previous value, it works fine. Updated https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366 utACK 8884830 |
Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.
Superseded by #8173. |
Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.
Add full test vectors from spec, test per byte and per 8 bytes. Builds on bitcoin#8086.
No description provided.