-
Notifications
You must be signed in to change notification settings - Fork 127
Description
See my repro repository.
I just happened to get a bad random seed the first time I ever used ahash, and the collision rate in my hash table was far higher than expected. When inserting entries into a swiss hash table (like hashbrown) with 50% load factor (half of all spots filled), we expect that on average, we will see 2.541 filled slots together. The higher this number, the more slots we have to check when looking for a key in the table.
The bad news is ahash for this particular seed has over 4 adjacent filled slots on average. This slows lookups significantly, especially if we have to check if the keys match for every entry.
For comparison, try running with 2 rounds f hashing, by calling hasher.hash_one(hasher.hash_one(i)). In this case we get the expected 2.541. Instead of using two rounds of ahash::RandomState hashing, I use the following hash function which is faster than one round of ahash, and more evenly distributes keys:
fn hash_u64(v: u64, hash_secret: u64) -> u64 {
let v1 = u64::wrapping_mul((v ^ hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
u64::wrapping_add(v1 ^ v, v1 >> 32)
}
This function is motivated from work I did on HighwayHash. The main takeaway is that multiplication is our best mixing primitive on modern CPUs. The constant is from /dev/random, and just needs to be odd.