RandomState has too many collisions in low order bits when hashing a u64

See [my repro repository](https://github.com/waywardgeek/ahash_collisions).

I just happened to get a bad random seed the first time I ever used ahash, and the collision rate in my hash table was far higher than expected.  When inserting entries into a swiss hash table (like hashbrown) with 50% load factor (half of all spots filled), we expect that on average, we will see 2.541 filled slots together.  The higher this number, the more slots we have to check when looking for a key in the table.

The bad news is ahash for this particular seed has over 4 adjacent filled slots on average.  This slows lookups significantly, especially if we have to check if the keys match for every entry.

For comparison, try running with 2 rounds f hashing, by calling hasher.hash_one(hasher.hash_one(i)).  In this case we get the expected 2.541.  Instead of using two rounds of ahash::RandomState hashing, I use the following hash function which is faster than one round of ahash, and more evenly distributes keys:

```rust
fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul((v ^ hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    u64::wrapping_add(v1 ^ v, v1 >> 32)
}
```

This function is motivated from work I did on [HighwayHash](https://github.com/minio/highwayhash).  The main takeaway is that multiplication is our best mixing primitive on modern CPUs.  The constant is from /dev/random, and just needs to be odd.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RandomState has too many collisions in low order bits when hashing a u64 #210

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

RandomState has too many collisions in low order bits when hashing a u64 #210

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions