Skip to content

RandomState has too many collisions in low order bits when hashing a u64 #210

@waywardgeek

Description

@waywardgeek

See my repro repository.

I just happened to get a bad random seed the first time I ever used ahash, and the collision rate in my hash table was far higher than expected. When inserting entries into a swiss hash table (like hashbrown) with 50% load factor (half of all spots filled), we expect that on average, we will see 2.541 filled slots together. The higher this number, the more slots we have to check when looking for a key in the table.

The bad news is ahash for this particular seed has over 4 adjacent filled slots on average. This slows lookups significantly, especially if we have to check if the keys match for every entry.

For comparison, try running with 2 rounds f hashing, by calling hasher.hash_one(hasher.hash_one(i)). In this case we get the expected 2.541. Instead of using two rounds of ahash::RandomState hashing, I use the following hash function which is faster than one round of ahash, and more evenly distributes keys:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul((v ^ hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    u64::wrapping_add(v1 ^ v, v1 >> 32)
}

This function is motivated from work I did on HighwayHash. The main takeaway is that multiplication is our best mixing primitive on modern CPUs. The constant is from /dev/random, and just needs to be odd.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions