
Conversation

@suranap (Contributor) commented Aug 16, 2012

The performance bottleneck is in calling the hash function repeatedly; the fix is to call it less often. This paper (http://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf) shows that the expensive hash calls can be replaced with the linear combination g_i(x) = h1(x) + i * h2(x), where h1 and h2 both come from a single call to the hash function. You then just loop over that equation to generate all the hash values the Bloom filter needs. In my limited testing, performance improves by about 20%.
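
For illustration, here is a minimal C sketch of that loop, assuming a hash function that yields at least two independent 32-bit words; the hash128() helper and the parameter names are hypothetical, not the actual dablooms API:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical 128-bit hash; in practice this would be MurmurHash3, CityHash, etc. */
    void hash128(const char *key, size_t len, uint32_t out[4]);

    /* Fill `indexes` with nfuncs positions using g_i(x) = h1(x) + i * h2(x). */
    void bloom_indexes(const char *key, size_t len,
                       uint32_t *indexes, int nfuncs, uint32_t num_bits)
    {
        uint32_t words[4];
        hash128(key, len, words);       /* one expensive hash call */

        uint32_t h1 = words[0];
        uint32_t h2 = words[1];

        for (int i = 0; i < nfuncs; i++) {
            indexes[i] = h1 % num_bits; /* g_i(x), reduced to a table position */
            h1 += h2;                   /* advance to g_{i+1}(x) without re-hashing */
        }
    }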

@mreiferson (Contributor)

@suranap - Thanks for this. We're going to do some testing and validation to confirm your proposed changes.

Also, it seems like you copied some commits from master onto this branch. The easiest way to clean this up would be to do a git rebase -i bitly/master, delete all the commits except for the last two, and then force-push the result onto your master.

@suranap (Contributor, Author) commented Aug 16, 2012

@mreiferson: Sorry, I'm still new to git & GitHub. Did I do it right? If not, you can grab the changes and apply them yourself. Git is giving me a headache.

@mreiferson (Contributor)

no worries... looks great, thanks.

@ploxiln mentioned this pull request Aug 18, 2012
@vmg (Contributor) commented Aug 19, 2012

Hey! I've been reading up on linear combination all of last week, and was going to open this same PR. Nice to see that @suranap was faster than me.

Here are some ramblings you may find interesting:

I got very excited initially about the idea of saving hash time by using linear combination. However, there's not enough Adderall in all of Europe to let me finish reading the first paragraph of the paper. I'm going to spare you my rant on the current state of academia, but the math is needlessly dense (especially for such unintuitive results), and the graphed data is presented terribly.

...So I just went ahead and tried to reproduce all the results on my own -- because that's really the only way I'd trust this math enough to run it in production. My plan was simple: I modified the SMhasher test suite so it would run uniform distribution and bias tests on values generated through linear combination, and compared these against the distribution results for values generated through randomized seeding.

(No test hash given on command line, testing Murmur3_x86_32.)
-------------------------------------------------------------------------------
--- Testing Murmur3A (MurmurHash3 for x86, 32-bit)

[[[ Keyset 'Linear Combination' Tests ]]]

Keyset 'Linear Combinations' - 1000000 keys
Testing collisions   - Expected   116.42, actual     0.00 ( 0.00x)
Testing distribution - 
[XXXXXXXXXX]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[..........]
[X.........]
[XX........]
[XXX.......]
[XXXX......]
[XXXXX.....]
[XXXXXX....]
[XXXXXXX...]
[XXXXXXXX..]
[XXXXXXXXX.]
[XXXXXXXXXX]
[XXXXXXXXXX]
[XXXXXXXXXX]
[XXXXXXXXXX]
[XXXXXXXXXX]
[XXXXXXXXXX]
[XXXXXXXXXX]
Worst bias is the   8-bit window at bit  26 - 49.994% !!!!! 



Input vcode 0xa3b0dc0c, Output vcode 0xa1f6345a, Result vcode 0x00000001
Verification value is 0x00000001 - Testing took 0.692099 seconds
-------------------------------------------------------------------------------

This is... well. It puts me at ease, to be fair. The math checks out, and running these distribution tests locally also answers some practical implementation questions that the paper obviously doesn't consider, e.g. whether you can split the output of a given hash function in half and consider the two values to be independent for linear combination. The answer is yes, by the way (a short sketch follows the list below). There's no statistical difference between:

  1. Using two completely different hash functions
  2. Using the same hash function twice with different seeds
  3. Using a hash function once and splitting its output
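
For concreteness, a hedged sketch of options 2 and 3, assuming the standard MurmurHash3 signatures from SMhasher; the wrapper functions and seed values are illustrative only:

    #include <stdint.h>

    /* Standard MurmurHash3 declarations (SMhasher's MurmurHash3.h). */
    void MurmurHash3_x86_32(const void *key, int len, uint32_t seed, void *out);
    void MurmurHash3_x64_128(const void *key, int len, uint32_t seed, void *out);

    /* Option 2: same hash function, two different seeds. */
    static void two_seeds(const char *key, int len, uint32_t *h1, uint32_t *h2)
    {
        MurmurHash3_x86_32(key, len, 0xdeadbeef, h1);
        MurmurHash3_x86_32(key, len, 0xcafebabe, h2);
    }

    /* Option 3: one call, then split the 128-bit digest and use two of its words. */
    static void split_digest(const char *key, int len, uint32_t *h1, uint32_t *h2)
    {
        uint32_t out[4];
        MurmurHash3_x64_128(key, len, 0, out);
        *h1 = out[0];
        *h2 = out[1];
    }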

_HOWEVER_: The code in this PR is, in fact, not that much faster than the original! Have you run benchmarks?

Take a look at the size of your nfuncs (i.e. the number of independent hashes required). You'll see that for many corpora of test data (e.g. a dictionary file, a list of SHA1s), nfuncs is surprisingly low (between 3 and 5). That means the original implementation rarely had to hash more than once, given that Murmur has 128 bits of output, enough for 4 independent 32-bit hashes.
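
For context, the number of hash functions a Bloom filter needs is usually derived from the target false-positive rate p via the standard rule of thumb k ≈ log2(1/p); a small, self-contained sketch of that arithmetic (not dablooms code):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Optimal number of hash functions for a target false-positive rate p:
         * k = ceil(log2(1/p)). For common rates this stays in the single digits,
         * which is why nfuncs is small in practice. */
        double rates[] = { 0.05, 0.01 };
        for (int i = 0; i < 2; i++) {
            double p = rates[i];
            int k = (int)ceil(log2(1.0 / p));
            printf("error rate %g -> %d hash functions\n", p, k);
        }
        return 0;
    }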

I've had very good results on my local benchmark by bringing back CityHash instead of Murmur (yes, this is karma biting me in the neck). Note that in this PR we're wasting half of the output size of Murmur (we're only combining the two initial 32-bit words, but we've computed 4 of them!). City, on the other hand, produces its two words directly and is special-cased for short strings. Basically, the same reasons that made it suck before (when we had a lot of different seeds) make it work great in this new situation.

I'd strongly advise bringing back CityHash if you're going to go with linear combination. Otherwise the performance boost from this PR is not big enough to warrant the (possible?) decrease in accuracy.

Oh, and one more note: have you made sure that GCC is actually strength-reducing the linear product out of the loop? You may want to perform the optimization by hand just to be 100% sure there's no multiplication anywhere in the loop:

    for (i = 0; i < bloom->nfuncs; i++) {
        hashes[i] = roots[1] % bloom->counts_per_func;  /* current combined hash, reduced to a slot index */
        roots[1] += roots[3];                           /* add the second hash word: no multiply in the loop */
    }

Alright, I'm off to keep reading the literature on this. Have you guys noticed that when you make the hash function output an arbitrary number of bytes from the original digest through linear combination, you're effectively creating a PRNG? I wonder if the composed values could pass a Diehard test...

Fucking computers man. They be cray.

@ploxiln (Contributor) commented Aug 20, 2012

I have a commit on a branch (not yet in a pull request) that enables GCC's -O2 optimization level and tweaks the code in a few places to silence the warnings that option causes GCC to emit.

Maybe I should break that out into something minimal we can merge before this one, because it could affect the slight performance improvements we get here.

Bitly is quite open to performance improvements, but is more interested in the code reduction and simplification this gives -- if this branch gives the same speed and the same accuracy, we're still interested. I plan to work on a couple of new tests and compare the accuracy before and after this "linear combination" changeset.

@ploxiln (Contributor) commented Aug 22, 2012

I did my own rough comparison between my "more_tests_stuff" branch (pull request #44) and a branch where I merged that branch together with this one. Below, I call the former "ploxiln" and the latter "suranap".

I ran my tests using 3 different word lists: "finnish" words, words I scraped from system libraries ("words3"), and both concatenated together ("words4"). Further, in the "words3" case I reduced the capacity to 30,000, and in a second "words4" case I reduced the error rate to 0.01.

test name              accuracy (A)  accuracy (B)  time 1   time 2   time 3
====================   ============  ============  ======   ======   ======
ploxiln words4               0.0428        0.0937   2.269s   2.261s   2.279s
suranap words4               0.0428        0.0940   1.961s   2.008s   1.936s

ploxiln finnish              0.0412        0.0762   1.133s   1.165s   1.182s
suranap finnish              0.0411        0.0760   0.993s   1.013s   1.037s

ploxiln words3 30000         0.0433        0.1060   1.754s   1.721s   1.685s
suranap words3 30000         0.0446        0.1052   1.396s   1.391s   1.362s

ploxiln words4 0.01          0.0062        0.0197   2.685s   2.716s   2.710s
suranap words4 0.01          0.0063        0.0193   2.311s   2.297s   2.431s

It looks like this changeset makes things slightly faster in all cases, the accuracy is a wash, and the code simplification is a win. I vote we merge it. We can consider switching to CityHash after that.

@suranap please rebase this on the latest bitly/master; there's a conflict you have to resolve manually, but it's pretty easy: just delete new_salts().

@mreiferson (Contributor)

@suranap thanks. One last request: can you squash this into one commit, please?

ploxiln added a commit that referenced this pull request Aug 24, 2012
Faster hash_func. No need for salts.
@ploxiln merged commit 0c751fb into bitly:master Aug 24, 2012