Vmg perf branch #39
Conversation
ready for review @Hines @mreiferson. The last commit makes a benchmark I did with 1 million words drop from 3.65 seconds to 3.45 seconds, consistently. (I first wrote a crazy clever version of my own to avoid rounding up the allocation; it only brought it down to 3.5 seconds, and was crazier...)
I'd like to squash that last commit into the first ftruncate() commit before merge.
I could also squash the "clean up type of bloom->hashes" commit into the "replace MD5 hash with Murmur" commit, if desired.
```c
/* rounding-up integer divide by 2 of bloom->size */
bloom->num_bytes = ((bloom->size + 1) / 2) + HEADER_BYTES;
/* round hashes allocation up to a multiple of 4 (for convenience in hash_func()) */
bloom->hashes = calloc((bloom->nfuncs + (4-1)) & (~(4-1)), sizeof(uint32_t));
```
Given that this is the init function (and, relatively speaking, not a considerable source of optimization), I'm not sure the "cleverness" of the code here is worth it.
The comments help, though.
(moved comment to pull req)
I agree. The reason I did it that way was conciseness, and so new_salts didn't have to be called first.
I added another commit to simplify this
I'd say just squash that comment update into the original.
Can I get one last nod of approval on my last change before I squash? (And I'm more inclined to squash the "type cleanup" into the hash function change, now that I think about it.)
instead of using lseek() repeatedly in a loop (original commit by Vicent Marti; rebased and tweaked by Pierce Lopez; any bugs added probably due to the latter)
new hash function should be much faster; salts should be better distributed now; clean up type of bloom->hashes and hash_func() (original commit by Vicent Marti; rebased and modified to use the 128-bit hash for salt generation by Pierce Lopez; any bugs added probably due to the latter)
more flexible and safe. Also, note that Python strings (and pretty much any dynamic language's) already keep length information. This will save us quite a few `strlen` calls. (original commit by Vicent Marti; rebased by Pierce Lopez; any bugs added probably due to the latter)
this way hash_func() can always do 4 hashes at a time; the simpler loop logic is a tiny bit faster; also cleans up num_salts handling a bit (original work by Vicent Marti; rebased and pulled out by Pierce Lopez; any bugs are probably due to the latter)
just squashed, ready for merge
```c
/* rounding-up integer divide by 4 of bloom->nfuncs */
bloom->num_salts = (bloom->nfuncs + 3) / 4;
/* effectively bloom->nfuncs rounded up to a multiple of 4 */
bloom->hashes = calloc(bloom->num_salts * 4, sizeof(uint32_t));
new_salts(bloom);
```
Your last commit looks good.

Also, do you think it makes sense for new_salts to have the following signature?

```c
uint32_t *new_salts(unsigned int num_salts);
```

and then, in this function, it would be used like:

```c
bloom->salts = new_salts(bloom->num_salts);
```

Essentially, I don't see any reason for it to have to take a specific structure as an argument. I realize this function is only used once, but this makes it a bit more agnostic to the type of bloom filter (otherwise I would argue it should be named counting_bloom_new_salts, or just be brought into counting_bloom_init).
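A sketch of what the suggested signature might look like in isolation (hypothetical: the rand()-based filler below is only a placeholder, since the actual patch derives salts from a hash):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical refactor of new_salts(): allocate and fill num_salts salts,
 * returning the array instead of writing into a bloom struct directly. */
uint32_t *new_salts(unsigned int num_salts)
{
    uint32_t *salts = calloc(num_salts, sizeof(uint32_t));
    if (salts == NULL) {
        return NULL;
    }
    for (unsigned int i = 0; i < num_salts; i++) {
        salts[i] = (uint32_t)rand();  /* placeholder; the patch derives salts from a hash */
    }
    return salts;
}
```

The caller would then do `bloom->salts = new_salts(bloom->num_salts);`, as suggested in the comment above.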
(we can postpone this for your larger cleanup branch)
yeah, I think I'd like to postpone this
nod
rebased and squashed the famous #19 from @vmg on recent master
deferred some struct member type and function return type cleanups to a later commit
At least some of the performance improvements are still present. On my machine, with a 731,000-word dictionary:

- make test goes from 7.1 seconds runtime to 1.7 seconds
- make test_pydablooms goes from 8.5 seconds runtime to 3.1 seconds