-
-
Notifications
You must be signed in to change notification settings - Fork 473
Description
I was just fiddling around with some RNG stuff when I noticed something that looks pretty odd:
These are time-per-call for calls to rng.gen_range(0..n)
(you can see the value of n
at the end of each name written on the vertical axis), sampled using Criterion. You'd think that power-of-2 range sizes would be the fastest to generate, since they can theoretically be done as just a single bitmask, with no rejections. But instead, they are the slowest!
I'm not deeply familiar with RNG algorithms, but I took a quick look through the code, and I wonder if this line from sample_single_inclusive is the culprit. Maybe it was intended to use (range-1).leading_zeros()
rather than range.leading_zeros()
? I haven't grokked what the line is doing, but that would be something that affects exactly power-of-2 sizes differently than their neighbors.
To test, this, I confirmed that Uniform::new(0, n).sample(&mut rng)
does not have this pattern, and in fact, is faster than rng.gen_range(0..n)
on my computer in almost every case I tested (but especially much faster on power-of-2 sizes):
Is it possible that the sample_single_inclusive
code is out of date compared to optimizations that have been made more recently in sample_inclusive
?