Skip to content

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

@elidupree

Description

@elidupree

I was just fiddling around with some RNG stuff when I noticed something that looks pretty odd:

gen_range_plot

These are time-per-call for calls to rng.gen_range(0..n) (you can see the value of n at the end of each name written on the vertical axis), sampled using Criterion. You'd think that power-of-2 range sizes would be the fastest to generate, since they can theoretically be done as just a single bitmask, with no rejections. But instead, they are the slowest!

I'm not deeply familiar with RNG algorithms, but I took a quick look through the code, and I wonder if this line from sample_single_inclusive is the culprit. Maybe it was intended to use (range-1).leading_zeros() rather than range.leading_zeros()? I haven't grokked what the line is doing, but that would be something that affects exactly power-of-2 sizes differently than their neighbors.

To test, this, I confirmed that Uniform::new(0, n).sample(&mut rng) does not have this pattern, and in fact, is faster than rng.gen_range(0..n) on my computer in almost every case I tested (but especially much faster on power-of-2 sizes):

gen_range_report_2

Is it possible that the sample_single_inclusive code is out of date compared to optimizations that have been made more recently in sample_inclusive?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions