Skip to content

HashSet is slow #26

@natir

Description

@natir

Hello,

During my small reading of rasusa code I see a possible speed up, but before made a pull request I prefer discuss it because we have multiple choices with different draw back.

Rasusa use a HashSet to store indices of selected read, we could find faster solution.

By default the rust hashing algorithm is resistance against HashDoS attacks, this could be useful in some situation but I think it's not the case for rasusa. This resistance have a cost in term of computation time, you could replace standard HashSet by rustc_hash and fxhash this crates provide drop in replacement Set struct but they are faster.

In fact we can replace HashSet by a Vec<bool> with a true or false if the current indices is selected or not. This way have an higher memory cost (1 bytes per indices), but I'm pretty sure this solution is faster. To reduce memory usage to 1 bit per indices we can use crate bitvec.

What is your opinion on this trouble and possible solution, if you want I can write benchmark to compare these solutions (I like benchmark).

Best

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions