Speed up by replacing `HashSet` with a `Vec<bool>`
#28
Merged
Hello,
As discussed in #26, this pull request replaces the `HashSet` used to store read indices with a `Vec<bool>`. I ran a simple benchmark: subsampling a 500x simulated nanopore read set of E. coli K-12 down to 250x, with a fixed seed (30 samples).

As I indicated in issue #26, `HashSet` operations take 10% of rasusa's run time (with the same parameters). Replacing `HashSet` with `Vec<bool>` saves this time, and the outputs are identical.
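To illustrate the data-structure swap, here is a minimal sketch (not the code from this PR). The helper names `indices_to_mask`, `keep_with_set` and `keep_with_mask` are hypothetical. It builds a boolean mask where `mask[i]` is `true` if read `i` is kept, and shows that a membership lookup becomes a plain index into the `Vec` instead of hashing the index on every read. The sketch assumes every selected index is smaller than the total number of reads.

```rust
use std::collections::HashSet;

/// Hypothetical helper: mark the selected read indices in a boolean mask.
/// Assumes every index in `selected` is < `total_reads` (otherwise it panics).
fn indices_to_mask(selected: &[usize], total_reads: usize) -> Vec<bool> {
    let mut mask = vec![false; total_reads];
    for &idx in selected {
        mask[idx] = true;
    }
    mask
}

/// Previous style: every lookup hashes the read index.
fn keep_with_set(set: &HashSet<usize>, read_idx: usize) -> bool {
    set.contains(&read_idx)
}

/// New style: a lookup is a direct index into the Vec.
/// Indices past the end of the mask are treated as "not kept".
fn keep_with_mask(mask: &[bool], read_idx: usize) -> bool {
    mask.get(read_idx).copied().unwrap_or(false)
}
```

Both lookups answer the same question. The `Vec<bool>` uses one byte per input read regardless of how many are kept, which is a reasonable trade when the file is read sequentially anyway.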
Some points need discussion:

- `indices` now returns a `Vec<bool>` and the number of reads selected (as `usize`), because the number of selected reads can no longer be obtained from the length of the `Vec`.
- I modified the `filter_reads_into` signature to add a `nb_reads_keep` parameter, to keep the previous behavior. But with this pull request we iterate over all input reads, so I think we can remove the final `filter_reads_into` test. What is your opinion about this?

I tried to change the tests as little as possible.
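The two discussion points can be sketched as follows. This is a hypothetical illustration, not rasusa's actual `indices` or `filter_reads_into`. `select_reads` takes an already shuffled read order (randomness is left out so the sketch needs only `std`) and keeps reads until `target_bases` is reached. It returns the mask together with the count, since `mask.len()` is the total number of reads rather than the number selected. `filter_reads` walks every input read once. Here `nb_reads_keep` is used only for a final consistency check, which is one possible reading of "keep previous behavior".

```rust
/// Hypothetical sketch of selection: keep reads in `order` until `target_bases`
/// is reached. Returns the keep-mask and the number of reads kept.
/// Assumes every index in `order` is < `lengths.len()`.
fn select_reads(lengths: &[u64], order: &[usize], target_bases: u64) -> (Vec<bool>, usize) {
    let mut mask = vec![false; lengths.len()];
    let mut nb_reads_keep = 0;
    let mut bases = 0u64;
    for &idx in order {
        if bases >= target_bases {
            break;
        }
        mask[idx] = true;
        nb_reads_keep += 1;
        bases += lengths[idx];
    }
    (mask, nb_reads_keep)
}

/// Hypothetical stand-in for filtering: iterate over all input reads once and
/// keep those flagged in the mask. `nb_reads_keep` is only used to check that
/// the number of reads kept matches the number selected.
fn filter_reads<'a>(
    reads: &[&'a str],
    mask: &[bool],
    nb_reads_keep: usize,
) -> Result<Vec<&'a str>, String> {
    if reads.len() != mask.len() {
        return Err(format!("{} reads but mask of length {}", reads.len(), mask.len()));
    }
    let mut kept = Vec::with_capacity(nb_reads_keep);
    for (read, &keep) in reads.iter().zip(mask) {
        if keep {
            kept.push(*read);
        }
    }
    if kept.len() == nb_reads_keep {
        Ok(kept)
    } else {
        Err(format!("expected {} reads, kept {}", nb_reads_keep, kept.len()))
    }
}
```

Because every read is visited exactly once and the mask is built from the same read count, the final check can only fail if the input changed between selection and filtering. That is the situation the question above about removing the final test is weighing.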