Releases: bluenote-1577/skani
v0.3.0
v0.3.0 released - 2025-08 (Breaking changes)
Major
- Engineering changes: skani uses ~30-40% less memory than before but is ~5-10% slower. Results should be identical to those of previous versions.
- BREAKING: Changed sketching output. All sketches are now concatenated into a single searchable database file by default. The original behavior can be restored via --separate-files.
- New command-line options: --both-min-af and --short-header.
Minor
- Refactored the command-line backend. Small deviations in command-line behavior may be present; hopefully no major bugs were introduced.
v0.2.2
v0.2.2 released - 2024-07-04
Major
- Added the --small-genomes preset. This is just an alias for -c 30 -m 200 --faster-small. This makes skani much faster when comparing hundreds of thousands of small genomes.
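To make the aliasing concrete, here is a minimal Python sketch that rewrites an argument list the way the preset is described above. The helper `expand_presets` and the `PRESETS` table are hypothetical illustrations, not skani's argument parser; only the flag values come from these notes.

```python
# Hypothetical illustration: skani does not expose this helper.
# The flag values are taken from the v0.2.2 release notes.
PRESETS = {
    "--small-genomes": ["-c", "30", "-m", "200", "--faster-small"],
}


def expand_presets(argv):
    """Return a copy of argv with each known preset replaced by the flags it aliases."""
    expanded = []
    for arg in argv:
        # Arguments that are not presets pass through unchanged.
        expanded.extend(PRESETS.get(arg, [arg]))
    return expanded


print(expand_presets(["dist", "--small-genomes", "a.fa", "b.fa"]))
# ['dist', '-c', '30', '-m', '200', '--faster-small', 'a.fa', 'b.fa']
```

In other words, `skani dist --small-genomes a.fa b.fa` should behave like `skani dist -c 30 -m 200 --faster-small a.fa b.fa`.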
Minor
- Fixed a bug where skani triangle --full-matrix gave different results when writing to STDOUT versus -o (thanks to Florian Plaza Onate).
- Added a --diagonal option (suggested by Antonio Camargo) to print diagonal entries for sparse and lower-triangular distance matrices.
- Added a warning to use --faster-small when comparing many contigs (e.g. viruses, plasmids).
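As a rough picture of what --diagonal changes, the Python sketch below builds a PHYLIP-style lower-triangular matrix with and without diagonal entries. The layout (a count line, then one row per genome) and the assumption that self-comparisons are reported as 100 are illustrative assumptions, not a specification of skani's exact output; `format_lower_triangle` is a hypothetical helper and the ANI values are invented.

```python
def format_lower_triangle(names, ani, diagonal=False):
    """Return the lines of a lower-triangular ANI matrix (hypothetical layout).

    names:    genome names in output order.
    ani:      maps frozenset({a, b}) -> ANI (percent) for every pair a != b.
    diagonal: also emit the self-comparison entry (assumed to be 100).
    """
    lines = [str(len(names))]
    for i, name in enumerate(names):
        # Row i covers genomes 0..i-1, plus genome i itself when diagonal=True.
        stop = i + 1 if diagonal else i
        values = [
            100.0 if j == i else ani[frozenset((name, names[j]))]
            for j in range(stop)
        ]
        lines.append("\t".join([name] + [f"{v:.2f}" for v in values]))
    return lines


names = ["A", "B", "C"]
ani = {
    frozenset(("A", "B")): 98.0,
    frozenset(("A", "C")): 95.5,
    frozenset(("B", "C")): 96.25,
}
print("\n".join(format_lower_triangle(names, ani)))
print("\n".join(format_lower_triangle(names, ani, diagonal=True)))
```

Without the diagonal, the first genome's row carries no values at all, which is why an explicit option is useful when downstream tools expect a self-comparison entry in every row.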
v0.2.1
v0.2.1 released - 2023-10-11
More consistent support for small contigs and sequences.
Major
- The --faster-small option is now available in dist and triangle.
Genomes (and contigs with the -i, --ri, --qi options) with fewer than 20 marker k-mers are not screened according to the -s option. This has always been the case, but was not previously documented online. The reason is that screening is not as effective for very small genomes. This makes skani more sensitive for small sequences but can hurt performance on very large datasets with many small genomes/contigs.
This heuristic can now be disabled with the --faster-small option. This can make huge comparisons much faster if you don't care about sensitivity for very small genomes.
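The screening rule described above can be summarized in a few lines of Python. This is a simplified model, not skani's code: `passes_screen` is a hypothetical function, the marker count is treated as a single per-genome number, and `estimated_ani` stands in for whatever k-mer-based estimate the -s screen compares against.

```python
# Threshold from the v0.2.1 notes: genomes (or contigs) with fewer than
# 20 marker k-mers are not screened unless --faster-small is given.
MIN_MARKERS_FOR_SCREENING = 20


def passes_screen(num_markers, estimated_ani, screen_threshold, faster_small=False):
    """Decide whether a pair goes on to full ANI computation (simplified model).

    num_markers:      marker k-mers in the genome being considered (hypothetical input).
    estimated_ani:    screening estimate in percent (hypothetical input).
    screen_threshold: the value passed to -s, in percent.
    faster_small:     whether --faster-small was given.
    """
    if num_markers < MIN_MARKERS_FOR_SCREENING and not faster_small:
        # Too few markers to screen reliably: always compare (more sensitive).
        return True
    return estimated_ani >= screen_threshold
```

With many tiny genomes, the first branch sends every pair to full comparison, which is why turning it off with --faster-small can speed up large runs at the cost of sensitivity.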
Minor
- skani's version is now displayed properly
- Added some error messages for degenerate cases (and more testing)
- We found that the statically built binary can be much slower in certain cases, possibly because of file I/O. A note has been added to the README.
v0.2.0
v0.2.0 released - 2023-09-26
BREAKING
- The --learned-ani feature was buggy and has been removed.
Major
- Major bug found: debiasing for ANI was turned off if there were > 5000 queries present in skani search and skani dist. This bug is fixed now.
Minor
- The Rust API is changing in this version. It is not yet published to Cargo (waiting on DDOtten/partitions#3 to be published to crates...).
- Version number fixed
v0.1.5
v0.1.5 released - 2023-09-01
Major
Improved "N" character support:
- Changed the query-reference selection method slightly via a small hack: marker seeds are now used to estimate reference length, so N characters are not counted.
- Seeds containing "N" characters are no longer indexed.
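A minimal sketch of the second point: k-mers that overlap an N are simply skipped. skani samples a subset of k-mers as seeds rather than indexing every one, so `kmers_without_n` (a hypothetical helper) illustrates only the N filter, not skani's seed selection.

```python
def kmers_without_n(seq, k):
    """Return (position, k-mer) pairs for every k-mer that contains no 'N'."""
    seq = seq.upper()
    result = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # any ambiguous base disqualifies the k-mer
            result.append((i, kmer))
    return result


print(kmers_without_n("ACGTNACG", 3))
# [(0, 'ACG'), (1, 'CGT'), (5, 'ACG')]
```

A single N removes up to k overlapping k-mers (positions 2, 3 and 4 above), so long runs of Ns contribute no seeds at all.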
Minor
- --robust now uses the learned ANI debiasing procedure by default.
v0.1.4
v0.1.4 released - 2023-06-14
Major
- skani triangle had a bug where if more than 5000 queries were present and --sparse or -E was not specified, the intermediate batch of 5000 queries would be written in sparse mode.
- skani triangle -o was writing an upper-triangular matrix instead of a lower-triangular one (skani triangle > res gives a lower-triangular matrix). Matrices are now consistently lower-triangular.
- Changed to lto = true for release mode. I see anywhere from a 5-10% speedup from this.
Minor
- Updated some dependencies to remove reliance on old crates that will be deprecated.
v0.1.3
v0.1.3 released - 2023-05-09
Major
- Fixed a bug where memory was blowing up in dist and triangle when the marker index was activated. For big datasets, there could be > 100 GB of wasted memory.
- skani now outputs intermediate results after processing each batch of 5000 queries. This means that outputs may no longer be deterministically ordered if there are > 5000 genomes, but you can sort the output file to get deterministic output. For example,
skani triangle *.fa | sort -k 3 -n > sorted_skani_result.txt
will guarantee deterministic output order.
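Why sorting restores a deterministic order can be seen with a small Python model: however the batches are interleaved, sorting on a numeric key with the whole line as a tie-breaker yields one fixed order. `sort_like_unix` is a hypothetical approximation of `sort -k 3 -n` in the C locale (GNU sort falls back to comparing whole lines when keys tie); it assumes tab-separated rows with no header line, and the example rows are invented.

```python
import random


def sort_like_unix(lines):
    """Approximate `sort -k 3 -n`: numeric key from field 3, whole line breaks ties."""
    return sorted(lines, key=lambda line: (float(line.split()[2]), line))


rows = [
    "r1.fa\tq1.fa\t97.10",
    "r2.fa\tq1.fa\t95.00",
    "r3.fa\tq2.fa\t99.50",
    "r4.fa\tq2.fa\t95.00",
]

# Simulate batches finishing in a different order on two runs.
run_a = rows[:]
run_b = rows[:]
random.Random(1).shuffle(run_b)

assert sort_like_unix(run_a) == sort_like_unix(run_b)
print("\n".join(sort_like_unix(run_a)))
```

The tie-break matters: the two rows with ANI 95.00 would otherwise keep whatever order the batches produced.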
Minor
- Changed the marker index hash table population method. The previous method slightly overestimated memory usage.
- New help message for marker parameters. It turns out that for small genomes, using more markers can make filtering significantly better.
- Added the -i option to sketch so that individual records in multi-FASTA files can be sketched separately. This works only for sketching, not yet for search.
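The difference -i makes can be pictured by splitting a multi-FASTA file into records: without -i the records of a file are treated together as one genome, while with -i each record gets its own sketch. `fasta_records` is a hypothetical minimal parser for illustration; it is not part of skani.

```python
def fasta_records(text):
    """Split multi-FASTA text into (header, sequence) pairs, one per record.

    Sequence lines that appear before the first '>' header are discarded.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


text = ">contig1 plasmid\nACGT\nAC\n>contig2\nGGCC\n"
print(fasta_records(text))  # [('contig1 plasmid', 'ACGTAC'), ('contig2', 'GGCC')]
```

Multi-line sequences are joined per record, so each (header, sequence) pair corresponds to one unit that -i would sketch individually.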
v0.1.2
v0.1.2 released - 2023-04-28.
Small fixes.
- Added the --medium preset, which is just -c 70. It seems to work okay for comparing fragmented genomes.
- BREAKING: Changed --marker-index to --no-marker-index as a more sensible option.
- Added a --distance option to skani triangle to output a distance matrix (i.e. 100 - ANI) instead of a similarity matrix.
- Misc. help message fixes.
v0.1.1
Small tweaks.
- Made aligned fraction matrix a full matrix by default, since aligned fraction is not symmetric.
- Fixed an issue with the statically compiled version being too slow.
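The asymmetry is easy to see with a toy calculation: if a small genome aligns entirely within a larger one, nearly all of the small genome is aligned but only a fraction of the large one. The sketch below is a simplification that assumes the same number of aligned bases on both genomes (in practice the aligned lengths on each side can differ); `aligned_fractions` is a hypothetical helper and the genome sizes are made up.

```python
def aligned_fractions(aligned_bases, len_a, len_b):
    """Return (AF of genome A, AF of genome B) in percent.

    Simplification: the same number of aligned bases is assumed on both genomes.
    """
    return 100.0 * aligned_bases / len_a, 100.0 * aligned_bases / len_b


# A 1 Mbp genome fully contained in a 5 Mbp genome.
af_small, af_large = aligned_fractions(1_000_000, 1_000_000, 5_000_000)
print(af_small, af_large)  # 100.0 20.0
```

Since entry (i, j) and entry (j, i) differ, a lower-triangular layout would drop half of the information, which is why a full matrix makes sense as the default for aligned fraction.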