@ctb ctb commented May 1, 2025

This PR implements manifest retrieval from Rust via FFI, and implements the relevant machinery for RevIndex classes.

Starts from code in #2726, which turns out to be pretty far along!

Tackles #3593

codecov bot commented May 1, 2025

Codecov Report

❌ Patch coverage is 48.61111% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.33%. Comparing base (994a410) to head (199745e).
⚠️ Report is 59 commits behind head on latest.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/src/ffi/manifest.rs | 0.00% | 37 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           latest    #3630      +/-   ##
==========================================
- Coverage   88.44%   88.33%   -0.12%     
==========================================
  Files         136      137       +1     
  Lines       23401    23467      +66     
  Branches     2269     2270       +1     
==========================================
+ Hits        20697    20729      +32     
- Misses       2394     2428      +34     
  Partials      310      310              

| Flag | Coverage Δ | |
|---|---|---|
| hypothesis-py | 25.44% <11.76%> (-0.05%) | ⬇️ |
| python | 92.62% <100.00%> (+0.01%) | ⬆️ |
| rust | 82.72% <2.63%> (-0.28%) | ⬇️ |

ctb commented May 1, 2025

@luizirber any commentary or suggestions on the code I rescued would be most welcome :), in particular on `SourmashManifestRowIter` and on the `FIXME: free mem from strings?` comment. So far it seems to be working pretty well, which is really nice, but I haven't used the `#[repr(C)]` approach for FFI in Rust before.
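
For context on the `FIXME` question, here is a minimal sketch of one common way to pair a `#[repr(C)]` struct with an explicit free function so that strings handed across FFI are released. It is not the code in this PR: `RowFFI`, `row_to_ffi`, `row_name`, and `row_free` are hypothetical names, and the real manifest row has more fields than shown here.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical C-compatible row (illustrative fields only).
#[repr(C)]
pub struct RowFFI {
    pub ksize: u32,
    /// Heap-allocated, NUL-terminated string owned by this struct.
    pub name: *const c_char,
}

/// Build a row on the heap and hand ownership to the caller as a raw pointer.
pub fn row_to_ffi(ksize: u32, name: &str) -> *mut RowFFI {
    // CString::new fails on interior NUL bytes; a real API would return an error.
    let name = CString::new(name).expect("name must not contain NUL bytes");
    Box::into_raw(Box::new(RowFFI {
        ksize,
        // into_raw() leaks the string on purpose; row_free reclaims it.
        name: name.into_raw(),
    }))
}

/// Copy the name out of a row without taking ownership of it.
///
/// Safety: `ptr` must come from `row_to_ffi` and not have been freed.
pub unsafe fn row_name(ptr: *const RowFFI) -> String {
    let name = unsafe { CStr::from_ptr((*ptr).name) };
    name.to_string_lossy().into_owned()
}

/// Free a row and the string it owns. Passing NULL is a no-op.
///
/// Safety: `ptr` must be NULL or come from `row_to_ffi`, and be freed only once.
pub unsafe extern "C" fn row_free(ptr: *mut RowFFI) {
    if ptr.is_null() {
        return;
    }
    // Reclaim the struct first, then the string it points to.
    let row = unsafe { Box::from_raw(ptr) };
    if !row.name.is_null() {
        drop(unsafe { CString::from_raw(row.name as *mut c_char) });
    }
}
```

The design choice in this sketch is that every pointer produced by `into_raw` has exactly one matching `from_raw`, so the Python side only has to call one free function per row.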

ctb commented May 1, 2025

(There's a seg fault in there that I don't quite understand, too :)

@ctb ctb changed the title from "WIP: implement manifest retrieval from Rust via FFI for RevIndex" to "MRG: implement manifest retrieval from Rust via FFI for RevIndex" May 2, 2025

ctb commented May 2, 2025

Ready for review @luizirber @bluegenes !

@luizirber luizirber self-requested a review May 6, 2025 18:04

@luizirber luizirber left a comment

Main question: the goal here is to modify a copy of the revindex Manifest, and potentially replace it, NOT to edit in place?

I think that's fine, and most manifest operations are not that demanding, but it might take quite a bit of memory to create two copies (one in Rust, one in Python) for large ones like the SRA metagenomes.

(and probably the best we can do at the moment, without thinking deeper into how ownership works across Python/Rust...)
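
As a rough illustration of the copy-then-replace pattern described above, the following sketch keeps the index's manifest untouched while a copy is filtered, and only swaps it in on request. All names (`Row`, `Manifest`, `Index`, `manifest_copy`, `replace_manifest`, `select_ksize`) are hypothetical and do not reflect the actual sourmash types.

```rust
/// Hypothetical manifest row with only two illustrative fields.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    name: String,
    ksize: u32,
}

/// Hypothetical manifest: just a list of rows.
#[derive(Clone, Debug, PartialEq)]
struct Manifest {
    rows: Vec<Row>,
}

/// Hypothetical index that owns its manifest.
struct Index {
    manifest: Manifest,
}

impl Index {
    /// Hand out an independent copy; the index's own manifest is untouched.
    /// This is where the second in-memory copy comes from.
    fn manifest_copy(&self) -> Manifest {
        self.manifest.clone()
    }

    /// Swap in a new manifest and return the old one.
    fn replace_manifest(&mut self, new: Manifest) -> Manifest {
        std::mem::replace(&mut self.manifest, new)
    }
}

/// Filter a manifest by k-mer size, consuming and returning it.
fn select_ksize(mut manifest: Manifest, ksize: u32) -> Manifest {
    manifest.rows.retain(|row| row.ksize == ksize);
    manifest
}
```

The cost noted in the review shows up in `manifest_copy`: for a very large manifest, the clone doubles the memory held until one of the two copies is dropped.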

ctb commented May 6, 2025

> Main question: the goal here is to modify a copy of the revindex Manifest, and potentially replace it, NOT to edit in place?
>
> I think that's fine, and most manifest operations are not that demanding, but it might take quite a bit of memory to create two copies (one in Rust, one in Python) for large ones like the SRA metagenomes.
>
> (and probably the best we can do at the moment, without thinking deeper into how ownership works across Python/Rust...)

Yes to all of the above 😆. More will come, but this is the minimum necessary for decent RocksDB inspection and picklisting...

@ctb ctb merged commit 82a7a85 into latest May 6, 2025
40 of 43 checks passed
@ctb ctb deleted the rs_manifest branch May 6, 2025 23:22
ctb added a commit that referenced this pull request May 8, 2025
# sourmash v4.9.0 release notes

This release adds two significant feature sets to sourmash, without
introducing any breaking changes.

First, sourmash now fully supports fast, low-memory disk-based inverted
indexes based on RocksDB. This functionality has been part of [the
branchwater
plugin](https://github.com/sourmash-bio/sourmash_plugin_branchwater) for
a while, but it is now accessible via the sourmash command line and
Python API.
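
For readers unfamiliar with the term, an inverted index maps each hash to the datasets that contain it, so a query only touches the entries for its own hashes. The following is a minimal in-memory sketch of that idea only; it is not the RocksDB-backed implementation, and `InvertedIndex`, `insert`, and `counts` are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory inverted index: hash value -> dataset ids.
#[derive(Default)]
struct InvertedIndex {
    postings: HashMap<u64, BTreeSet<u32>>,
}

impl InvertedIndex {
    /// Record that `dataset_id` contains every hash in `hashes`.
    fn insert(&mut self, dataset_id: u32, hashes: &[u64]) {
        for &hash in hashes {
            self.postings.entry(hash).or_default().insert(dataset_id);
        }
    }

    /// Count how many query hashes each dataset shares.
    /// Assumes the hashes in `query` are distinct.
    fn counts(&self, query: &[u64]) -> HashMap<u32, usize> {
        let mut counts = HashMap::new();
        for hash in query {
            if let Some(ids) = self.postings.get(hash) {
                for &id in ids {
                    *counts.entry(id).or_insert(0) += 1;
                }
            }
        }
        counts
    }
}
```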

Second, we have added skip-mer sketching to sourmash, joining DNA,
protein, dayhoff, and hp encodings. Skip-mers allow more mismatches than
DNA k-mers and can be useful when comparing fast-evolving sequences such
as virus and phage genomes.
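
As a rough sketch of the general skip-mer idea, the function below keeps the first `m` bases of every cycle of `n` bases until `k` bases are collected. The function name and parameters are assumptions made for illustration; how sourmash names its skip-mer encodings, handles reverse complements, and hashes the result is not covered here.

```rust
/// Extract one skip-mer from `seq` beginning at `start`.
///
/// Keeps the first `m` bases of each cycle of `n` bases until `k` bases
/// have been collected. Returns None if the parameters are invalid or the
/// sequence is too short.
fn skipmer(seq: &[u8], start: usize, m: usize, n: usize, k: usize) -> Option<Vec<u8>> {
    if m == 0 || m > n || k == 0 {
        return None;
    }
    let mut out = Vec::with_capacity(k);
    // `get` returns None when `start` is past the end of the sequence.
    for (offset, &base) in seq.get(start..)?.iter().enumerate() {
        // Positions m..n of each cycle are skipped.
        if offset % n < m {
            out.push(base);
            if out.len() == k {
                return Some(out);
            }
        }
    }
    None
}
```

With `m = 2`, `n = 3`, `k = 4`, the sequences `ACGTAC` and `ACTTAC` differ only at the skipped third base, so both yield the same skip-mer, which is the sense in which skip-mers tolerate mismatches that would change a contiguous k-mer.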

Documentation for the RocksDB indexes and skip-mer encodings is
available in the [command-line
docs](https://sourmash.readthedocs.io/en/latest/command-line.html).

Major new features:

* Fully support fast, low-memory RocksDB indexes in Python (#3545)
* Fully support skip-mers at the Python level; provide documentation
(#3627)
* Remove support for Python 3.10 (#3606)

Cleanup and documentation updates:

* add default to `add_scaled_arg` in Python CLI utils (#3609)
* use `match/case` in `sourmash index` implementation (#3604)
* use single quotes inside sqlite statements (#3556)

Developer updates:

* implement manifest retrieval from Rust via FFI for `RevIndex` (#3630)
* make the RocksDB handle directly accessible to external code (#3468)
* fix linear gather in Rust (#3605)
* fix beta clippy errors (#3548)
* fix deprecations (#3613)
* update Makefile with 'offline', 'wheel' (#3579)
* update ubuntu image version for CI (#3623)
* Minhash deserialize hashfunction errorhandling (#3560)

Automated updates:

* Bump DeterminateSystems/nix-installer-action from 16 to 17 (#3626)
* Bump getset from 0.1.4 to 0.1.5 (#3567)
* Bump histogram from 0.11.2 to 0.11.3 (#3574)
* Bump log from 0.4.25 to 0.4.26 (#3549)
* Bump log from 0.4.26 to 0.4.27 (#3587)
* Bump needletail from 0.6.1 to 0.6.3 (#3553)
* Bump prefix-dev/setup-pixi from 0.8.1 to 0.8.2 (#3538)
* Bump prefix-dev/setup-pixi from 0.8.2 to 0.8.3 (#3551)
* Bump prefix-dev/setup-pixi from 0.8.3 to 0.8.4 (#3602)
* Bump prefix-dev/setup-pixi from 0.8.4 to 0.8.7 (#3616)
* Bump prefix-dev/setup-pixi from 0.8.7 to 0.8.8 (#3621)
* Bump pypa/cibuildwheel from 2.22.0 to 2.23.0 (#3564)
* Bump pypa/cibuildwheel from 2.23.0 to 2.23.1 (#3581)
* Bump pypa/cibuildwheel from 2.23.1 to 2.23.2 (#3603)
* Bump pypa/cibuildwheel from 2.23.2 to 2.23.3 (#3625)
* Bump rand from 0.9.0 to 0.9.1 (#3620)
* Bump roaring from 0.10.10 to 0.10.12 (#3608)
* Bump serde from 1.0.217 to 1.0.218 (#3550)
* Bump serde from 1.0.218 to 1.0.219 (#3576)
* Bump serde_json from 1.0.138 to 1.0.139 (#3552)
* Bump serde_json from 1.0.139 to 1.0.140 (#3566)
* Bump tempfile from 3.16.0 to 3.17.1 (#3539)
* Bump tempfile from 3.17.1 to 3.18.0 (#3575)
* Bump tempfile from 3.18.0 to 3.19.0 (#3582)
* Bump tempfile from 3.19.0 to 3.19.1 (#3588)
* Bump thiserror from 2.0.11 to 2.0.12 (#3565)
* pre-commit autoupdate (#3547)
* pre-commit autoupdate (#3563)
* pre-commit autoupdate (#3573)
* pre-commit autoupdate (#3580)
* pre-commit autoupdate (#3586)
* pre-commit autoupdate (#3607)
* pre-commit autoupdate (#3615)
* pre-commit autoupdate (#3619)
* pre-commit autoupdate (#3624)
* pre-commit autoupdate (#3633)