Use Hannoy instead of arroy #5767


Draft: wants to merge 16 commits into base release-v1.16.0

Conversation

Kerollmops
Member

@Kerollmops Kerollmops commented Jul 21, 2025

This PR replaces arroy with hannoy. The final goal is to merge Hannoy into Meilisearch; progress and the features still missing from Hannoy are tracked in this Linear page (internal).

To be done

  • Keep arroy and hannoy and select the right vector store based on the index version
  • Use hannoy when the dumpless upgrade is finished
  • Keep Nate's comment for later
  • Move to a fixed and well-released version of Hannoy (from crates.io)
  • Remove the ugly MEILI_EMBEDDINGS_CHUNK_SIZE env var
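
The first to-do item (selecting the vector store from the index version) could be sketched roughly as below. This is only an illustration: `VectorStoreBackend`, `Version`, `select_backend`, and the `hannoy_since` cutoff are hypothetical names, not Meilisearch or hannoy APIs, and the actual cutoff version is not stated in this PR.

```rust
/// Hypothetical backend selector; names are illustrative, not Meilisearch's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VectorStoreBackend {
    /// Indexes created before the cutoff keep reading the arroy store.
    Arroy,
    /// Indexes at or after the cutoff use the hannoy store.
    Hannoy,
}

/// (major, minor, patch); Rust tuples compare lexicographically.
pub type Version = (u32, u32, u32);

/// Picks the vector store for an index.
/// `hannoy_since` is an assumed cutoff passed in by the caller.
pub fn select_backend(index_version: Version, hannoy_since: Version) -> VectorStoreBackend {
    if index_version >= hannoy_since {
        VectorStoreBackend::Hannoy
    } else {
        VectorStoreBackend::Arroy
    }
}
```

Passing the cutoff as a parameter keeps the sketch independent of whichever release ends up shipping the switch.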

We should keep the following comment from Nate about which M and M0 values to use. We could expose these two parameters (or maybe only the former) in the embedder settings.

For M I'd go with like 8,12,16,32 (and M0 is thus 16,24,32,64) and for ef_construction you need it to be bigger than the M you selected. Here's an interactive graph for weaviate's hnsw implementation showing how recall and query speed vary with these params to give you an idea.

Taking M and ef_construction big can really slow down indexing times (slower than arroy), especially on large dimensional datasets.

M=16 is a good default for most datasets. Another parameter you need to consider is ef_search which is used in the reader.nns (i'm gonna turn this into a builder method later). This controls how long you search for before stopping.
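
To make the constraints in the comment above concrete, here is a minimal sketch of a parameter struct that derives M0 as twice M (matching the 8/16, 12/24, 16/32, 32/64 pairs) and rejects an ef_construction that is not bigger than M. `HnswParams` and its constructor are hypothetical and are not hannoy's actual API.

```rust
/// Hypothetical HNSW tuning parameters; not hannoy's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HnswParams {
    /// Maximum number of links per node on the upper layers.
    pub m: usize,
    /// Maximum number of links per node on the base layer.
    pub m0: usize,
    /// Candidate list size used while building the graph.
    pub ef_construction: usize,
    /// Candidate list size used while searching.
    pub ef_search: usize,
}

impl HnswParams {
    /// Builds the parameters, deriving M0 = 2 * M and checking
    /// that ef_construction is strictly bigger than M.
    pub fn new(m: usize, ef_construction: usize, ef_search: usize) -> Result<Self, String> {
        if m == 0 {
            return Err("M must be positive".to_string());
        }
        if ef_construction <= m {
            return Err(format!(
                "ef_construction ({ef_construction}) must be bigger than M ({m})"
            ));
        }
        Ok(Self { m, m0: 2 * m, ef_construction, ef_search })
    }
}
```

If only M were exposed in the settings, a constructor like this would keep M0 consistent without asking users to set it.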


A running prototype generates a Docker image. The image is named prototype-arroy-becomes-hannoy-x, where x is an incrementing number.

@Kerollmops Kerollmops added this to the v1.17.0 milestone Jul 21, 2025
@Kerollmops Kerollmops added the db change A database was modified label Jul 21, 2025

Hello, I'm a bot 🤖

You are receiving this message because you declared that this PR makes changes to the Meilisearch database.
Depending on the nature of the change, additional actions might be required on your part. The following sections detail those actions for each kind of change. Please copy the relevant section into the description of your PR, and make sure to perform the required actions.

Thank you for contributing to Meilisearch ❤️

This PR makes forward-compatible changes

Forward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.

  • Detail the change to the DB format and why they are forward compatible
  • Forward-compatibility: A database created before this PR and using the features touched by this PR was able to be opened by a Meilisearch produced by the code of this PR.

This PR makes breaking changes

Breaking changes are changes to the database such that databases created in an older version of Meilisearch need changes to remain valid in the new version of Meilisearch. This typically happens when the way the data is stored changes (change of database, new required key, etc.). This can also happen due to breaking changes in the API of an experimental feature. ⚠️ These kinds of changes are more difficult to achieve safely, so proceed with caution and test the dumpless upgrade right before merging the PR.

  • Detail the changes to the DB format,
    • which are compatible, and why
    • which are not compatible, why, and how they will be fixed up in the upgrade
  • /!\ Ensure all the read operations still work!
    • If the change happened in milli, you may need to check the version of the database before doing any read operation
    • If the change happened in the index-scheduler, make sure the new code can immediately read the old database
    • If the change happened in the meilisearch-auth database, reach out to the team; we don't know yet how to handle these changes
  • Write the code to go from the old database to the new one
    • If the change happened in milli, the upgrade function should be written and called here
    • If the change happened in the index-scheduler, we've never done it yet, but the right place to do it should be here
  • Write an integration test here ensuring you can read the old database, upgrade to the new database, and read the new database as expected

@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch 2 times, most recently from cbf70df to bc1aa5a Compare July 24, 2025 07:58
@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch from 5d22377 to 4f6f59f Compare July 29, 2025 11:23
@Kerollmops Kerollmops changed the title First version with hannoy Use Hannoy instead of arroy Jul 29, 2025
@Kerollmops Kerollmops changed the base branch from main to release-v1.16.0 July 29, 2025 11:24
@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch from 4f6f59f to 5d22377 Compare July 29, 2025 11:24
@Kerollmops Kerollmops changed the base branch from release-v1.16.0 to main July 29, 2025 11:25
@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch 2 times, most recently from 96594b1 to ae97561 Compare July 29, 2025 11:50
@Kerollmops Kerollmops changed the base branch from main to release-v1.16.0 July 29, 2025 11:50
@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch 2 times, most recently from 79c7163 to 296f065 Compare July 29, 2025 12:57
@@ -98,12 +98,12 @@ impl Progress {
}

// TODO: ideally we should expose the progress in a way that let arroy use it directly
pub(crate) fn update_progress_from_arroy(&self, progress: arroy::WriterProgress) {
Member Author

In progress: nnethercott/hannoy#16

@@ -66,7 +66,7 @@ where
let mut bbbuffers = Vec::new();
let finished_extraction = AtomicBool::new(false);

- let arroy_memory = grenad_parameters.max_memory;
+ let hannoy_memory = grenad_parameters.max_memory;
Member Author

Hey @nnethercott, do you use the max memory in Hannoy to build the tree?

Contributor

Not currently, lemme know if you guys are running into issues later though & we can integrate what's in arroy here

@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch from 65286e8 to c1b0a04 Compare August 1, 2025 07:34
@curquiza curquiza removed this from the v1.17.0 milestone Aug 5, 2025
@Kerollmops Kerollmops force-pushed the arroy-becomes-hannoy branch from 417f124 to 8a01b39 Compare August 7, 2025 16:18

3 participants