Skip to content

tests: Faster search::multi IT tests #5603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 16, 2025

Conversation

martin-g
Copy link
Contributor

Pull Request

Related issue

#4840

What does this PR do?

  • Use shared server + unique indices where possible

@martin-g
Copy link
Contributor Author

@irevoire This one is huge! Please take a look when you have few minutes and let me know if you have any concerns!
One thing that bothers me a bit is that now the snapshots use [uuid] for all index names and it is not very clear which index brings the hit, e.g. see federation_inconsistent_merge_order() test case.

@irevoire
Copy link
Member

irevoire commented Jun 3, 2025

One thing that bothers me a bit is that now the snapshots use [uuid] for all index names and it is not very clear which index brings the hit, e.g. see federation_inconsistent_merge_order() test case.

Hey yeah, you're right, that's really hard to read 😩

In the beginning, you are using a lot of shared indices, and since these ones have a fixed name, I find the test pretty good to read.
From what I see, almost all the tests rely on like five different datasets (and some are already available as shared index). Do you think it would make sense to make all of these datasets into a unique, named and shared index for all test?

And if a test must update the settings or something, then we don't use the shared server?
It's not ideal, but I feel like it would fix the issue for 99% of the tests while keeping the test very easy to read (maybe even more than before since we won't have to repeat all the initialization work)

Use shared server + unique indices where possible

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g martin-g force-pushed the faster-search-multi-it-tests branch from ac0b945 to 8fa6e86 Compare June 10, 2025 11:10
martin-g added 2 commits June 10, 2025 14:48
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
It could be used when we want to see the index name in the assertions,
e.g. `movies-[uuid]`

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Contributor Author

@irevoire 1824fbd introduces Index::unique_index_with_prefix(&str)
Using it produces index names like movies-[uuid] in the response JSON. This way the index is both unique and readable.

@irevoire
Copy link
Member

Oh yes that's easy and smart, I love it 🔥

martin-g added 4 commits June 10, 2025 16:58
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
…` counterparts

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g martin-g marked this pull request as ready for review June 11, 2025 05:54
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Contributor Author

@irevoire I think this big boy is ready for review!
You may make yourself some🍿 ! 😄

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Contributor Author

martin-g commented Jun 11, 2025

One of the tests fails due to racing:

---- search::multi::search_multiple_indexes_dont_exist stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: search_multiple_indexes_dont_exist
Source: crates/meilisearch/tests/search/multi/mod.rs:794
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    0     0 │ {
    1       │-  "message": "Inside `.queries[0]`: Index `test` not found.",
          1 │+  "message": "Inside `.queries[1]`: Index `nested` not found.",
    2     2 │   "code": "index_not_found",
    3     3 │   "type": "invalid_request",
    4     4 │   "link": "https://docs.meilisearch.com/errors#index_not_found"
    5     5 │ }

Both test and nested indices do not exist in the shared server, but sometimes the error is about the first index (test) and sometimes for the second one (nested)...

Update: It looks like the test index is actually existing in the shared server. Using unique index names consistently returns errors with the first index name in the details.

martin-g added 2 commits June 11, 2025 11:01
By using hardcoded there is a chance that the index could exist

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!

I saw this chunk of code literally dozens of times I think:

    let documents = DOCUMENTS.clone();
    let (value, _) = movies_index.add_documents(documents, None).await;
    movies_index.wait_task(value.uid()).await.succeeded();

    let (value, _) = movies_index
        .update_settings(json!({
          "sortableAttributes": ["title"],
          "filterableAttributes": ["title", "color"],
          "rankingRules": [
            "sort",
            "words",
            "typo",
            "proximity",
            "attribute",
            "exactness"
          ]
        }))
        .await;
    movies_index.wait_task(value.uid()).await.succeeded();

    let batman_index = server.unique_index_with_prefix("batman");

    let documents = SCORE_DOCUMENTS.clone();
    let (value, _) = batman_index.add_documents(documents, None).await;
    batman_index.wait_task(value.uid()).await.succeeded();

    let (value, _) = batman_index
        .update_settings(json!({
          "sortableAttributes": ["title"],
          "filterableAttributes": ["title"],
          "rankingRules": [
            "sort",
            "words",
            "typo",
            "proximity",
            "attribute",
            "exactness"
          ]
        }))
        .await;
    batman_index.wait_task(value.uid()).await.succeeded();

I think it would make sense to create two new shared index for these two datasets 🤔

Since playing with the ranking rules seems very specific to the multisearch and federation I think they could be local to this file though (at the top would be the best)

Otherwise it looks pretty good! I'll take a look at your issue right now

Comment on lines 788 to 789
let index_1 = format!("index_1-{}", Uuid::new_v4());
let index_2 = format!("index_2-{}", Uuid::new_v4());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not using the unique_with_prefix as you suggested?
It seems "safer" to use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I used server.unique_index_with_prefix("index_1") but then I simplified it to plain strings because server.unique_index() looks like it is creating the index. Using a plain string makes it more obvious that this is just a name.
I don't mind changing it back to server.unique_index_with_prefix() if you think it would be better!

{"indexUid" : "test", "q": "glass"},
{"indexUid": "nested", "q": "pésti", "sort": ["doggos:desc"]},
{"indexUid" : index.uid, "q": "glass"},
{"indexUid": nested_index.uid, "q": "pésti", "sort": ["mother:desc"]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines 1927 to 1933
let batman_index = server.unique_index_with_prefix("batman");

let documents = SCORE_DOCUMENTS.clone();
let (value, _) = index.add_documents(documents, None).await;
index.wait_task(value.uid()).await.succeeded();
let (value, _) = batman_index.add_documents(documents, None).await;
batman_index.wait_task(value.uid()).await.succeeded();

let (value, _) = index
let (value, _) = batman_index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see here we're using the default ranking rules and could use the shared index

https://www.meilisearch.com/docs/learn/relevancy/ranking_rules#list-of-built-in-ranking-rules

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 2113 to 2116
let (value, _) = movies_index.add_documents(documents, None).await;
movies_index.wait_task(value.uid()).await.succeeded();

let (value, _) = index
let (value, _) = movies_index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the order of the ranking rules is not the same as the default ones.
Here:

 "rankingRules": [
            "sort",
            "words",
            "typo",
            "proximity",
            "attribute",
            "exactness"
          ]

Defaults:

[
  "words",
  "typo",
  "proximity",
  "attribute",
  "sort",
  "exactness"
]

Comment on lines 2133 to 2137
let documents = SCORE_DOCUMENTS.clone();
let (value, _) = index.add_documents(documents, None).await;
index.wait_task(value.uid()).await.succeeded();
let (value, _) = batman_index.add_documents(documents, None).await;
batman_index.wait_task(value.uid()).await.succeeded();

let (value, _) = index
let (value, _) = batman_index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the order of the ranking rules is not the same as the default ones.
Here:

 "rankingRules": [
            "sort",
            "words",
            "typo",
            "proximity",
            "attribute",
            "exactness"
          ]

Defaults:

[
  "words",
  "typo",
  "proximity",
  "attribute",
  "sort",
  "exactness"
]

@irevoire
Copy link
Member

Update: It looks like the test index is actually existing in the shared server. Using unique index names consistently returns errors with the first index name in the details.

Ah I see you found the bug while I was reviewing your PR.
But oooof that's a big bug we're never supposed to be able to create a non-unique index with the shared index and that could come from any test 🤔
Maybe a quick way to find the issue would be to update the index creation call to crash when it's called with an index name that contains test?

…s the default

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Contributor Author

But oooof that's a big bug we're never supposed to be able to create a non-unique index with the shared index and that could come from any test

I just tried to list all indexes in the shared server and confirm my assumption above with:

let (response, _code) = server.list_indexes(Some(0), Some(1_000_000)).await;
    dbg!(response);

but it returns:

       START             meilisearch::integration search::multi::search_multiple_indexes_dont_exist

running 1 test
[crates/meilisearch/tests/search/multi/mod.rs:792:5] response = Value(
    Object {
        "results": Array [],
        "offset": Number(0),
        "limit": Number(1000000),
        "total": Number(0),
    },
)

The shared server says "I have no indexes"!

And this is expected since every IT test setups its own Actix-Web App!
So my explanation above is not correct!
It should be a racing issue in the backend ...

@martin-g
Copy link
Contributor Author

I was using cargo nextest and this caused the problem with the empty list of indexes!
Using cargo test shows a list with many entries but all of the items are either unique or shared. All looks good!

@martin-g
Copy link
Contributor Author

AFAIS the queries are processed in the order from the request body JSON and there is no racing -

for (query_index, (index_uid, query, federation_options)) in queries
.into_iter()
.map(SearchQueryWithIndex::into_index_query_federation)
.enumerate()
{
debug!(on_index = query_index, parameters = ?query, "Multi-search");
if federation_options.is_some() {
return Err((
MeilisearchHttpError::FederationOptionsInNonFederatedRequest(
query_index,
)
.into(),
query_index,
));
}
let index = index_scheduler
.index(&index_uid)
.map_err(|err| {
let mut err = ResponseError::from(err);
// Patch the HTTP status code to 400 as it defaults to 404 for `index_not_found`, but
// here the resource not found is not part of the URL.
err.code = StatusCode::BAD_REQUEST;
err
})
.with_index(query_index)?;

Mistery!

martin-g added 3 commits June 12, 2025 13:46
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
…anually with Uuid

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
…erver

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Contributor Author

@irevoire Here is the "test" in a shared server -


Noice! :-)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@irevoire
Copy link
Member

Noice! :-)

Oof good catch!

That's why I feels like even if we're not creating the indexes we should always just use the method that makes them unique ahah

Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, you removed so many indexing process with this file 🎉

@irevoire irevoire added maintenance Issue about maintenance (CI, tests, refacto...) no db change The database didn't change labels Jun 16, 2025
@irevoire irevoire added this to the v1.16.0 milestone Jun 16, 2025
@irevoire irevoire added this pull request to the merge queue Jun 16, 2025
Merged via the queue into meilisearch:main with commit aeaac72 Jun 16, 2025
14 of 16 checks passed
@meili-bot meili-bot added the v1.16.0 PRs/issues solved in v1.16.0 released on 2025-08-04 label Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Issue about maintenance (CI, tests, refacto...) no db change The database didn't change v1.16.0 PRs/issues solved in v1.16.0 released on 2025-08-04
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants