-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Request fragments #5596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request fragments #5596
Conversation
9dfe57c
to
1ed4d9d
Compare
759bac4
to
15a7679
Compare
Hello, I'm a bot 🤖You are receiving this message because you declared that this PR make changes to the Meilisearch database. Thank you for contributing to Meilisearch ❤️ This PR makes forward-compatible changesForward-compatible changes are changes to the database such that databases created in an older version of Meilisearch are still valid in the new version of Meilisearch. They usually represent additive changes, like adding a new optional attribute or setting.
This PR makes breaking changesBreaking changes are changes to the database such that databases created in an older version of Meilisearch need changes to remain valid in the new version of Meilisearch. This typically happens when the way to store the data changed (change of database, new required key, etc). This can also happen due to breaking changes in the API of an experimental feature.
|
1fc6890
to
f5c23f8
Compare
|
||
impl<'doc> Extractor<'doc> for RequestFragmentExtractor<'doc> { | ||
type DocumentMetadata = (); | ||
type Input = Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider renaming Input
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not rename, but added documentation clarifying that we are talking about an "input to an embedder"
self.embed_chunks(unused_vectors_distribution) | ||
} | ||
|
||
pub fn drain(mut self, unused_vectors_distribution: &C::ErrorMetadata) -> Result<C> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding a panicking, albeit it might be a trade-off we don't want to make 🤔
@@ -389,6 +506,9 @@ impl ArroyWrapper { | |||
let reader = reader?; | |||
let mut searcher = reader.nns(limit); | |||
if let Some(filter) = filter { | |||
if reader.item_ids().is_disjoint(filter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kerollmops consider doing this at the arroy level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[x] Catch and return an error when searchFragments
is empty
1c07b17
to
7db5495
Compare
7db5495
to
02b9ea3
Compare
let has_fragments = !runtime.fragments.is_empty(); | ||
|
||
if has_fragments { | ||
regenerate_all_fragments( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we should iterate all the fragments and compare the old and new value rather than bluntly regenerating everything
crates/milli/src/update/index_documents/extract/extract_vector_points.rs
Show resolved
Hide resolved
crates/milli/src/update/index_documents/extract/extract_vector_points.rs
Outdated
Show resolved
Hide resolved
crates/milli/src/update/index_documents/extract/extract_vector_points.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note on dump imports: any update to a document will regenerate all fragments for that document after a dump import, which can have performance considerations
|
||
(_, None, Some(_)) => Err(MeilisearchHttpError::MissingSearchHybrid.into()), | ||
// !q + hybrid => semantic | ||
(_, false, Some(HybridQuery { semantic_ratio: _, embedder }), v) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to have Some(v) here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesn't actually work because we'd be missing the true, false, Some(_), None
case, where we have media
but not v
, which is valid and also semantic.
…n of the type that uses it
- fix old_has_fragments - new_is_user_provided is always false when generating fragments, even if no fragment ever matches
.collect(); | ||
|
||
let are_fragments_inconsistent = | ||
indexing_fragments.is_empty() ^ search_fragments.is_empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indexing_fragments.is_empty() ^ search_fragments.is_empty(); | |
indexing_fragments.is_empty() != search_fragments.is_empty(); |
Usage page: https://www.notion.so/meilisearch/Search-in-images-usage-1c14b06b651f80c1bf9effe56dbeef54
API side
searchRequestFragments
embedder settingindexingRequestFragments
embedder settingmultimodal
experimental featuremedia
search parameterImplementation side
ValueTemplate
toInjectableTemplate
JsonTemplate
: new type that renders all of the strings of a JSON template as liquid templates using a document or a search querysearchRequestFragments
andindexingRequestFragments
in REST embedder, such that their presence modifies the way documents are embeddedEmbedder
:embed_search
acceptsq
andmedia
embed_index
accepts a list of inputs, does not directly return embeddings, but rather an object that encapsulates in-flight requests.extractor_id -> (embedder_id, label, kind)
wherekind = "userProvided" | "fragment" | "documentTemplate"
embedder_id -> extractor_ids
(extractor_id, vector_id) -> document_ids
wherevector_id
is the index in the arroy writersDocumentChangeExtractorDiff
to find out which vectors to add, remove or regenerate depending on aDocumentChange
ExtractorChangeDocumentDiff
to find out which vectors to add, remove or regenerate for a document depending on aSettingsChange