-
Notifications
You must be signed in to change notification settings - Fork 35
feat: maxsim operator and indexing on maxsim operator #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1299f7f
to
c236ff5
Compare
1eba71f
to
779d774
Compare
e9d0f0b
to
e4a97ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements support for the maxsim operator by modifying search and rerank logic, updating tuple serialization (including a version bump and an added field), and adjusting various modules (e.g. prewarm, insert, AM callbacks) to accommodate the new operator. Key changes include:
- Updating functions in crates/algorithm (search, search_and_estimate, rerank) to handle maxsim semantics.
- Bumping the tuple version and adding a new “tuples” field to JumpTuple.
- Adjusting AM and tape operations to use the new IndexPointer and callback conventions.
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
crates/algorithm/src/search.rs | Modified search functions to remove rerank_in_table returns and add search_and_estimate. |
crates/algorithm/src/tuples.rs | Bumped VERSION to 4 and added a new field (“tuples”) to JumpTupleHeader. |
crates/algorithm/src/prewarm.rs | Changed the tape read invocation from using frozen_first() to appendable_first(). |
src/index/am/mod.rs | Updated pointer conversion callbacks and opfamily extraction for improved clarity. |
Cargo.toml | Added dependency on always_equal module. |
Comments suppressed due to low confidence (2)
crates/algorithm/src/prewarm.rs:92
- Verify that replacing the use of jump_tuple.frozen_first() with jump_tuple.appendable_first() in the prewarm function is intentional, as it could affect which tape is read for prewarming.
jump_tuple.appendable_first(),
crates/algorithm/src/tuples.rs:11
- The tuple version has been bumped from 3 to 4 and the JumpTupleHeader now includes a new 'tuples' field; please ensure that all downstream consumers can handle the new field and consider adding an inline comment explaining its purpose.
const VERSION: u64 = 4;
3e6c030
to
4490ab5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a new maxsim operator and indexing functionality while updating various parts of the codebase to support it. Key changes include:
- Introduction of MaxsimBuilder and related maxsim logic in the index scanners.
- Updates to configuration (GUC) and operator definitions to support new maxsim operators.
- Modifications in CI workflows and packaging processes to accommodate new build targets.
- Updates in algorithm modules (search, rerank, maintain, etc.) to integrate maxsim functionality.
Reviewed Changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
.github/workflows/check.yml | CI configuration updates including runner matrix and artifact packaging |
src/index/scanners/maxsim.rs | New maxsim operator builder implementation |
crates/algorithm/src/search.rs | Enhancements in search functions and introduction of search_and_estimate |
src/datatype/operators_vector.rs & operators_halfvec.rs | Added pg_extern functions for maxsim operator support in vector types |
src/index/opclass.rs | Extended operator class with maxsim variants |
Other crates (algorithm, distance, etc.) | Minor updates to support new maxsim features and configuration gucs |
Comments suppressed due to low confidence (2)
.github/workflows/check.yml:80
- Verify that the conditional expression using '&&' and '||' in the 'runs-on' field always evaluates to a valid runner label across all potential values of matrix.arch.
runs-on: ${{ matrix.arch == 'x86_64' && 'ubuntu-22.04' || 'ubuntu-22.04-arm' }}
crates/algorithm/src/rerank.rs:71
- [nitpick] Consider moving the extraction of vector metadata outside the loop to avoid redundant computation on each iteration, which could improve performance.
let vector = O::Vector::elements_and_metadata(vector.as_borrowed());
e096728
to
fcf95f7
Compare
Signed-off-by: usamoi <usamoi@outlook.com>
Signed-off-by: usamoi <usamoi@outlook.com>
Signed-off-by: usamoi <usamoi@outlook.com>
Signed-off-by: usamoi <usamoi@outlook.com>
Signed-off-by: usamoi <usamoi@outlook.com>
based on 0.2.2 release