InferenceUsage in grpc #6601
Conversation
📝 Walkthrough

This change set updates the gRPC API and related Rust code to improve and standardize usage reporting, especially for inference usage. It renames the `model` map field of `InferenceUsage` to `models`, adds an optional `usage` field to `UpdateBatchResponse`, and threads inference usage through the update and query handlers.
Actionable comments posted: 3
♻️ Duplicate comments (1)
lib/api/src/grpc/ops.rs (1)

`26-48`: LGTM! Good implementation of emptiness check and conversion.

The implementation correctly checks all hardware usage fields and follows the suggested pattern from the past review comment using `then_some`.
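The approved pattern can be sketched as follows — a minimal, self-contained illustration with assumed field names (taken from the hardware counters discussed elsewhere in this review), not the exact definition from `lib/api/src/grpc/ops.rs`:

```rust
// Sketch of an emptiness check plus a `then_some`-based consuming conversion.
// Field names are illustrative, not the exact Qdrant definition.
#[derive(Debug, Clone, PartialEq)]
struct HardwareUsage {
    cpu: u64,
    payload_io_read: u64,
    payload_io_write: u64,
    vector_io_read: u64,
    vector_io_write: u64,
}

impl HardwareUsage {
    // Destructuring `self` makes the compiler flag this method when a new
    // field is added but forgotten in the check.
    fn is_empty(&self) -> bool {
        let Self {
            cpu,
            payload_io_read,
            payload_io_write,
            vector_io_read,
            vector_io_write,
        } = self;
        *cpu == 0
            && *payload_io_read == 0
            && *payload_io_write == 0
            && *vector_io_read == 0
            && *vector_io_write == 0
    }

    /// Consume `self`, returning `Some(self)` only when any counter is non-zero.
    fn into_non_empty(self) -> Option<Self> {
        (!self.is_empty()).then_some(self)
    }
}

fn main() {
    let empty = HardwareUsage {
        cpu: 0,
        payload_io_read: 0,
        payload_io_write: 0,
        vector_io_read: 0,
        vector_io_write: 0,
    };
    let busy = HardwareUsage { cpu: 3, ..empty.clone() };
    assert_eq!(empty.into_non_empty(), None);
    assert!(busy.into_non_empty().is_some());
}
```

The `then_some` form avoids the more verbose `if !self.is_empty() { Some(self) } else { None }` while keeping the consuming-conversion semantics.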
🧹 Nitpick comments (3)
docs/grpc/docs.md (1)

`182-182`: Fix list indentation in Table of Contents.

The new entry uses 4 spaces for the unordered list, but the rest of the document uses 2 spaces. This inconsistency triggers markdownlint (MD007). Apply this diff to align indentation:

```diff
-    - [InferenceUsage.ModelsEntry](#qdrant-InferenceUsage-ModelsEntry)
+  - [InferenceUsage.ModelsEntry](#qdrant-InferenceUsage-ModelsEntry)
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
182-182: Unordered list indentation — Expected: 2; Actual: 4 (MD007, ul-indent)
lib/api/src/grpc/qdrant.rs (1)

`6199-6200`: Add documentation and serde attribute for the optional `usage` field.

To improve clarity and ensure we skip serializing `usage` when it's `None`, consider adding a doc comment and a `serde` attribute:

```diff
 #[prost(double, tag = "2")]
 pub time: f64,
+/// Resource usage metrics for this batch update, when available.
 #[prost(message, optional, tag = "3")]
+#[serde(default, skip_serializing_if = "Option::is_none")]
 pub usage: ::core::option::Option<Usage>,
```

src/tonic/api/points_api.rs (1)
`75-92`: Consider the necessity of cloning `hw_metrics`.

The pattern of cloning `hw_metrics` before the async call to capture hardware usage is functional but may be unnecessary. Consider whether the hardware usage can be captured after the call to avoid the clone. However, the explicit construction of `Usage` with both hardware and inference components is correct and provides clear control over usage reporting.
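The explicit construction being praised can be pictured roughly like this — a sketch with assumed type shapes, not the actual handler code from `src/tonic/api/points_api.rs`:

```rust
// Simplified stand-ins for the response usage types; fields are assumptions.
#[derive(Debug, Default)]
struct HardwareUsage {
    cpu: u64,
}

#[derive(Debug, Default)]
struct InferenceUsage {
    tokens: u64,
}

#[derive(Debug, Default)]
struct Usage {
    hardware: Option<HardwareUsage>,
    inference: Option<InferenceUsage>,
}

// Explicitly assemble both components so the handler controls exactly what
// is reported, and drop the whole `Usage` when both parts are absent.
fn respond(hw: Option<HardwareUsage>, inf: Option<InferenceUsage>) -> Option<Usage> {
    match (hw, inf) {
        (None, None) => None, // omit empty usage from the response
        (hardware, inference) => Some(Usage { hardware, inference }),
    }
}

fn main() {
    assert!(respond(None, None).is_none());
    let u = respond(Some(HardwareUsage { cpu: 2 }), None).unwrap();
    assert_eq!(u.hardware.unwrap().cpu, 2);
}
```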
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- docs/grpc/docs.md (3 hunks)
- lib/api/src/grpc/ops.rs (2 hunks)
- lib/api/src/grpc/proto/points.proto (2 hunks)
- lib/api/src/grpc/qdrant.rs (2 hunks)
- lib/api/src/rest/conversions.rs (1 hunks)
- src/actix/api/update_api.rs (1 hunks)
- src/common/inference/query_requests_grpc.rs (6 hunks)
- src/common/update.rs (0 hunks)
- src/tonic/api/points_api.rs (5 hunks)
- src/tonic/api/points_internal_api.rs (5 hunks)
- src/tonic/api/query_common.rs (19 hunks)
- src/tonic/api/update_common.rs (14 hunks)
💤 Files with no reviewable changes (1)
- src/common/update.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/api/src/grpc/ops.rs (1)
lib/api/src/rest/models.rs (5)
- `is_empty` (63-70)
- `is_empty` (98-100)
- `into_non_empty` (102-104)
- `merge` (106-116)
- `merge_opt` (118-122)
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
182-182: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
🔇 Additional comments (25)
lib/api/src/rest/conversions.rs (1)
`28-28`: LGTM! Field rename aligns with protobuf updates.

The field name change from `model` to `models` correctly aligns with the updated gRPC protobuf definitions, ensuring the REST to gRPC conversion remains consistent.

src/actix/api/update_api.rs (1)
`109-117`: Function call correctly updated to match new signature.

The removal of the `inference_token` argument from the `do_delete_points` call aligns with the standardized usage reporting changes mentioned in the PR summary.

docs/grpc/docs.md (2)
`3179-3188`: Approve field rename in `InferenceUsage`.

The `model` map field is correctly renamed to `models`, and the corresponding anchor and heading for `InferenceUsage.ModelsEntry` have been updated to match the protobuf changes. The documentation now aligns with the code.

`4734-4734`: Approve addition of `usage` field in `UpdateBatchResponse`.

The optional `usage` field of type `Usage` is correctly documented for `UpdateBatchResponse`, completing the usage reporting enhancements across the API.

lib/api/src/grpc/qdrant.rs (1)
`6586-6586`: Rename of `model` to `models` in `InferenceUsage` is correct.

The pluralized field name better reflects the map semantics. The `prost` tag remains `1`, so backwards compatibility is preserved. Looks good to me.

src/tonic/api/points_internal_api.rs (5)
`69-81`: LGTM! Intentional exclusion of inference usage for internal calls.

The destructuring pattern correctly extracts the response while intentionally ignoring the inference usage, as documented by the comment. This aligns with the design of keeping inference usage tracking separate for internal vs external API calls.
`113-137`: Approve removal of inference token from delete operations.

The simplification of the `delete_internal` method signature by removing the `inference_token` parameter is consistent with the broader changes across the codebase that remove inference token handling from delete operations.
`156-169`: LGTM! Consistent pattern for excluding inference usage.

The implementation follows the same pattern as other internal methods, correctly destructuring the response tuple and ignoring the inference usage component for internal API calls.
`475-475`: Approve simplified delete handler.

The removal of inference token extraction and passing to `delete_internal` is consistent with the updated method signature and broader design changes.
`582-582`: Consistent with delete operation simplification.

The update to the `Update::Delete` variant correctly removes the inference token parameter, aligning with the changes to the `delete_internal` method signature.

src/tonic/api/points_api.rs (2)
`152-168`: LGTM! Consistent usage reporting pattern.

The implementation follows the same pattern as the `upsert` method, correctly combining hardware and inference usage. The explicit construction provides better visibility into the usage reporting mechanism.
`716-716`: Approve hardware-only usage reporting for search matrix operations.

The use of `Usage::from_hardware_usage()` is appropriate for operations that don't involve inference processing. The `.into_non_empty()` call ensures empty usage data is filtered out appropriately.

Also applies to: 742-742
src/common/inference/query_requests_grpc.rs (3)
`29-29`: LGTM! Enhanced function signatures for usage tracking.

The updated function signatures correctly return tuples containing both the converted query request and inference usage, enabling proper usage tracking in the calling code.

Also applies to: 110-110
`62-66`: Approve proper extraction of inference usage.

The destructuring of `BatchAccumInferred::from_objects` correctly extracts both the inferred data and usage information. The usage is properly defaulted if absent and converted to the appropriate type.

Also applies to: 141-145
`155-179`: Well-structured tuple return for query conversion.

The return statement correctly packages both the converted `CollectionQueryRequest` and the inference usage, maintaining the same pattern as the groups variant while handling the different request structure appropriately.

src/tonic/api/query_common.rs (4)
`135-135`: LGTM! Systematic replacement of helper function with explicit usage construction.

The consistent replacement of the `usage_or_none` helper function with explicit `Usage::from_hardware_usage().into_non_empty()` calls provides better visibility and control over usage reporting. The pattern is applied consistently across all query operations that only involve hardware usage.

Also applies to: 183-183, 241-241, 296-296, 349-349, 406-406, 460-460, 506-506, 562-562, 641-641, 699-699, 764-764
`787-788`: Approve enhanced query usage tracking.

The destructuring of the conversion function to extract both the request and inference usage, followed by the proper construction of `Usage` with both hardware and inference components, correctly implements the enhanced usage tracking for query operations.

Also applies to: 820-820
`839-847`: Excellent batch usage accumulation pattern.

The pre-allocation of `InferenceUsage` with capacity and the subsequent merging of individual query usage is well-optimized. The final construction of `Usage` with accumulated inference usage demonstrates proper resource management and usage tracking for batch operations.

Also applies to: 880-885
`907-908`: Consistent usage tracking for query groups.

The implementation follows the same pattern as single query operations, correctly extracting inference usage from the conversion function and combining it with hardware usage in the response.
Also applies to: 940-940
lib/api/src/grpc/ops.rs (2)
`90-130`: Well-designed InferenceUsage implementation with efficient merging.

The implementation provides:
- Clear type alias for better readability
- Optimized `with_capacity` constructor for batch operations
- Correct merge logic that accumulates token counts for existing models
- Convenient `merge_opt` wrapper for optional values
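The merge behavior described above can be sketched like this — an illustrative reimplementation, not the exact code from `lib/api/src/grpc/ops.rs` (the real `ModelUsage` carries more than a bare token count):

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the gRPC types; names and fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ModelUsage {
    tokens: u64,
}

#[derive(Debug, Default, PartialEq)]
struct InferenceUsage {
    models: HashMap<String, ModelUsage>,
}

impl InferenceUsage {
    // Pre-size the map when the number of batch entries is known up front.
    fn with_capacity(cap: usize) -> Self {
        Self { models: HashMap::with_capacity(cap) }
    }

    /// Merge another usage report, accumulating token counts for models
    /// already present and inserting new ones.
    fn merge(&mut self, other: InferenceUsage) {
        for (model, usage) in other.models {
            self.models
                .entry(model)
                .and_modify(|u| u.tokens += usage.tokens)
                .or_insert(usage);
        }
    }

    /// Convenience wrapper for optional values.
    fn merge_opt(&mut self, other: Option<InferenceUsage>) {
        if let Some(other) = other {
            self.merge(other);
        }
    }
}

fn main() {
    let mut acc = InferenceUsage::with_capacity(2);
    let mut a = InferenceUsage::default();
    a.models.insert("bm25".to_string(), ModelUsage { tokens: 10 });
    let mut b = InferenceUsage::default();
    b.models.insert("bm25".to_string(), ModelUsage { tokens: 5 });
    acc.merge(a);
    acc.merge_opt(Some(b));
    assert_eq!(acc.models["bm25"].tokens, 15);
}
```

The `entry` API keeps the accumulate-or-insert decision to a single map lookup per model.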
`65-67`: ✅ Verification successful

Verify Rust version compatibility for `is_none_or`.

The `is_none_or` method was stabilized in Rust 1.82.0. Please ensure the project's minimum supported Rust version (MSRV) is compatible with this feature.

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check the project's Rust version requirements

# Check for rust-toolchain file
fd -t f "rust-toolchain" -x cat {}

# Check Cargo.toml for rust-version field
rg "rust-version" --type toml -A 2

# Check for MSRV in README or documentation
rg -i "msrv|minimum.*rust" README.md || true
```

Length of output: 196

No changes needed: `is_none_or` is supported by Rust 1.87.

The project's Cargo.toml specifies `rust-version = "1.87"` (≥ 1.82), so `is_none_or` is fully supported.
`46-82`: Correct implementation of inference usage propagation in upsert.

The function properly captures and returns the inference usage from `do_upsert_points`, with safe handling of the optional usage value.
`132-180`: Consistent inference usage handling in update_vectors.

The implementation follows the same pattern as `upsert`, properly propagating inference usage from the internal update operation.
`806-848`: Good documentation and consistent API design for sync operation.

The comment clearly explains that sync operations don't generate actual inference usage, maintaining API consistency. The implementation correctly propagates any usage from point conversion.
`432-433`: Excellent implementation of usage accumulation in batch operations.

The code efficiently accumulates inference usage across all batch operations with:
- Pre-allocated capacity for performance
- Consistent handling of operations that don't generate inference usage
- Proper usage accumulation and inclusion in the response

Also applies to: 646-661
```diff
@@ -1180,7 +1181,7 @@ message Usage {
 // ---------------------------------------------

 message InferenceUsage {
-  map<string, ModelUsage> model = 1;
+  map<string, ModelUsage> models = 1;
```
💡 Verification agent
🧩 Analysis chain
Field rename improves semantic clarity.

Renaming `model` to `models` better represents that this is a map containing multiple model usage entries. The change aligns with similar renaming across the codebase.

Verify that all references to this field have been updated consistently:
🏁 Script executed:

```shell
#!/bin/bash
# Description: Check for any remaining references to the old 'model' field in InferenceUsage

# Search for potential references to the old field name
rg -A 3 -B 3 "InferenceUsage.*model[^s]" --type rust
rg -A 3 -B 3 "\.model\s*=" --type rust | grep -i inference
```

Length of output: 1399
Remaining `model` references need to be updated to the new `models` map.

The rename in `points.proto` was successful, but the following Rust files still assign to the old `model` field and must be updated to use the `models` map instead:

- src/common/inference/batch_processing_grpc.rs: `doc1.model = "model1".to_string(); doc2.model = "model2".to_string();`
- src/common/inference/batch_processing.rs: `doc1.model = "model1".to_string(); doc2.model = "model2".to_string();`

Please change these to insert into the `models` map, for example:

```rust
doc1.models.insert("model1".to_string(), /* Usage value */);
doc2.models.insert("model2".to_string(), /* Usage value */);
```

Ensure all code and tests referring to `.model` now reference `.models`.
🤖 Prompt for AI Agents
In lib/api/src/grpc/proto/points.proto at line 1184, the field was renamed from
'model' to 'models' to reflect it being a map. You need to update all Rust code
that still assigns to the old 'model' field, specifically in
src/common/inference/batch_processing_grpc.rs and
src/common/inference/batch_processing.rs. Replace assignments like 'doc1.model =
"model1".to_string();' with inserting into the map:
'doc1.models.insert("model1".to_string(), /* Usage value */);'. Make sure all
references to '.model' are changed to '.models' and use map insertion
accordingly.
```diff
@@ -989,6 +989,7 @@ message RecommendGroupsResponse {
 message UpdateBatchResponse {
   repeated UpdateResult result = 1;
   double time = 2; // Time spent to process
+  optional Usage usage = 3;
```
💡 Verification agent
🧩 Analysis chain
Good addition for usage reporting consistency.

Adding the optional `usage` field to `UpdateBatchResponse` aligns with other response messages and supports the standardized usage reporting mentioned in the PR objectives.

Consider documenting this as a breaking change for gRPC clients, though the optional nature minimizes impact:
🌐 Web query:
Are there any compatibility guidelines for gRPC protobuf changes in the Qdrant project documentation?
💡 Result:
Qdrant utilizes gRPC as one of its main interfaces alongside REST, but the documentation doesn't provide specific Qdrant-focused compatibility guidelines for protobuf changes. However, we can understand the compatibility aspects from both Qdrant's implementation and general Protocol Buffers practices.
gRPC in Qdrant
Qdrant supports both REST and gRPC interfaces for client-server communication. The gRPC interface is available on port 6334 by default and follows the same principles as the REST API, with corresponding gRPC methods for each REST endpoint[1]. When deciding between REST and gRPC, Qdrant documentation suggests it's a trade-off between convenience and speed - gRPC is recommended for performance optimization when you're already familiar with Qdrant[1].
Protocol Buffers Compatibility Guidelines
While Qdrant-specific guidelines aren't explicitly detailed, Protocol Buffers (which gRPC uses) maintains important compatibility guarantees:
- Binary Wire Format Stability: The binary wire format doesn't change even in major version updates. This ensures backward and forward compatibility, allowing newer versions to read old binary wire format data and older binaries to parse newly generated protobuf data[7].
- Descriptor Schema Changes: In minor or patch releases, new messages, fields, enums, or values may be added, and existing elements may be marked as deprecated. Major releases may remove deprecated elements[7].
- Language Grammar Changes: Minor or patch releases may introduce new language constructs or alternative syntax, while major releases might remove obsolete features[7].
- API Compatibility: Minor or patch releases contain additive or source-compatible updates, while major releases may include breaking changes requiring client code updates[7].
Client-Server Version Compatibility in Qdrant
Qdrant does implement a version compatibility check between client and server. The Python client, for instance, verifies compatibility by comparing client and server versions. The check ensures that major versions match and minor version differences don't exceed 1[5]. This check can be skipped by setting `check_compatibility=False` in the client initialization.
Qdrant Client Implementation
Qdrant offers official client libraries for multiple programming languages including Python, JavaScript/TypeScript, Rust, Go, .NET, and Java[1]. These clients handle the underlying gRPC communication:
- The Python client provides wrappers around generated gRPC client code[3]
- The Rust client uses gRPC via the Tonic library[4]
- Various code examples show how client implementations interact with the gRPC interface across Java, .NET, and Go[9]
If you need to work with protocol buffer definitions directly, you would need the protoc Protocol Buffers compiler, as mentioned in the Rust client documentation[4].
For languages without official client support, you can use the REST API directly or generate a client using OpenAPI or protobuf definitions[1].
Citations:
- 1: https://qdrant.tech/documentation/interfaces/
- 2: https://github.com/qdrant/qdrant/blob/master/docs/grpc/docs.md
- 3: https://python-client.qdrant.tech/qdrant_client.qdrant_client
- 4: https://github.com/qdrant/rust-client/blob/master/README.md
- 5: https://python-client.qdrant.tech/_modules/qdrant_client/qdrant_remote
- 6: https://dlthub.com/docs/dlt-ecosystem/destinations/qdrant
- 7: https://protobuf.dev/support/version-support/
- 8: https://liambeeton.com/programming/secure-grpc-over-mtls-using-go
- 9: https://qdrant.tech/documentation/guides/distributed_deployment/
- 10: https://connectrpc.com/docs/go/grpc-compatibility/
Document gRPC protobuf changes for client compatibility
Both the new `optional Usage usage = 3;` in `UpdateBatchResponse` (line 992) and the rename from `map<string, ModelUsage> model = 1;` to `map<string, ModelUsage> models = 1;` (line 1184) are wire-compatible under standard Protocol Buffers rules (added optional field, unchanged tag numbers). However, clients will need to regenerate their gRPC stubs to pick up the new `usage` field and the renamed `models` property.
Please update your release notes or CHANGELOG to call out:

- Addition of the `usage` field to `UpdateBatchResponse` (tag 3)
- Rename of `model` → `models` in `InferenceUsage` (tag 1)

so that consumers know to regenerate their client code for full compatibility.
🤖 Prompt for AI Agents
In lib/api/src/grpc/proto/points.proto at line 992, you added an optional field
`usage` to `UpdateBatchResponse`. This is a wire-compatible change but requires
clients to regenerate their gRPC stubs to access the new field. Update the
release notes or CHANGELOG to document the addition of the `usage` field and the
rename of `model` to `models` in `InferenceUsage` at line 1184, informing
consumers to regenerate their client code for full compatibility.
```rust
    && *vector_io_write == 0
}

pub fn into_non_empty(self) -> Option<Self> {
```
Minor nit, but I find this name a bit confusing. To me it suggests that values get inserted into the counter. What about `check_not_empty` or `some_if_not_empty`?
If we consume `self` it should be called `into_*`.

`check_not_empty` would require bool as return type. And `into_some_if_not_empty` is just too long 🤷♂️
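For context, the `into_*` convention referenced in this reply comes from the Rust API guidelines: `as_*` for cheap borrowed views, `to_*` for more expensive conversions that keep `self`, and `into_*` for conversions that consume `self`. A toy example (names are illustrative):

```rust
struct Meters(f64);

impl Meters {
    // `as_*`: cheap view of the underlying value, borrows `self`.
    fn as_f64(&self) -> f64 {
        self.0
    }

    // `to_*`: costlier conversion that leaves `self` intact.
    fn to_label(&self) -> String {
        format!("{} m", self.0)
    }

    // `into_*`: consumes `self`; the original value is gone afterwards.
    fn into_inner(self) -> f64 {
        self.0
    }
}

fn main() {
    let m = Meters(3.5);
    assert_eq!(m.as_f64(), 3.5);
    assert_eq!(m.to_label(), "3.5 m");
    assert_eq!(m.into_inner(), 3.5); // `m` is moved here
}
```

Under that convention, a consuming `Option`-returning conversion naming itself `into_non_empty` is consistent, whatever one thinks of the readability nit.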
* InferenceUsage in grpc
* Update documentation for InferenceUsage
* Update lib/api/src/grpc/ops.rs
* clearer inference usage tracking
* remove unused inference token from delete_points
* review

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
Co-authored-by: generall <andrey@vasnetsov.com>
All Submissions:
- Did you create your branch from `dev`?

New Feature Submissions:
- Have you formatted your code locally using the `cargo +nightly fmt --all` command prior to submission?
- Have you checked your code using the `cargo clippy --all --all-features` command?

Changes to Core Features: