
Conversation

generall (Member)

No description provided.


coderabbitai bot commented Aug 15, 2025

📝 Walkthrough

  • DEFAULT_BUCKET_BOUNDARIES_MICROS reduced from 16 to 11 entries (removed the 1.0 and 5.0 microsecond buckets).
  • convert_histogram signature changed to remove total_count and its implementation reworked to produce a dense, cumulative histogram that emits a (boundary, cumulative_count) pair for every boundary (no skipping zero-count buckets).
  • OperationDurationsAggregator::get_statistics now calls convert_histogram(&DEFAULT_BUCKET_BOUNDARIES_MICROS, &self.buckets) instead of building cumulative buckets manually.
  • Tests updated to expect dense histogram semantics and the new convert_histogram signature.
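The dense cumulative conversion the walkthrough describes can be sketched as follows. This is a minimal standalone illustration of the described behavior, not the actual crate code; the signature mirrors the post-change `convert_histogram` (no `total_count` parameter):

```rust
/// Convert a fixed-size non-cumulative histogram into a dense cumulative one:
/// one (boundary, cumulative_count) pair per boundary, zero-count buckets included.
fn convert_histogram(le_boundaries: &[f32], counts: &[usize]) -> Vec<(f32, usize)> {
    let mut result = Vec::with_capacity(le_boundaries.len());
    let mut cumulative = 0;
    for (idx, &le) in le_boundaries.iter().enumerate() {
        // Missing counts are treated as zero; zero-count buckets are not skipped.
        cumulative += counts.get(idx).copied().unwrap_or(0);
        result.push((le, cumulative));
    }
    result
}

fn main() {
    // A zero-count bucket (le = 1.0, le = 3.0) still produces an output pair.
    let hist = convert_histogram(&[1.0, 2.0, 3.0], &[0, 5, 0]);
    assert_eq!(hist, vec![(1.0, 0), (2.0, 5), (3.0, 5)]);
    println!("{hist:?}");
}
```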

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (2)
lib/segment/src/common/operation_time_statistics.rs (2)

337-339: Doc no longer matches behavior: convert_histogram now emits dense output.

The function comment states “sparse” and “omit repeated values”, but the new logic emits a pair for every boundary (including zero-count ranges). Please update the docs to prevent confusion.

Apply this doc update:

-/// Convert a fixed-size non-cumulative histogram to a sparse cumulative histogram.
-/// Omit repeated values to reduce the size of the histogram.
+/// Convert a fixed-size non-cumulative histogram to a dense cumulative histogram.
+/// Emit (boundary, cumulative_count) for every boundary, including zero-count ranges.

431-454: Tests expect sparse behavior; update them to match dense output.

With the new dense histogram behavior, these assertions will fail. Update expected values to include every boundary.

Apply the following updates:

     #[test]
     fn test_convert_histogram() {
         // With all zeroes
         assert_eq!(
-            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0], 0),
-            vec![],
+            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0], 0),
+            vec![(0., 0), (1., 0), (2., 0), (3., 0), (4., 0), (5., 0)],
         );
 
         // With all zeroes except the total count
         assert_eq!(
-            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0], 100),
-            vec![(5., 0)],
+            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0], 100),
+            vec![(0., 0), (1., 0), (2., 0), (3., 0), (4., 0), (5., 0)],
         );
 
         // Full
         assert_eq!(
             convert_histogram(&[0., 1., 2., 3.], &[1, 20, 300, 4000], 5000),
             vec![(0., 1), (1., 21), (2., 321), (3., 4321)],
         );
 
         // Sparse
         assert_eq!(
-            convert_histogram(&[0., 1., 2., 3., 4., 5., 6.], &[0, 0, 1, 0, 0, 1, 0], 1),
-            vec![(1.0, 0), (2.0, 1), (4.0, 1), (5.0, 2), (6.0, 2)],
+            convert_histogram(&[0., 1., 2., 3., 4., 5., 6.], &[0, 0, 1, 0, 0, 1, 0], 1),
+            vec![(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1), (4.0, 1), (5.0, 2), (6.0, 2)],
         );
     }
🧹 Nitpick comments (3)
lib/segment/src/common/operation_time_statistics.rs (3)

344-366: Clean up dead code and fix capacity estimation for dense histograms.

  • prev is never set to Some, so both if let Some(prev) branches are dead.
  • The capacity estimate is tuned for sparse output; with dense output it should be le_boundaries.len() to avoid reallocations.

Apply this refactor to simplify and make capacity accurate:

-fn convert_histogram(
-    le_boundaries: &[f32],
-    counts: &[usize],
-    total_count: usize,
-) -> Vec<(f32, usize)> {
-    let rough_len_estimation = std::cmp::min(
-        le_boundaries.len(),
-        counts.iter().filter(|&&c| c != 0).count() * 2,
-    );
-    let mut result = Vec::with_capacity(rough_len_estimation);
-    let mut cumulative_count = 0;
-    let mut prev = None;
-    for (idx, &le) in le_boundaries.iter().enumerate() {
-        let count = counts.get(idx).copied().unwrap_or(0);
-        if let Some(prev) = prev {
-            result.push((prev, cumulative_count));
-        }
-        cumulative_count += count;
-        result.push((le, cumulative_count));
-        prev = None;
-    }
-    if let Some(prev) = prev
-        && cumulative_count != total_count
-    {
-        result.push((prev, cumulative_count));
-    }
-    result
-}
+fn convert_histogram(
+    le_boundaries: &[f32],
+    counts: &[usize],
+    _total_count: usize,
+) -> Vec<(f32, usize)> {
+    // Dense output: one (le, cumulative) pair per boundary.
+    let mut result = Vec::with_capacity(le_boundaries.len());
+    let mut cumulative_count = 0;
+    for (idx, &le) in le_boundaries.iter().enumerate() {
+        let count = counts.get(idx).copied().unwrap_or(0);
+        cumulative_count += count;
+        result.push((le, cumulative_count));
+    }
+    result
+}

368-372: Consider updating merge_histograms docs to drop “sparse”-only phrasing.

The implementation works for dense histograms too. Updating the comment improves clarity and keeps it aligned with convert_histogram.

-/// Merge two sparse cumulative histograms, summing the counts of the same boundaries.
+/// Merge two cumulative histograms (dense or sparse), summing the counts of the same boundaries.
 /// If one boundary is missing in one of the vectors, assume its value to be the same as the next
 /// boundary in the same vector. NOTE: This assumption should be correct when merging histograms
-/// produced by `convert_histogram` with the same set of boundaries, but it's not always the case.
+/// produced by `convert_histogram` with the same set of boundaries, but it's not always the case.
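As a rough illustration of the merge semantics that comment describes: when a boundary is missing from one input, its cumulative count is taken from the next boundary present in that same input. This is a hypothetical sketch under that stated assumption (the end-of-vector fallback here is assumed, not taken from the crate's actual `merge_histograms`):

```rust
/// Merge two cumulative histograms, summing counts at each boundary.
/// For a boundary missing from one input, fall back to that input's count at
/// the next larger boundary; past the last boundary, use the last count.
fn merge_histograms(a: &[(f32, usize)], b: &[(f32, usize)]) -> Vec<(f32, usize)> {
    // Cumulative count contributed by `h` at boundary `le`: the first entry
    // whose boundary is >= `le`, falling back to the final entry's count.
    fn count_at(h: &[(f32, usize)], le: f32) -> usize {
        h.iter()
            .find(|&&(boundary, _)| boundary >= le)
            .map(|&(_, count)| count)
            .unwrap_or_else(|| h.last().map_or(0, |&(_, count)| count))
    }
    let mut boundaries: Vec<f32> = a.iter().chain(b.iter()).map(|&(le, _)| le).collect();
    boundaries.sort_by(|x, y| x.partial_cmp(y).unwrap());
    boundaries.dedup();
    boundaries
        .into_iter()
        .map(|le| (le, count_at(a, le) + count_at(b, le)))
        .collect()
}

fn main() {
    // `b` lacks the 1.0 boundary, so its count there is taken from its 2.0 entry.
    let merged = merge_histograms(&[(1.0, 0), (2.0, 5)], &[(2.0, 3)]);
    assert_eq!(merged, vec![(1.0, 3), (2.0, 8)]);
}
```

As the comment notes, this fallback is only guaranteed correct for histograms produced by `convert_histogram` over the same boundary set.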

282-295: Drop the unused precomputation in get_statistics to avoid extra work.

The block builds a dense histogram and then discards it in favor of convert_histogram. With the new dense behavior, you can just call convert_histogram directly.

Apply this simplification:

-        let duration_micros_histogram = if detail.histograms {
-            let mut duration_micros_histogram =
-                Vec::with_capacity(DEFAULT_BUCKET_BOUNDARIES_MICROS.len());
-            let mut cumulative_count = 0;
-            for (&count, &le) in self.buckets.iter().zip(&DEFAULT_BUCKET_BOUNDARIES_MICROS) {
-                cumulative_count += count;
-                duration_micros_histogram.push((le, cumulative_count));
-            }
-            convert_histogram(
-                &DEFAULT_BUCKET_BOUNDARIES_MICROS,
-                &self.buckets,
-                self.ok_count,
-            )
-        } else {
-            Vec::new()
-        };
+        let duration_micros_histogram = if detail.histograms {
+            convert_histogram(
+                &DEFAULT_BUCKET_BOUNDARIES_MICROS,
+                &self.buckets,
+                self.ok_count,
+            )
+        } else {
+            Vec::new()
+        };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 995db94 and 1b90530.

📒 Files selected for processing (1)
  • lib/segment/src/common/operation_time_statistics.rs (2 hunks)
🔇 Additional comments (1)
lib/segment/src/common/operation_time_statistics.rs (1)

77-95: Bucket boundaries reduction verified—no downstream issues found

All references to DEFAULT_BUCKET_BOUNDARIES_MICROS use its .len(), so no code or tests assume 16 buckets. The only residual hard-coded “16” is in the inline capacity:

  • lib/segment/src/common/operation_time_statistics.rs:74
    buckets: SmallVec<[usize; 16]>,

This is harmless but can be tightened to [usize; 14] to match the new default and reduce inline storage.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
lib/segment/src/common/operation_time_statistics.rs (5)

279-287: Remove dead/duplicated cumulative building; use convert_histogram only

The manual loop builds a cumulative histogram and then discards it by returning convert_histogram(...) instead. This wastes CPU and triggers an “unused variable” warning. Remove the redundant code.

-            let mut duration_micros_histogram =
-                Vec::with_capacity(DEFAULT_BUCKET_BOUNDARIES_MICROS.len());
-            let mut cumulative_count = 0;
-            for (&count, &le) in self.buckets.iter().zip(&DEFAULT_BUCKET_BOUNDARIES_MICROS) {
-                cumulative_count += count;
-                duration_micros_histogram.push((le, cumulative_count));
-            }
-            convert_histogram(&DEFAULT_BUCKET_BOUNDARIES_MICROS, &self.buckets)
+            convert_histogram(&DEFAULT_BUCKET_BOUNDARIES_MICROS, &self.buckets)

329-333: Clarify docstring: function now emits a dense cumulative histogram

The doc should explicitly state it emits (upper_boundary, cumulative_count) for every boundary.

-/// Convert a fixed-size non-cumulative histogram to a cumulative histogram.
+/// Convert a fixed-size non-cumulative histogram to a dense cumulative histogram.
+/// Emits (upper_boundary, cumulative_count) for every boundary in `le_boundaries`.

331-351: Simplify convert_histogram; remove unused prev and correct pre-allocation

  • prev is never set to Some(...), so the flush branches never execute.
  • rough_len_estimation can under-allocate (e.g., all zeros), causing unnecessary reallocations. With dense output, we can pre-allocate exactly le_boundaries.len().
 fn convert_histogram(le_boundaries: &[f32], counts: &[usize]) -> Vec<(f32, usize)> {
-    let rough_len_estimation = std::cmp::min(
-        le_boundaries.len(),
-        counts.iter().filter(|&&c| c != 0).count() * 2,
-    );
-    let mut result = Vec::with_capacity(rough_len_estimation);
-    let mut cumulative_count = 0;
-    let mut prev = None;
-    for (idx, &le) in le_boundaries.iter().enumerate() {
-        let count = counts.get(idx).copied().unwrap_or(0);
-        if let Some(prev) = prev {
-            result.push((prev, cumulative_count));
-        }
-        cumulative_count += count;
-        result.push((le, cumulative_count));
-        prev = None;
-    }
-    if let Some(prev) = prev {
-        result.push((prev, cumulative_count));
-    }
-    result
+    let mut result = Vec::with_capacity(le_boundaries.len());
+    let mut cumulative_count = 0usize;
+    for (idx, &le) in le_boundaries.iter().enumerate() {
+        cumulative_count += counts.get(idx).copied().unwrap_or(0);
+        result.push((le, cumulative_count));
+    }
+    result
 }

422-426: Stale comment: “except the total count” no longer applies

There is no total_count parameter anymore, and this block duplicates the previous case. Either update the comment or drop the duplicate.

-        // With all zeroes except the total count
+        // With all zeroes (no total count parameter anymore)

59-73: Public API change: bucket boundaries reduced from 16 → 11
Internal usage always derives bucket count via DEFAULT_BUCKET_BOUNDARIES_MICROS.len() (no hard-coded “16” checks were found). The SmallVec<[usize; 16]> inline capacity in OperationTimeStatistics still ≥ 11, so it continues to work but leaves unused headroom.

• Please confirm no downstream code relies on a 16-element histogram or specific bucket indices.
• Optional: align the SmallVec capacity to 11 or add a comment explaining the extra space.


📥 Commits

Reviewing files that changed from the base of the PR and between 1b90530 and 08ae531.

📒 Files selected for processing (1)
  • lib/segment/src/common/operation_time_statistics.rs (6 hunks)
🔇 Additional comments (4)
lib/segment/src/common/operation_time_statistics.rs (4)

418-420: LGTM: asserts dense output for the all-zero case

The test now expects entries for every boundary even when counts are zero. Matches the new semantics.


430-431: LGTM: “Full” case matches dense cumulative semantics

The cumulative sums per boundary are correct.


436-446: LGTM: “Sparse” input now yields dense cumulative output

Good coverage ensuring zero-count boundaries are included.


494-496: LGTM: updated convert_histogram calls post-signature change

Consistent with the new signature and the merge test still ensures convert↔merge commutativity.


@KShivendu KShivendu left a comment


Before:

GET /metrics

# TYPE grpc_responses_duration_seconds histogram
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.001"} 0
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.005"} 565
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.01"} 865
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.05"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="+Inf"} 1000

After:

GET /metrics

# TYPE grpc_responses_duration_seconds histogram
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.001"} 0
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.005"} 764
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.01"} 984
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.02"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.05"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.1"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="0.5"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="1"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="5"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="10"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="50"} 1000
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Upsert",le="+Inf"} 1000
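The `/metrics` lines above are a direct rendering of the dense `(le, cumulative_count)` pairs plus the trailing `+Inf` bucket. A hypothetical formatter sketching that rendering (function and label names assumed for illustration; this is not Qdrant's actual exporter code):

```rust
// Render dense cumulative (le, cumulative_count) pairs as Prometheus bucket lines.
fn render_buckets(metric: &str, endpoint: &str, hist: &[(f32, usize)], total: usize) -> String {
    let mut out = String::new();
    for &(le, count) in hist {
        out.push_str(&format!(
            "{metric}_bucket{{endpoint=\"{endpoint}\",le=\"{le}\"}} {count}\n"
        ));
    }
    // Prometheus histograms end with an +Inf bucket equal to the total count.
    out.push_str(&format!(
        "{metric}_bucket{{endpoint=\"{endpoint}\",le=\"+Inf\"}} {total}\n"
    ));
    out
}

fn main() {
    let hist = [(0.001f32, 0), (0.005, 764), (0.01, 984), (0.02, 1000)];
    print!(
        "{}",
        render_buckets(
            "grpc_responses_duration_seconds",
            "/qdrant.Points/Upsert",
            &hist,
            1000,
        )
    );
}
```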

LGTM. Suggested a small change.

Comment on lines 422 to 426

         // With all zeroes except the total count
         assert_eq!(
-            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0], 100),
-            vec![(5., 0)],
+            convert_histogram(&[0., 1., 2., 3., 4., 5.], &[0, 0, 0, 0, 0, 0]),
+            vec![(0., 0), (1., 0), (2., 0), (3., 0), (4., 0), (5., 0)],
         );

There's no difference between this test case and the previous one. Please drop this one.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
lib/segment/src/common/operation_time_statistics.rs (1)

485-490: Duplicate test case: drop or make it meaningfully distinct

The second and fourth case(...) invocations only differ in total_a (256 vs 0) while both inputs are produced by convert_histogram, which yields dense vectors of equal length. In this scenario, merge_histograms never uses total_*, so both cases are equivalent. This was already raised; let's address it.

Apply this diff to drop the duplicate:

         case(&[00, 00, 00, 00, 00], &[86, 50, 47, 84, 52], 256, 319);
         case(&[00, 23, 00, 00, 00], &[00, 00, 00, 84, 00], 30, 90);
-        case(&[00, 00, 00, 00, 00], &[86, 50, 47, 84, 52], 0, 319);

Alternatively, if you want to exercise the total_* fallback path, feed merge_histograms with intentionally sparse cumulative vectors of different lengths (not via convert_histogram).

🧹 Nitpick comments (2)
lib/segment/src/common/operation_time_statistics.rs (2)

279-287: Avoid double work: remove the manual cumulative build and return convert_histogram directly

Inside the histograms branch you precompute a cumulative histogram into a local duration_micros_histogram and then discard it in favor of convert_histogram(...). This extra work is unnecessary.

Apply this diff:

         let duration_micros_histogram = if detail.histograms {
-            let mut duration_micros_histogram =
-                Vec::with_capacity(DEFAULT_BUCKET_BOUNDARIES_MICROS.len());
-            let mut cumulative_count = 0;
-            for (&count, &le) in self.buckets.iter().zip(&DEFAULT_BUCKET_BOUNDARIES_MICROS) {
-                cumulative_count += count;
-                duration_micros_histogram.push((le, cumulative_count));
-            }
             convert_histogram(&DEFAULT_BUCKET_BOUNDARIES_MICROS, &self.buckets)
         } else {
             Vec::new()
         };

329-351: Simplify convert_histogram and fix capacity underestimation

  • prev is never set to Some, so the related branch and the final flush are dead code.
  • rough_len_estimation underestimates capacity when many buckets are zero (e.g., all-zero input reserves 0 and reallocates len times). We already know the exact output length equals le_boundaries.len().

Apply this diff:

-/// Convert a fixed-size non-cumulative histogram to a cumulative histogram.
-fn convert_histogram(le_boundaries: &[f32], counts: &[usize]) -> Vec<(f32, usize)> {
-    let rough_len_estimation = std::cmp::min(
-        le_boundaries.len(),
-        counts.iter().filter(|&&c| c != 0).count() * 2,
-    );
-    let mut result = Vec::with_capacity(rough_len_estimation);
-    let mut cumulative_count = 0;
-    let mut prev = None;
-    for (idx, &le) in le_boundaries.iter().enumerate() {
-        let count = counts.get(idx).copied().unwrap_or(0);
-        if let Some(prev) = prev {
-            result.push((prev, cumulative_count));
-        }
-        cumulative_count += count;
-        result.push((le, cumulative_count));
-        prev = None;
-    }
-    if let Some(prev) = prev {
-        result.push((prev, cumulative_count));
-    }
-    result
-}
+/// Convert a fixed-size non-cumulative histogram to a dense cumulative histogram.
+fn convert_histogram(le_boundaries: &[f32], counts: &[usize]) -> Vec<(f32, usize)> {
+    let mut result = Vec::with_capacity(le_boundaries.len());
+    let mut cumulative_count = 0;
+    for (idx, &le) in le_boundaries.iter().enumerate() {
+        cumulative_count += counts.get(idx).copied().unwrap_or(0);
+        result.push((le, cumulative_count));
+    }
+    result
+}

📥 Commits

Reviewing files that changed from the base of the PR and between 08ae531 and f79043d.

📒 Files selected for processing (1)
  • lib/segment/src/common/operation_time_statistics.rs (6 hunks)
🔇 Additional comments (2)
lib/segment/src/common/operation_time_statistics.rs (2)

414-441: Tests validate dense semantics across zero/full/sparse inputs — LGTM

The updated expectations ensure we emit an entry per boundary (including zero-count buckets), matching the new behavior.


59-74: DEFAULT_BUCKET_BOUNDARIES_MICROS length change verified

  • No occurrences of [f32; 16], len() == 16, or SmallVec<[usize; 16]> were found elsewhere in the repo.
  • All internal uses derive length via DEFAULT_BUCKET_BOUNDARIES_MICROS.len(), so the new size (11) propagates dynamically.
  • The SmallVec inline capacity remains at 16 for headroom; you may optionally reduce it to 11 to match the boundary array, but it’s not required.

@generall generall merged commit 4eea351 into dev Aug 18, 2025
16 checks passed
@generall generall deleted the do-not-hide-empty-histogram-values branch August 18, 2025 09:19
timvisee pushed a commit that referenced this pull request Aug 26, 2025
* do not skip empty histogram values + remove too small buckets

* fix tests and remove some more values from the histogram

* remove redundant test
@timvisee timvisee mentioned this pull request Aug 26, 2025