Skip to content

Conversation

timvisee
Copy link
Member

@timvisee timvisee commented Aug 7, 2025

A stack of PRs with the goal of simplifying our payload index internals.

Rather than having a two stage approach - open and load a payload index - this changes to a single stage approach. This now opens and loads payload indices in a single function call.

By using a single stage, we eliminate a bunch of intermediate states we had to keep before. It required explicit handling of this using Options in all payload indices, which is error prone. This now allow us to remove the intermediate variants, simplifying state handling across all our payload indices. It now follows the RAII philosophy.

The full stack of PRs kept all logic in payload indices the same, except that it merged opening and loading into a single function. No changes have been made in how we interpret our data structures nor how we respond to index queries.

Further in this PR stack we have two PRs that remove a lot of lines as a result of this:

Open questions

  • Return Result<Option<Index>> where None is used if files don't exist, or create a special enum?
    I think we can use Result<Option<Index>> now and revisit this once all PRs are merged.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee force-pushed the payload-index-single-stage-load branch 2 times, most recently from 0e739b6 to 8c41c1d Compare August 8, 2025 10:12
@timvisee timvisee force-pushed the payload-index-single-stage-load branch from 8c41c1d to accddb4 Compare August 8, 2025 10:42
@timvisee timvisee force-pushed the payload-index-single-stage-load branch from accddb4 to a871747 Compare August 8, 2025 10:45
@timvisee timvisee changed the title Open and load payload indices in single stage Open and load numeric payload indices in single stage Aug 8, 2025
@timvisee timvisee marked this pull request as ready for review August 12, 2025 11:34
Copilot

This comment was marked as resolved.

Comment on lines +175 to +191
// Load through mutable numeric index structure
let Some(mutable) =
MutableNumericIndex::<T>::open_rocksdb_db_wrapper(db_wrapper.clone(), false)?
else {
// Column family doesn't exist, cannot load
return Ok(None);
};
// TODO(payload-index-remove-load): remove load when single stage open/load is implemented
mutable.load()?;

let InMemoryNumericIndex {
map,
histogram,
points_count,
max_values_per_point,
point_to_values,
} = mutable.into_in_memory_index();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we now load the index here, right when we open. The load function has effectively moved into here.

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Missing refactor for MutableNullIndex. Since we have the
`create_if_missing` flag, we no longer have `open_if_exists` function,
ergo, signature for `new_null_index` should not include an `Option`.
Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this get simplified

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
lib/segment/src/index/field_index/index_selector.rs (1)

175-191: Fix: Don’t drop a successfully created lookup index when range index is absent [duplicate of earlier review, still applicable]

If use_lookup is true, but numeric_new yields None, the current early return Ok(None) discards the lookup index. Compose both options and return Ok(None) only when both are absent.

Apply this diff:

                 let lookup = if use_lookup {
                     Some(FieldIndex::IntMapIndex(
                         self.map_new(field, create_if_missing)?,
                     ))
                 } else {
                     None
                 };
-                let range = if use_range {
-                    match self.numeric_new(field, create_if_missing)? {
-                        Some(index) => Some(FieldIndex::IntIndex(index)),
-                        None => return Ok(None),
-                    }
-                } else {
-                    None
-                };
-
-                Some(lookup.into_iter().chain(range).collect())
+                let range = if use_range {
+                    self.numeric_new(field, create_if_missing)?
+                        .map(FieldIndex::IntIndex)
+                } else {
+                    None
+                };
+
+                let mut indices: Vec<FieldIndex> = Vec::new();
+                if let Some(l) = lookup { indices.push(l); }
+                if let Some(r) = range { indices.push(r); }
+                if indices.is_empty() {
+                    return Ok(None);
+                }
+                return Ok(Some(indices));
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/index_selector.rs (1)

234-263: Optional: Factor out common Integer builder composition

Both index construction and builder paths compose optional lookup/range components. Consider extracting a small helper to reduce duplication and keep behavior changes centralized (e.g., a function returning Vec<FieldIndexBuilder> or Option<Vec<FieldIndex>>> based on flags).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9a50a9 and 5ab7300.

📒 Files selected for processing (2)
  • lib/segment/src/index/field_index/index_selector.rs (9 hunks)
  • lib/segment/src/index/struct_payload_index.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/struct_payload_index.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/segment/src/index/field_index/index_selector.rs (5)
lib/segment/src/index/field_index/numeric_index/mod.rs (4)
  • index (148-164)
  • new_rocksdb (181-195)
  • new_rocksdb (530-544)
  • is_on_disk (472-478)
lib/segment/src/index/field_index/map_index/mod.rs (2)
  • new_rocksdb (112-118)
  • is_on_disk (498-504)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
  • is_on_disk (436-438)
  • open (162-199)
lib/segment/src/index/field_index/field_index_base.rs (1)
  • is_on_disk (432-446)
lib/segment/src/index/field_index/null_index/mutable_null_index.rs (1)
  • open (48-65)
🔇 Additional comments (5)
lib/segment/src/index/field_index/index_selector.rs (5)

83-85: LGTM: Numeric index paths correctly propagate absence via Option

The early-return with Ok(self.numeric_new(...)? .map(...)) is clean and consistent with the new “open-and-load” semantics. This ensures we don’t synthesize indices when on-disk state is absent (or creation is disallowed).

Also applies to: 101-103, 111-113, 194-197, 206-209


359-376: LGTM: Numeric index constructor now returns Option and wires create_if_missing

The generic numeric_new correctly matches the backend constructors and propagates Option to reflect absent on-disk state. Passing create_if_missing through for RocksDB and Gridstore is consistent with the new single-stage open+load behavior.

Also applies to: 377-383


394-414: LGTM: numeric_builder now fallible and consistent across backends

Wrapping builder creation in OperationResult is appropriate (especially for RocksDB’s fallible builder). The signature and error propagation look good.


467-474: LGTM: new_null_index always returns a NullIndex

This aligns with MutableNullIndex::open semantics (it returns a handle with empty storage when nothing exists and create_if_missing is false). This change simplifies callers and is consistent with “index handle exists; may or may not be on disk yet.”


210-212: No UUID variant mismatch: UuidMapIndex is the intended index.

The FieldIndexBuilder::UuidIndex variant is actually a MapIndexBuilder<UuidIntType>, and both the builder (map_builder) and the loader (new_index) consistently produce and load FieldIndex::UuidMapIndex. No change is required here.

@timvisee timvisee merged commit e0fc16d into dev Aug 13, 2025
16 checks passed
@timvisee timvisee deleted the payload-index-single-stage-load branch August 13, 2025 08:09
timvisee added a commit that referenced this pull request Aug 14, 2025
* In struct payload index open, use option to specify loaded, set rebuild

* Open and load mutable numeric index in single stage

* Open and load immutable numeric index in single stage

* Open and load mmap numeric index in single stage

* new_null_index always returns null index

Missing refactor for MutableNullIndex. Since we have the
`create_if_missing` flag, we no longer have `open_if_exists` function,
ergo, signature for `new_null_index` should not include an `Option`.

---------

Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants