-
Notifications
You must be signed in to change notification settings - Fork 1.8k
careful index creation #6403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
careful index creation #6403
Conversation
use crate::index::field_index::FieldIndex; | ||
use crate::types::PayloadFieldSchema; | ||
|
||
pub enum BuildFieldIndexResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This result structure enforces the caller to handle each case, including the IncompatibleSchema
. It makes sure we don't forget to follow the protocol of removing existing index (otherwise it is impossible to handle IncompatibleSchema
case)
BuildFieldIndexResult::IncompatibleSchema => { | ||
// This is a service error, as we should have just removed the old index | ||
// So it should not be possible to get this error | ||
return Err(OperationError::service_error(format!( | ||
"Incompatible schema for field index on field {key}", | ||
))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handling only makes sense since we called delete_field_index_if_incompatible
before
📝 WalkthroughWalkthroughThis change introduces new mechanisms for managing field indexes in segments with a focus on schema compatibility. A new enum, Suggested reviewers
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (12)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🔭 Outside diff range comments (3)
lib/segment/src/types.rs (1)
1858-1883
:⚠️ Potential issueRemove auto-inference of field index type.
This code segment should be removed as it's no longer required with the new explicit schema compatibility approach. The automatic type inference mechanism has been replaced by explicit schema compatibility checks throughout the system.
When inferring types from JSON values, inconsistencies could arise if the index was interrupted during creation, leading to the startup panics described in the PR objectives.
The function
value_type
appears to still be in the codebase but should be removed as part of this PR. Removing this auto-inference mechanism is critical to resolving the startup panic issue described in the PR objectives.-pub fn value_type(value: &Value) -> Option<PayloadSchemaType> { - match value { - Value::Null => None, - Value::Bool(_) => None, - Value::Number(num) => { - if num.is_i64() { - Some(PayloadSchemaType::Integer) - } else if num.is_f64() { - Some(PayloadSchemaType::Float) - } else { - None - } - } - Value::String(_) => Some(PayloadSchemaType::Keyword), - Value::Array(_) => None, - Value::Object(obj) => { - let lon_op = obj.get("lon").and_then(|x| x.as_f64()); - let lat_op = obj.get("lat").and_then(|x| x.as_f64()); - - if let (Some(_), Some(_)) = (lon_op, lat_op) { - return Some(PayloadSchemaType::Geo); - } - None - } - } -}lib/segment/src/index/plain_payload_index.rs (1)
76-83
:⚠️ Potential issue
build_index
never marks the new schema as applied
build_index
always returnsAlreadyBuilt
, which short‑cuts theapply_field_index
call in the upper layers.
ForPlainPayloadIndex
this means the segment’sindexed_fields
map is never updated after
drop_index_if_incompatible
removed an old (incompatible) schema, leaving the segment
without any record of the new (compatible) schema.This breaks consistency guarantees – other segments will report the field as indexed while
plain segments silently ignore it, andSegmentEntry::get_indexed_fields()
will give
conflicting answers.A minimal fix is to return
Built
with an empty vector so the normalapply_field_index
path persists the schema:- ) -> OperationResult<BuildIndexResult> { - Ok(BuildIndexResult::AlreadyBuilt) // No index to build + ) -> OperationResult<BuildIndexResult> { + // no on‑disk index, but we must notify the caller that the schema needs + // to be persisted in `indexed_fields` + Ok(BuildIndexResult::Built(Vec::new())) }lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
1199-1214
:⚠️ Potential issueIndex creation is not recorded → wrapped segment will never learn about the new index
build_field_index
successfully builds the index insidewrite_segment
, but does not add aProxyIndexChange::Create
entry tochanged_indexes
.
Consequentlypropagate_to_wrapped()
(lines 270‑301) will skip the CREATE step and the wrapped segment will continue to run without the index, defeating the whole purpose of building it in the proxy.@@ fn build_field_index(&self, op_num: SeqNumberType, key: PayloadKeyTypeRef, field_type: &PayloadFieldSchema, hw_counter: &HardwareCounterCell) -> OperationResult<BuildFieldIndexResult> { - let field_index = match self + let field_index = match self .payload_index .borrow() .build_index(key, field_type, hw_counter)? { BuildIndexResult::Built(indexes) => indexes, @@ - Ok(BuildFieldIndexResult::Built { - indexes: field_index, - schema: field_type.clone(), - }) + // Record the change so it gets propagated later + self.changed_indexes + .write() + .insert(key.clone(), ProxyIndexChange::Create(field_type.clone(), op_num)); + + Ok(BuildFieldIndexResult::Built { + indexes: field_index, + schema: field_type.clone(), + }) }Without this, index schemas will diverge again after optimisation or restart.
🧹 Nitpick comments (10)
lib/segment/src/index/plain_payload_index.rs (1)
121-129
: Over‑aggressive drop indrop_index_if_incompatible
Always removing the entry even when it is compatible is safe but loses the optimisation of
avoiding an unnecessarysave_config()
.
(You already do this optimisation inStructPayloadIndex
.)- // Just always drop the index, as we don't have any indexes - self.config.indexed_fields.remove(field); - self.save_config() + if let Some(current) = self.config.indexed_fields.get(field) { + if current == _new_payload_schema { + return Ok(()); // schema matches – nothing to do + } + } + self.config.indexed_fields.remove(field); + self.save_config()lib/collection/src/collection_manager/segments_updater.rs (1)
347-358
: Consider propagating the “index was dropped” signal
delete_field_index_if_incompatible
returnsbool
, but the result is ignored.
Forwarding it would let the caller know whether a rebuild was triggered by an actual
schema change, which could be useful for logging or telemetry:- write_segment.with_upgraded(|segment| { - segment.delete_field_index_if_incompatible(op_num, field_name, field_schema) - })?; + let index_dropped = write_segment.with_upgraded(|segment| { + segment.delete_field_index_if_incompatible(op_num, field_name, field_schema) + })?;and later include
index_dropped
in the return value if desired.lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
1255-1261
:get_indexed_fields
removes entries on schema mismatch but never adds updated schemaWhen an existing schema differs, the field is simply removed from the map.
Down‑stream components cannot tell whether the field is un‑indexed or indexed with the new schema.Return the new schema instead:
- if let Some(existing_schema) = indexed_fields.get(field_name) { - if existing_schema != schema { - indexed_fields.remove(field_name); - } - } + match indexed_fields.get_mut(field_name) { + Some(existing) if existing != schema => *existing = schema.clone(), + None => { /* ignore – index was never there */ } + _ => {} + }This keeps the caller’s view consistent with what
delete_field_index_if_incompatible
achieved.lib/segment/src/index/payload_index_base.rs (2)
18-26
: Derive common traits forBuildIndexResult
The enum is passed around (and sometimes compared) but does not implement
Debug
,Clone
, orPartialEq
.
Deriving them eases logging, testing, and pattern matching.-pub enum BuildIndexResult { +#[derive(Debug, Clone, PartialEq)] +pub enum BuildIndexResult {No run‑time impact, just better ergonomics.
59-65
: Provide a default no‑op implementation fordrop_index_if_incompatible
All index types are now forced to implement this even if they never create per‑field indexes (e.g. the simple payload index).
Adding a default implementation keeps the trait backwards‑compatible and reduces boilerplate:fn drop_index_if_incompatible( &mut self, _field: PayloadKeyTypeRef, _new_payload_schema: &PayloadFieldSchema, ) -> OperationResult<()> { Ok(()) }Implementers that need special logic can still override it.
lib/segment/src/segment/entry.rs (2)
21-22
: Two similarly‑named result types are imported – consider an alias to avoid confusion
use crate::data_types::build_index_result::BuildFieldIndexResult;
use crate::index::{BuildIndexResult, …};
Developers reading the file must constantly remember which one is which. A simple alias clarifies intent:
use crate::index::{BuildIndexResult as PayloadBuildResult, PayloadIndex, VectorIndex};and update the references accordingly.
768-787
: Return path onIncompatibleSchema
lacks guidanceThe method bubbles up
BuildFieldIndexResult::IncompatibleSchema
but gives no hint on what the caller should do next.
Consider extending the doc‑comment (or the enum variant) to state that the caller is expected to invoke
delete_field_index_if_incompatible
and retry the build.Small, but improves DX and makes the control‑flow obvious.
lib/segment/src/index/struct_payload_index.rs (3)
41-41
:PayloadContainer
appears to be unused – consider removingThe extra import generates a compiler warning (
unused import: 'PayloadContainer'
) in strict#![deny(warnings)]
builds.- PayloadContainer, PayloadFieldSchema, PayloadKeyType, PayloadKeyTypeRef, VectorNameBuf, + PayloadFieldSchema, PayloadKeyType, PayloadKeyTypeRef, VectorNameBuf,
514-542
:set_indexed
writes configuration twice and contains an unreachable branch
drop_index_if_incompatible
→drop_index
already persistsPayloadConfig
.- The happy‑path (
BuildIndexResult::Built
) callsapply_index
, which saves the config again.
That is two disk writes for one logical operation.- After
drop_index_if_incompatible
, theIncompatibleSchema
variant can no longer occur, so the error branch is effectively dead code.You could streamline the method and avoid redundant I/O:
- self.drop_index_if_incompatible(field, &payload_schema)?; + let schema_was_dropped = + self.drop_index_if_incompatible(field, &payload_schema)?; - let field_index = match self.build_index(field, &payload_schema, hw_counter)? { + let field_index = match self.build_index(field, &payload_schema, hw_counter)? { BuildIndexResult::Built(field_index) => field_index, BuildIndexResult::AlreadyBuilt => { - // Index already built, no need to do anything + if schema_was_dropped { + // Should be impossible – defensive check + return Err(OperationError::service_error( + "index dropped but BuildIndex reported AlreadyBuilt", + )); + } return Ok(()); } BuildIndexResult::IncompatibleSchema => unreachable!(), };Where
drop_index_if_incompatible
returns a bool indicating whether it actually removed anything.
This eliminates double‑flushes and clarifies control‑flow.
558-571
:drop_index_if_incompatible
triggers a full config flush even when unnecessary
self.drop_index(field)?
persists the updated config every time schemas differ.
Immediately afterwards,set_indexed
will rebuild the index and again callapply_index
, causing a secondsave_config()
.Consider:
- Return a flag (
bool
) to indicate that an index was removed, allowing the caller to decide when to persist.- Or move
save_config()
out ofdrop_index
and call it once at a higher level.Reducing superfluous RocksDB flushes will shorten index‑rebuild latency under heavy write load.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
lib/collection/src/collection_manager/holders/proxy_segment.rs
(5 hunks)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
(3 hunks)lib/collection/src/collection_manager/segments_updater.rs
(2 hunks)lib/segment/src/data_types/build_index_result.rs
(1 hunks)lib/segment/src/data_types/mod.rs
(1 hunks)lib/segment/src/entry/entry_point.rs
(3 hunks)lib/segment/src/index/payload_index_base.rs
(3 hunks)lib/segment/src/index/plain_payload_index.rs
(3 hunks)lib/segment/src/index/struct_payload_index.rs
(4 hunks)lib/segment/src/segment/entry.rs
(2 hunks)lib/segment/src/segment/segment_ops.rs
(1 hunks)lib/segment/src/segment_constructor/segment_builder.rs
(1 hunks)lib/segment/src/types.rs
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (3)
lib/segment/src/segment/entry.rs (1)
version
(35-37)lib/segment/src/entry/entry_point.rs (1)
version
(32-32)lib/collection/src/collection_manager/holders/proxy_segment.rs (2)
version
(368-373)version
(1453-1459)
lib/segment/src/entry/entry_point.rs (3)
lib/segment/src/segment/entry.rs (2)
delete_field_index_if_incompatible
(741-754)build_field_index
(756-787)lib/collection/src/collection_manager/holders/proxy_segment.rs (2)
delete_field_index_if_incompatible
(1178-1197)build_field_index
(1199-1214)lib/segment/src/common/operation_error.rs (1)
service_error
(84-89)
lib/segment/src/index/plain_payload_index.rs (2)
lib/segment/src/index/payload_index_base.rs (3)
set_indexed
(49-54)drop_index
(57-57)drop_index_if_incompatible
(60-64)lib/segment/src/index/struct_payload_index.rs (3)
set_indexed
(514-542)drop_index
(544-556)drop_index_if_incompatible
(558-571)
lib/segment/src/index/payload_index_base.rs (2)
lib/segment/src/index/plain_payload_index.rs (2)
drop_index
(116-119)drop_index_if_incompatible
(121-129)lib/segment/src/index/struct_payload_index.rs (2)
drop_index
(544-556)drop_index_if_incompatible
(558-571)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests-consensus
🔇 Additional comments (9)
lib/segment/src/data_types/mod.rs (1)
1-1
: Added new module for build index results.This change adds a new module to handle build index results, which is a good way to organize the code and make the new data types available throughout the codebase.
lib/segment/src/segment/segment_ops.rs (1)
27-28
: Removed unused import types.The removal of
PayloadKeyTypeRef
andPayloadSchemaType
from imports aligns with the PR's goal of removing auto-inference of index types. This cleanup helps prevent using the obsolete inference functionality.lib/segment/src/segment_constructor/segment_builder.rs (1)
167-177
: Added key method to handle incompatible schema cases.This new method implements a critical part of the PR's objective - checking for schema compatibility and conditionally removing incompatible field indexes. This helps prevent the inconsistent state that was causing panics when rebuilding indexes.
The implementation correctly:
- Checks if the field exists in the indexed fields
- Compares the schemas for compatibility
- Removes the field if schemas are incompatible
lib/segment/src/data_types/build_index_result.rs (1)
1-16
: Well-designed enum for robust field index building results.This new enum provides a comprehensive way to represent the different outcomes of building a field index, which is central to addressing the index inconsistency issues in the PR.
The enum variants effectively cover all necessary scenarios:
SkippedByVersion
: For handling version-based operation skippingAlreadyExists
: For preventing duplicate workIncompatibleSchema
: For detecting schema mismatches (critical for the bug fix)Built
: For successful outcomes with detailed result informationlib/collection/src/collection_manager/optimizers/segment_optimizer.rs (3)
457-459
: Add support forDeleteIfIncompatible
variant in index changes processing.This change implements handling for the new
ProxyIndexChange::DeleteIfIncompatible
variant, which is used to conditionally delete a field index if it has an incompatible schema. It adds a call toremove_index_field_if_incompatible
on the segment builder.
538-541
: Enhance segment optimization with compatibility-aware deletion logic.This change adds handling for the
DeleteIfIncompatible
operation during segment optimization by callingdelete_field_index_if_incompatible
on the optimized segment. This ensures that existing field indexes with incompatible schemas are properly removed before creating new ones.
724-727
: Support schema compatibility checks during index changes processing.This change integrates the schema compatibility check logic into the segment optimization phase when processing index changes. It ensures that any incompatible field index is properly deleted before a new one is created, preventing potential panics during startup if an incompatible index structure exists.
lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
291-295
: Order of operations may recreate the incompatible index
DeleteIfIncompatible
is processed afterCreate
/Delete
(loop keeps the original order ofHashMap
).
If the optimizer inserted bothDeleteIfIncompatible
and a followingCreate
, the latter would rebuild the index that has just been conditionally dropped.Either:
- Sort
changed_indexes
so thatDeleteIfIncompatible
entries are applied beforeCreate
, or- Perform an explicit second‑pass check for newly incompatible indexes.
Otherwise a corrupted schema can slip through.
lib/segment/src/index/struct_payload_index.rs (1)
34-34
: Import looks goodAdding
BuildIndexResult
to the prelude is required for the refactor – no concerns here.
return if prev_schema == payload_schema { | ||
Ok(BuildIndexResult::AlreadyBuilt) | ||
} else { | ||
Ok(BuildIndexResult::IncompatibleSchema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't handle IncompatibleSchema
inside build_index
function, cause this function only has read-only access to the segment. This is important, since we want to split index building and index registration. Index building is a long process, and we don't want to build it while holding write lock. Therefore we have to process IncompatibleSchema
externally
pub enum BuildIndexResult { | ||
/// Index was built | ||
Built(Vec<FieldIndex>), | ||
/// Index was already built | ||
AlreadyBuilt, | ||
/// Field Index already exists, but incompatible schema | ||
/// Requires extra actions to remove the old index. | ||
IncompatibleSchema, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks too similar to BuildFieldIndexResult
, is it possible to keep just one? or what is the reason for a separate enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar, but the other one have one extra field.
) -> OperationResult<()> { | ||
let payload_schema = payload_schema.into(); | ||
|
||
self.drop_index_if_incompatible(field, &payload_schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you manually test that the index is properly created on restart if we crash just after deleting the incompatible index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did
* make sure to delete existing index, if it is not compatible with a new one * fmt * fix tests * review fixes
Current implementation of payload index building had a problem: if payload index already existed and we tried to overwrite it with a new one, and the service was stopped during the process, we could end up in the situation where one index was created in storage, while another index was configured. That caused panic on start:
To prevent this error, this PR changes index creation function in a way, that we always drop incompatible index before trying to build a new one.
Additionally, this PR removes obsolete auto-inference of the index type, which is not used anyway.