Skip to content

Conversation

timvisee
Copy link
Member

@timvisee timvisee commented Feb 20, 2025

Fixes #6025

When loading a local shard, only load segments from valid directory paths.

Before, we'd try to load every entry in the segments directory as if it was a segment. This caused a panic during segment loading if macOS places a .DS_Store file in there.

More specifically, we now ignore:

  • non-directories
  • hidden directories prefixed with a .
  • .DS_Store

We now also walk over all segment path entries before trying to load them. That is significant because we now catch any IO errors before spawning threads. Before we would leak threads if crashing in the middle.

Log example when trying to load such invalid segment:

2025-02-20T18:17:10.241651Z DEBUG collection::shards::local_shard: Segments path entry prefixed with a period, ignoring: ./storage/collections/embeddings_title_ada/0/segments/.DS_Store

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee changed the title Fix segment loading, don't try to load unknown files Improve segment loading, don't load invalid segments Feb 20, 2025
@timvisee timvisee marked this pull request as ready for review February 20, 2025 17:57
@timvisee
Copy link
Member Author

The CI error is unrelated and fixed separately in: #6033

Don't load a segment from a path that is not a directory. Ignore segment
file names prefixed with a period as well.
@timvisee timvisee force-pushed the fix-segment-loading-dsstore branch from 66c5d34 to 58c69c3 Compare February 21, 2025 09:31
@timvisee timvisee merged commit d3cc688 into dev Feb 21, 2025
17 checks passed
@timvisee timvisee deleted the fix-segment-loading-dsstore branch February 21, 2025 09:52
@itinance
Copy link

When will this be released?

@timvisee
Copy link
Member Author

When will this be released?

We didn't settle on a day yet. It may be sometime in the next two weeks.

timvisee added a commit that referenced this pull request Mar 21, 2025
* Fix segment loading, don't try to load unknown files

Don't load a segment from a path that is not a directory. Ignore segment
file names prefixed with a period as well.

* Rework segment directory walker, error before spawning threads
@timvisee timvisee mentioned this pull request Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants