Skip to content

Conversation

timvisee
Copy link
Member

@timvisee timvisee commented Jan 21, 2025

Our shard key mapping storage for user defined sharding appears to be broken.

We support two types of custom shard keys, a string and a number. The shard key numbers are not persisted properly and are converted into shard key strings on load. This breaks user defined sharding because the strings and numbers are not interchangeable.

This changes the format in which we persist shard keys to fix this very problem. The new format uses better typing, so that numbers and strings can properly be distinguished.

More specifically, we now have two formats:

  • The Old format is the original format from when shard key mappings were implemented
  • The New format is a more robust, properly persisting shard key numbers

Both formats can be stored and loaded from disk. The same shard_key_mapping.json file is used. Currently this PR still prefers to store in the old format, but the new format is chosen if any shard key numbers are used. This ensures backwards compatibility.

In Qdrant 1.14 we change to always prefer the new format. In a later version the old format can be entirely removed if we so desire.

We used a SaveOnDisk<ShardKeyMapping> structure to handle persisting. I've replaced that with a specialized SaveOnDiskShardKeyMappingWrapper structure that acts in the same way, but takes care of loading/persisting in both formats.

Example

For example. If I now create a collection and add shard keys "1" and "2", we still persist in the old format:

{"1":[1],"2":[2]}

If I now add the shard key number 3, it automagically changes into the new format:

[{"key":"1","shard_ids":[1]},{"key":"2","shard_ids":[2]},{"key":3,"shard_ids":[3]}]

Again, this PR supports both formats interchangeably.

Tasks

  • Add test

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee merged commit 5f55d3a into dev Jan 21, 2025
17 checks passed
@timvisee timvisee deleted the fix-shard-key-mapping-losing-number branch January 21, 2025 12:39
timvisee added a commit that referenced this pull request Jan 23, 2025
* Move shard key mapping type definition into separate module

* Create save on disk wrapper for shard mapping, persist in robust format

* Clean up shard key mapping wrapper

* Transform conversion functions into from implementations

* Add test, ensure shard key numbers are correctly persisted on restart
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants