Handle Out-Of-Disk gracefully #4165
Conversation
Add fs2 crate as a dependency and ensure sufficient disk space in LocalShard operations
for points_batch in generate_points(points_amount):
    insert_points(qdrant_host, collection_name, points_batch)
    search_point(qdrant_host, collection_name)
Ensure that search works after insertion
)

EXPECTED_ERROR_MESSAGE = "No space left on device"

if resp.status_code != 200:
    if resp.status_code == 500 and EXPECTED_ERROR_MESSAGE in resp.text:
        requests.put(f"{qdrant_host}/collections/{collection_name}/points?wait=true", json=batch_json)
    else:
        error_response = resp.json()
        print(f"Points insertions failed with response body:\n{error_response}")
        exit(-2)
Either status_code: 200, or status_code: 500 + EXPECTED_ERROR_MESSAGE, is expected.
Hey @kemkemG0, thanks for the PR! Before proceeding with the merge, I have a couple of follow-up questions regarding the proposed solution:
/// - The disk space retrieval fails, detailing the failure reason.
/// - The available space is less than the configured WAL buffer size, specifying both the available and required space.
async fn ensure_sufficient_disk_space(&self) -> CollectionResult<()> {
    let disk_free_space_bytes = fs2::available_space(self.path.as_path()).map_err(|err| {
Calling the sync available_space in an async context might not be so good performance-wise. I would take a look at https://github.com/al8n/fs4-rs
Hi @generall! I took a look at fs4 and it seems great; it uses rustix and should be able to build with MUSL as well. I'll proceed with switching from fs2 to fs4. :) While I'm not an expert on file systems or the … Also, quoting from a blog by tokio maintainer ryhl:
Your benchmark shows about … P.S. Updated! As a means to reduce these checks, I thought about changing the … Please let me know your thoughts or if there are any other aspects we should consider.
Do you assume that the main source of the failures is WAL creating new segments?
IIRC I've also seen OOD failures in RocksDB. But checking for at least WAL capacity probably prevents that from happening too.
async fn ensure_sufficient_disk_space(&self) -> CollectionResult<()> {
    // Offload the synchronous I/O operation to a blocking thread
    let path = self.path.clone();
    let disk_free_space_bytes: u64 =
        tokio::task::spawn_blocking(move || fs4::available_space(path.as_path()))
            .await
            .map_err(|e| {
                CollectionError::service_error(format!("Failed to join async task: {}", e))
            })?
            .map_err(|err| {
                CollectionError::service_error(format!(
                    "Failed to get free space for path: {} due to: {}",
                    self.path.as_path().display(),
                    err
                ))
            })?;

    let disk_buffer_bytes = self
        .collection_config
        .read()
        .await
        .wal_config
        .wal_capacity_mb
        * 1024
        * 1024;

    if disk_free_space_bytes < disk_buffer_bytes.try_into().unwrap_or_default() {
        return Err(CollectionError::service_error(
            "No space left on device: WAL buffer size exceeds available disk space".to_string(),
        ));
    }

    Ok(())
}
Requesting the available space may fail. In my opinion, we shouldn't fail but return early with Ok(()) if that happens. That would prevent making Qdrant unusable if some platform or environment doesn't support this call properly.
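A minimal sketch of that suggestion, assuming the fs4 crate from the discussion; the helper name and logging are illustrative, not the PR's final code:

```rust
use std::path::Path;

/// Query free space, but treat a failure as "skip the check" so an
/// unsupported platform or filesystem never blocks updates.
fn free_space_or_skip(path: &Path) -> Option<u64> {
    match fs4::available_space(path) {
        Ok(bytes) => Some(bytes),
        Err(err) => {
            eprintln!("free-space query failed, skipping disk check: {err}");
            None // caller proceeds as if there is enough space
        }
    }
}
```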
Here are some benchmarks on a network-attached disk (7500 IOPS). I am uploading 500k points in batches of 10 points with the following:
Result across 3 runs, with low dispersion in results: dev takes 27 seconds to upload, which gives almost a 15% slowdown with the check. I am afraid that would be too much, and we would need to find some better way to implement this.
I've been considering two possible scenarios:
Additionally, I mentioned …
This is because, in my current implementation, a …
Looking at the above performance test, maybe we can make it less intensive:
I would probably prefer the first to prevent having a big hit on tail latencies. It could be part of our current flush loop, which might actually make sense because writes are likely to happen at that time. I imagine this as a semi-global atomic in which we store the last read amount of free disk space. The current logic in this PR would then just read that, versus doing a syscall in all cases. Once we hit the condition we can query on 100% of the update requests to immediately recover once disk space is available. @generall Thoughts?
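A rough sketch of that idea, assuming a plain std thread stands in for Qdrant's flush loop; the type and field names are invented for illustration:

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Cached free-space value: refreshed periodically, read on the hot update path.
struct DiskFreeCache {
    path: PathBuf,
    cached_free_bytes: AtomicU64,
}

impl DiskFreeCache {
    /// Called from the (flush) loop: one syscall per interval instead of per request.
    fn refresh(&self) {
        if let Ok(bytes) = fs4::available_space(&self.path) {
            self.cached_free_bytes.store(bytes, Ordering::Relaxed);
        }
    }

    /// Called on every update request: just an atomic load, no syscall.
    fn is_low(&self, required_bytes: u64) -> bool {
        self.cached_free_bytes.load(Ordering::Relaxed) < required_bytes
    }
}

fn spawn_refresh_loop(cache: Arc<DiskFreeCache>) {
    std::thread::spawn(move || loop {
        cache.refresh();
        std::thread::sleep(Duration::from_secs(1));
    });
}
```

Once is_low() fires, the caller could switch to querying the filesystem on every request, as described above, so the shard recovers as soon as space is freed.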
One alternative: make this check less frequent by using a timeout and a number of operations. So, for example, check the disk space only if more than one second has passed or more than N operations were performed since the last check.
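A sketch of that alternative with illustrative thresholds (the one-second interval and N are placeholders, not decided values):

```rust
use std::time::{Duration, Instant};

/// Decide whether an update operation should pay for a real disk-space check.
struct ThrottledDiskCheck {
    last_check: Instant,
    ops_since_check: u64,
    max_interval: Duration, // e.g. 1 second
    max_ops: u64,           // e.g. N operations
}

impl ThrottledDiskCheck {
    fn should_check(&mut self) -> bool {
        self.ops_since_check += 1;
        if self.last_check.elapsed() >= self.max_interval || self.ops_since_check >= self.max_ops {
            self.last_check = Instant::now();
            self.ops_since_check = 0;
            true // time to do the real free-space syscall
        } else {
            false // reuse the previous result
        }
    }
}
```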
Wal is a Qdrant crate; you could send a PR to it: https://www.github.com/qdrant/wal
@generall @timvisee
I will try proceeding with the implementation based on this plan :) @generall
I implemented it. Benchmark: 125 MBps, 1100 IOPS
There is some variation, but they appear to be within the same range.
If the upsertion rate in terms of bytes is greater than the WAL size per flush interval, then it might fall through this check. What are the above numbers? Do they represent the number of seconds for consecutive runs?
@timvisee
Another idea that might be realized by extending this implementation is to vary the frequency of updates based on free disk space. For example, as far as I remember, the limit per request is 30 MB, so how about an implementation where, if there is more than 50 MB (maybe 100 MB is safer?) remaining, updates are made only during the flush loop, and if it's below 50 MB, updates are made every time? It may sound heuristic, but the implementation shouldn't be too complex, and the accuracy is better.
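A sketch of that idea; the 50 MB threshold comes from the comment above, while the function shape and fallback value are illustrative:

```rust
use std::path::Path;

/// Below this much free space, check on every request; above it, rely on the
/// value refreshed by the flush loop.
const LOW_SPACE_THRESHOLD_BYTES: u64 = 50 * 1024 * 1024;

fn free_space_for_check(path: &Path, cached_free_bytes: u64) -> u64 {
    if cached_free_bytes < LOW_SPACE_THRESHOLD_BYTES {
        // Low on disk: pay the syscall on every request for accuracy.
        fs4::available_space(path).unwrap_or(u64::MAX) // on failure, don't block updates
    } else {
        // Plenty of headroom: the cached value is accurate enough.
        cached_free_bytes
    }
}
```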
Good question. The request size can be measured to determine this, but that may be over-engineering it. Maybe we can pick N=1000 for now to refresh every 1000 checks. A thousand vectors with dimensionality 1500 are 6 megabytes. I'd suggest implementing that first. We can fine-tune this later.
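For reference, the arithmetic behind the 6 MB figure, assuming 4-byte f32 components:

```rust
fn main() {
    // 1000 vectors of dimensionality 1500, 4 bytes per f32 component
    let bytes: u64 = 1_000 * 1_500 * 4;
    println!("{bytes} bytes ≈ {} MB", bytes / 1_000_000); // 6_000_000 bytes ≈ 6 MB
}
```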
It sounds nice, updated with … I commented almost simultaneously with you, and it probably wasn't visible to you, so I would also like you to check the previous idea, thanks!
Hey @kemkemG0, we are going to merge it as-is now. Will also apply some minor updates later. I like your idea about changing the check interval based on free disk space; I think we are going to follow it.
* tests: Add test on low disk
* Remove redundant assertion
* keep container after failure and print latest logs in console
* Add fs2 crate as a dependency and ensure sufficient disk space in LocalShard operations
* small fix
* Update test
* small fix
* use available_space
* Use `fs2` -> `fs4` and offload sync IO with `tokio::task::spawn_blocking`
* create DiskUsageWathcer
* chore: Remove unnecessary println statement in update_handler.rs
* chore: Fix typo in DiskUsageWatcher struct name
* chore: Refactor DiskUsageWatcher to improve disk usage tracking and update logic

Co-authored-by: tellet-q <elena.dubrovina@qdrant.com>
Co-authored-by: generall <andrey@vasnetsov.com>
fix: #4108
/claim #4108
Description
Set a disk usage threshold, and perform a threshold (`disk_buffer`) check for Create and Update operations.
I haven't decided what value to set for `disk_buffer`, but `disk_buffer = wal_capacity_mb` might be a good idea, as I searched here (Handle Out-Of-Disk gracefully #4108 (comment)).

I introduced a new crate (https://github.com/danburkert/fs2-rs) for getting free disk space where the specified path exists.
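For context, a minimal usage of the crate mentioned above (the discussion later switched to the API-compatible fs4); the path is a placeholder:

```rust
fn main() -> std::io::Result<()> {
    // Free bytes on the filesystem that contains the given path.
    let free = fs2::available_space("./storage")?;
    println!("available: {free} bytes");
    Ok(())
}
```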
Also, I updated the test created in "tests: Add test on low disk" #4105 so that it ensures search still works while out of disk.
Concern
TODO ?
I wasn't sure if I should fix `collection create`, `vector create`, and `snapshot create` in this PR, but I'm willing to fix them if needed :)

All Submissions:
- [ ] Contributions should target the `dev` branch. Did you create your branch from `dev`?

New Feature Submissions:

- [ ] Have you formatted your code using the `cargo +nightly fmt --all` command prior to submission?
- [ ] Have you checked your code using the `cargo clippy --all --all-features` command?

Changes to Core Features: