Skip to content

Conversation

generall
Copy link
Member

@generall generall commented Feb 18, 2025

Optimization is running in multiple stages. One stage is IO-heavy, another is CPU-heavy.
We want to utilize all types of resources at the same time, so we introduce a new type of resource budget, that allows other parallel processes to pick up IO-heavy phase without awaiting for CPU-heavy one

@generall generall marked this pull request as ready for review February 18, 2025 23:54
Comment on lines 459 to 493
// 000 - acquired
// +++ - blocked on waiting
//
// Case: 1 indexation job at a time, long indexing
//
// IO limit = 1
// CPU limit = 2 Next optimization
// │ loop
// │
// ▼
// IO 0 00000000000000 000000000
// CPU 1 00000000000000000
// 2 00000000000000000
//
//
// IO 0 ++++++++++++++00000000000000000
// CPU 1 ++++++++0000000000
// 2 ++++++++0000000000
//
//
// Case: 1 indexing j0b at a time, short indexation
//
//
// IO limit = 1
// CPU limit = 2
//
//
// IO 0 000000000000 ++++++++0000000000
// CPU 1 00000
// 2 00000
//
// IO 0 ++++++++++++00000000000 +++++++
// CPU 1 00000
// 2 00000
// At this stage workload shifts from IO to CPU, so we can release IO permit
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most important part is here

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed in call.

We found two points of improvement:

  1. Use the same count in IO budget, and acquire the same amount of IO vs CPU. That keeps both in sync.
  2. replace_with can deadlock when re-acquiring the same resource

@generall generall changed the title IO resource usage permit [WIP] IO resource usage permit Feb 19, 2025
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@timvisee timvisee merged commit f7d0814 into dev Feb 20, 2025
18 checks passed
@timvisee timvisee deleted the io-usage-permit branch February 20, 2025 08:05
@agourlay
Copy link
Member

Are there some benchmark results to share to validate this patch?

@generall
Copy link
Member Author

Not yet. I wanted to make them before merging

@timvisee
Copy link
Member

Sorry! 🤡

@generall
Copy link
Member Author

Improvement is visible on large-scale indexation. In my experiment I was indexing 400M 512d vectors, and current release take about 40 hours on 8 CPU machine, while dev with this change finishes in 28 hours.

Also attaching CPU/Disk usage proviles, which clearly indicate an improvement:

NEW
Screenshot from 2025-02-22 12-54-02

OLD
Screenshot from 2025-02-22 12-52-43

timvisee added a commit that referenced this pull request Mar 21, 2025
* rename cpu_budget -> resource_budget

* clippy

* add io budget to resources

* fmt

* move budget structures into a separate file

* add extend permit function

* dont extend existing permit

* switch from IO to CPU permit

* do not release resource before aquiring an extension

* fmt

* Review remarks

* Improve resource permit number assertion

* Make resource permit replace_with only acquire extra needed permits

* Remove obsolete drop implementation

* allocate IO budget same as CPU

* review fixes

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants