Skip to content

Conversation

generall
Copy link
Member

@generall generall commented Jul 1, 2025

Currently if optimization panics, we can stuck in the loop of failures.
This PR should fix (improve) the endless loop problem.

Actual problem is

                permit.set_on_release(move || {
                    // Notify scheduler that resource budget changed
                    permit_callback(false);
                });

Which re-tries optimization unconditionally, even if permit was dropped because of panic.

What this PR does:

  • Creates a flag to interrupt optimizer loop if optimization fails (this is not actually triggered in experiment, as loop exists before process can start, but still good to have as an extra safety)
  • Changes behavior of on_release -> on_manual_release, to prevent triggering of the callback if permit was destroyed on error. Only explicit release for permission change counts now
  • Moves the check for available disk size before creation of the temp segment, prevents heavy segment operation in the loop.

Behavior after PR:

  • Optimizer is still looped after panic, but now it is only 1 oteration in 5 seconds, which I think is Ok.
  • Optimization with failed disk check doesn't create new segments anymore

@generall generall changed the title WIP: add flag to break optimization loop on panic add flag to break optimization loop on panic Jul 2, 2025
@generall generall requested a review from timvisee July 2, 2025 09:00
@generall generall marked this pull request as ready for review July 2, 2025 09:12

This comment was marked as resolved.

@generall generall merged commit c964bee into dev Jul 3, 2025
18 checks passed
@generall generall deleted the fix-looped-optimization-on-panic branch July 3, 2025 15:54
generall added a commit that referenced this pull request Jul 17, 2025
* WIP: add flag to break optimization loop on panic

* notification on manual budget change only

* move check for disk size before creting a temp segment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants