
Allow rollbacking updates #5523


Merged: 16 commits merged into main from rollback-updates on May 5, 2025

@dureuill (Contributor) commented on Apr 22, 2025

Pull Request

Related issue

Fixes #5529

What does this PR do?

This PR adds the ability to roll back a migration to a newer version.
To use this new ability, one must cancel a processing or failed (or enqueued) DatabaseUpgrade task.

Doing so will also cancel all enqueued, processing or failed DatabaseUpgrade tasks.

The rollback takes place during the processing of the taskCancelation task itself.

The rollback executes the following algorithm:

  1. For each index, attempt to roll back the index:
    1. Make sure the index is closed, so that no references to it remain.
    2. Reopen the index normally and check whether its version is the target version (meaning the index was never upgraded in the first place). If it is, move on to the next index.
    3. If the version differs from the target version, attempt to open the index in a special mode where its data is the data from before the last write to the index.
    4. If the version of the index in this special mode is the target version, write something to the index and commit, discarding the last write to the index.
    5. Otherwise, the rollback cannot be performed. Restart Meilisearch to attempt the upgrade again.
  2. After all the indexes have been rolled back, the index scheduler itself is downgraded to the target version. Note that currently only the version of the index scheduler is rewritten; if the task queue ever changes in a non-backward-compatible way, a proper downgrade procedure should be added.
  3. Lastly, the VERSION file is rewritten to the target version.
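The per-index decision in step 1, followed by steps 2 and 3, can be sketched as below. This is a minimal, self-contained model, not Meilisearch's `Index::rollback`: `SimIndex`, `IndexRollback`, `rollback_index`, and `rollback_all` are hypothetical names, and an index is reduced to two versions: the one visible in its latest snapshot and the one visible when opened in the special "previous snapshot" mode. Closing the index (step 1.1) has no counterpart here.

```rust
/// A (major, minor, patch) version.
type Version = (u32, u32, u32);

/// Hypothetical stand-in for an index: only the versions visible in its
/// latest snapshot and in its previous snapshot are modeled.
#[derive(Debug, Clone)]
struct SimIndex {
    name: String,
    latest_version: Version,
    previous_version: Version,
}

/// Outcome of trying to roll back one index.
#[derive(Debug, PartialEq)]
enum IndexRollback {
    /// Step 1.2: the index was never upgraded.
    AlreadyAtTarget,
    /// Step 1.4: the previous snapshot was at the target version and became current.
    RolledBack,
    /// Step 1.5: the previous snapshot is not at the target version either.
    Impossible,
}

fn rollback_index(index: &mut SimIndex, target: Version) -> IndexRollback {
    // Step 1.2: "reopen normally" and compare the version.
    if index.latest_version == target {
        return IndexRollback::AlreadyAtTarget;
    }
    // Step 1.3: look at the data from before the last write.
    if index.previous_version == target {
        // Step 1.4: committing a write on top of the previous snapshot
        // makes it the latest state, discarding the last write.
        index.latest_version = index.previous_version;
        return IndexRollback::RolledBack;
    }
    // Step 1.5: the rollback cannot be performed.
    IndexRollback::Impossible
}

/// Steps 1 to 3: roll back every index, then the scheduler version, then VERSION.
fn rollback_all(
    indexes: &mut [SimIndex],
    scheduler_version: &mut Version,
    version_file: &mut Version,
    target: Version,
) -> Result<(), String> {
    for index in indexes.iter_mut() {
        if rollback_index(index, target) == IndexRollback::Impossible {
            // Stop before touching the scheduler or the VERSION file.
            return Err(format!("cannot roll back index `{}`", index.name));
        }
    }
    *scheduler_version = target;
    *version_file = target;
    Ok(())
}
```

In this sketch, an index that cannot be rolled back stops the loop, leaving the indexes processed before it rolled back and the scheduler version and VERSION file untouched; how the real implementation reports that case is not modeled.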

As the index rollback relies on LMDB's PREV_SNAPSHOT feature, it has good properties:

  • rollback should not require free disk space
  • rollback should be extremely fast, even if the upgrade was long
  • rollback has few points of failure
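Why a PREV_SNAPSHOT-based rollback needs no free disk space and takes roughly constant time can be illustrated with a simplified model of LMDB's two meta pages. The sketch assumes, without reproducing LMDB's code, that each commit takes the transaction id of the meta page it started from plus one and writes its meta page into slot `txnid % 2`, and that previous-snapshot mode starts transactions from the older meta page until the first successful commit. `Meta`, `Env`, `pick_meta`, and `commit_write` are hypothetical names, and `root` stands in for a whole B-tree.

```rust
/// One of the two meta pages, simplified: a transaction id and a root.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meta {
    txnid: u64,
    /// Stand-in for the B-tree root; here it just names the data version.
    root: u32,
}

/// Hypothetical model of an environment with two meta-page slots.
struct Env {
    metas: [Meta; 2],
    /// Stand-in for opening the environment in previous-snapshot mode.
    prev_snapshot: bool,
}

impl Env {
    fn new() -> Self {
        Env {
            metas: [Meta { txnid: 0, root: 0 }, Meta { txnid: 1, root: 0 }],
            prev_snapshot: false,
        }
    }

    /// The meta page transactions start from: the newest one, or the
    /// older one in previous-snapshot mode.
    fn pick_meta(&self) -> Meta {
        let newest: usize = if self.metas[0].txnid < self.metas[1].txnid { 1 } else { 0 };
        let slot = if self.prev_snapshot { 1 - newest } else { newest };
        self.metas[slot]
    }

    fn read_root(&self) -> u32 {
        self.pick_meta().root
    }

    /// Commits a write transaction that leaves the data at `new_root`.
    fn commit_write(&mut self, new_root: u32) {
        let txnid = self.pick_meta().txnid + 1;
        // Only one meta slot is rewritten; nothing from the discarded
        // transaction is copied.
        self.metas[(txnid % 2) as usize] = Meta { txnid, root: new_root };
        // Assumed, like the real flag: the mode ends after a successful commit.
        self.prev_snapshot = false;
    }
}
```

For example, after committing data at version 14 and then an upgrade to version 15, switching `prev_snapshot` on makes `read_root()` return 14, and committing that state once makes 14 the latest state again. In this model the rollback rewrites a single meta slot whatever the size of the upgrade; the model also shows the limit: only the state from before the last commit is reachable, which is why step 4 checks the version in that snapshot.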

Implementation

  • Add an Index::rollback function that attempts to roll back an index to a previous version using the rollback algorithm described above.
  • Add an IndexMapper::rollback_index function that puts the index in a safe state for rollback with respect to the index cache.
  • Modify the task batching algorithm (tick function):
    1. Check taskCancelation tasks before databaseUpgrade tasks. When Meilisearch starts with both an upgrade task and a task cancelation in its queue, this lets the cancelation run before the upgrade.
    2. Check that the version of the index scheduler is the expected version; if it isn't, refuse to batch any task other than a task cancelation or a database upgrade. This is because a successful downgrade via a taskCancelation should disable the execution of all other tasks (should deletion tasks be allowed as well?).
  • Add cancelation support to the upgrade, so that calling the cancel route can abort a databaseUpgrade task that is currently processing. For now the support is rather superficial (one check between indexes); it could be improved by adding cancelation points while an index is being upgraded.
  • Modify the processing of taskCancelation tasks to call the rollback routine when at least one database upgrade task is canceled.
  • Move the test module out of the giant index.rs.
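The two batching rules added to the tick function can be sketched as a priority check. `Kind` and `next_batch_kind` are hypothetical simplifications: the real scheduler batches many more task kinds with more elaborate rules, and only the ordering and the version gate described in the list are modeled here.

```rust
/// Hypothetical, heavily simplified task kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    TaskCancelation,
    UpgradeDatabase,
    DocumentAddition,
}

type Version = (u32, u32, u32);

/// Chooses the kind of the next batch from the enqueued tasks.
fn next_batch_kind(
    enqueued: &[Kind],
    scheduler_version: Version,
    package_version: Version,
) -> Option<Kind> {
    // Rule 1: cancelations are checked before database upgrades.
    if enqueued.contains(&Kind::TaskCancelation) {
        return Some(Kind::TaskCancelation);
    }
    if enqueued.contains(&Kind::UpgradeDatabase) {
        return Some(Kind::UpgradeDatabase);
    }
    // Rule 2: while the scheduler's version differs from the binary's
    // (e.g. after a rollback), nothing else is batched.
    if scheduler_version != package_version {
        return None;
    }
    enqueued.first().copied()
}
```

With an old scheduler version and a queue holding both an upgrade and a cancelation, the cancelation is picked first; a queue holding only a document addition yields no batch until the versions match again.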

@dureuill added this to the v1.15.0 milestone on Apr 22, 2025
@dureuill added the "db change" label (A database was modified) on Apr 22, 2025

This comment was marked as outdated.

@dureuill marked this pull request as draft on April 23, 2025, 09:05
@dureuill added the "no db change" label (The database didn't change) and removed the "db change" label (A database was modified) on Apr 24, 2025
@dureuill requested a review from Kerollmops on April 24, 2025, 14:27
@dureuill marked this pull request as ready for review on April 24, 2025, 14:27
@Kerollmops (Member) left a comment

It looks impressively easy to implement (apart from some little quirks here and there). I still have one question.

To use this new ability, one must cancel a processing or failed (or enqueued) DatabaseUpgrade task.

I don't get how someone could cancel a failed task. Is it because Meilisearch enqueues one upgrade task by index, and if one fails, it is a dedicated task in the task queue that can be rolled back? I am not sure if the task queue has been modified to accept such user requests to cancel failed (already processed) tasks.

```rust
if version != Some(package_version) {
    return Err(Error::UnrecoverableError(Box::new(
        Error::IndexSchedulerVersionMismatch {
            index_scheduler_version: version.unwrap_or((1, 12, 0)),
            // … (excerpt truncated in the review view)
```
Member:

Is it expected to see a raw 1.12.0 version here?

Member:

Oh! It's the default version if we don't have one 🤔
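As the thread notes, `(1, 12, 0)` is only the value reported when no version is stored. A sketch of the same check with that fallback given a name might look like the following; `DEFAULT_SCHEDULER_VERSION`, `VersionMismatch`, its `package_version` field, and `check_version` are hypothetical and not part of Meilisearch.

```rust
type Version = (u32, u32, u32);

/// Hypothetical name for the version reported when none is stored
/// (the excerpt above inlines it as `(1, 12, 0)`).
const DEFAULT_SCHEDULER_VERSION: Version = (1, 12, 0);

/// Hypothetical error carrying both versions.
#[derive(Debug, PartialEq)]
struct VersionMismatch {
    index_scheduler_version: Version,
    package_version: Version,
}

/// Fails unless the stored version is exactly the binary's version;
/// a missing version is reported as the default.
fn check_version(stored: Option<Version>, package_version: Version) -> Result<(), VersionMismatch> {
    if stored != Some(package_version) {
        return Err(VersionMismatch {
            index_scheduler_version: stored.unwrap_or(DEFAULT_SCHEDULER_VERSION),
            package_version,
        });
    }
    Ok(())
}
```

Naming the constant keeps the meaning of the literal visible at the call site, which is the question the review raised.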

@dureuill (Contributor, Author)

Thank you for the review, @Kerollmops.

I don't get how someone could cancel a failed task.

This is because of a small targeted modification to the code that handles cancelation. This only applies to databaseUpgrade tasks.

Is it because Meilisearch enqueues one upgrade task by index

No, Meilisearch does not do this. Meilisearch enqueues a single upgrade task to upgrade all indexes.

and if one fails, it is a dedicated task in the task queue that can be rolled back?

No, if the upgrade of a single index fails, the entire upgrade task fails.

I am not sure if the task queue has been modified to accept such user requests to cancel failed (already processed) tasks.

Alright, let's see how this happens, step by step.

  1. When the user requests a cancelation in the /tasks/cancel route, a filter is applied to select applicable tasks. https://github.com/meilisearch/meilisearch/blob/rollback-updates/crates/meilisearch/src/routes/tasks.rs#L384
  2. That filter is the same for task cancelation and task deletion, so it doesn't prevent matching failed tasks (unless the user specified a statuses filter that excludes failed).
  3. The matching task uids are stored in the task and ultimately passed to cancel_matched_tasks https://github.com/meilisearch/meilisearch/blob/rollback-updates/crates/index-scheduler/src/scheduler/process_batch.rs#L661
  4. In this function, the matching tasks used to be filtered by the enqueued tasks to remove any finished tasks (processing tasks are canceled using a different mechanism), but this PR changes the logic to check if any DatabaseUpgrade task matches the filter https://github.com/meilisearch/meilisearch/blob/rollback-updates/crates/index-scheduler/src/scheduler/process_batch.rs#L677
  5. When that is the case, the matched tasks are specifically modified (https://github.com/meilisearch/meilisearch/blob/rollback-updates/crates/index-scheduler/src/scheduler/process_batch.rs#L680):
    • Add all the upgrade tasks that are either failed or enqueued (processing ones were already accounted for)
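Steps 4 and 5 can be sketched as a selection function. It models only the rule stated above and rests on assumptions: `Task`, `Status`, `Kind`, and `tasks_to_cancel` are hypothetical, the pre-existing rule of keeping only enqueued matched tasks is assumed to still apply, and processing tasks are left to the separate mechanism mentioned in step 4.

```rust
/// Hypothetical, simplified task model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Enqueued,
    Processing,
    Succeeded,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    UpgradeDatabase,
    Other,
}

#[derive(Debug, Clone, Copy)]
struct Task {
    uid: u32,
    kind: Kind,
    status: Status,
}

/// Returns the sorted uids to mark as canceled, given the uids matched by the filter.
fn tasks_to_cancel(all: &[Task], matched: &[u32]) -> Vec<u32> {
    let matched_tasks: Vec<&Task> = all.iter().filter(|t| matched.contains(&t.uid)).collect();
    // Previous rule: only enqueued matched tasks are canceled here.
    let mut to_cancel: Vec<u32> = matched_tasks
        .iter()
        .filter(|t| t.status == Status::Enqueued)
        .map(|t| t.uid)
        .collect();
    // New rule: if any upgrade task matched, also add every failed or
    // enqueued upgrade task, even ones the filter did not match.
    if matched_tasks.iter().any(|t| t.kind == Kind::UpgradeDatabase) {
        for t in all {
            let is_upgrade = t.kind == Kind::UpgradeDatabase;
            let eligible = matches!(t.status, Status::Failed | Status::Enqueued);
            if is_upgrade && eligible && !to_cancel.contains(&t.uid) {
                to_cancel.push(t.uid);
            }
        }
    }
    to_cancel.sort_unstable();
    to_cancel
}
```

Matching only a failed upgrade task thus cancels it together with any enqueued upgrade task, while a filter that matches no upgrade task still ignores finished tasks.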

dureuill and others added 3 commits April 29, 2025 16:03
@dureuill requested a review from Kerollmops on April 29, 2025, 15:36
@Kerollmops (Member) left a comment

Amazing performance 👍

@Kerollmops added this pull request to the merge queue on Apr 30, 2025
@github-merge-queue (bot) removed this pull request from the merge queue due to no response for status checks on Apr 30, 2025
@ManyTheFish added this pull request to the merge queue on May 5, 2025
Merged via the queue into main with commit 71ab11f on May 5, 2025
14 checks passed
@ManyTheFish deleted the rollback-updates branch on May 5, 2025, 14:40
@meili-bot added the "v1.15.0" label (PRs/issues solved in v1.15.0 released on 2025-06-09) on Jun 17, 2025
Labels: no db change (The database didn't change); v1.15.0 (PRs/issues solved in v1.15.0 released on 2025-06-09)
Projects: none yet
Development: successfully merging this pull request may close this issue: Cannot cancel database upgrades
4 participants