Description
Overview:
When the application crashes, workflow executions that were running at that moment are lost, and there is currently no reliable way to restart them. This feature introduces a mechanism for restarting these interrupted workflows.
Key Mechanism:
The cornerstone of this feature is the workflow instance's Status field. Under normal operation, a workflow instance leaves the "Running" state for a state such as "Finished", "Suspended", or "Faulted". However, if an instance is terminated unexpectedly by an application crash, its Status remains "Running". After a crash, a persisted "Running" status therefore indicates that the workflow was active when the application went down and did not conclude naturally.
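The detection rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Status` enum, the `WorkflowInstance` class and the `find_interrupted` helper are hypothetical names invented for this example, not the project's actual API, and the check is only meaningful before the application starts any new work (otherwise legitimately running instances would match too).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Status values named in this issue; the enum itself is an assumption.
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    FINISHED = "Finished"
    FAULTED = "Faulted"


@dataclass
class WorkflowInstance:
    # Hypothetical, minimal stand-in for a persisted workflow instance.
    instance_id: str
    status: Status


def find_interrupted(instances):
    """Return the instances whose persisted status is still Running."""
    return [i for i in instances if i.status is Status.RUNNING]


# Persisted state as it might look right after a crash.
stored = [
    WorkflowInstance("a", Status.FINISHED),
    WorkflowInstance("b", Status.RUNNING),
    WorkflowInstance("c", Status.SUSPENDED),
    WorkflowInstance("d", Status.RUNNING),
]
interrupted_ids = [i.instance_id for i in find_interrupted(stored)]
```

Here only instances "b" and "d" are selected; the suspended instance "c" is left alone because it is a candidate for "Resume", not "Restart".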
Terminology:
To avoid confusion with existing processes, we introduce the term "Restart" for this feature. This term is distinct from "Resume", which is already used for continuing suspended workflows. "Restart" refers specifically to restarting workflows that were actively running and were interrupted by an application crash.
Restarting Methods:
This feature encompasses two primary methods for restarting interrupted workflows:
- Alteration-Based Recovery:
  - This method allows the recovery process to be initiated manually.
  - It restarts the interrupted workflow with specific parameters or settings altered, for example by supplying additional input or by choosing whether to run the workflow synchronously or asynchronously.
  - This option provides control and flexibility, and is particularly useful when only selected workflows should be recovered.
- Automatic Recovery During Application Startup:
  - This is an automated approach designed to streamline the recovery process.
  - Upon application restart, the system automatically scans for workflows that still have a "Running" status but were in fact halted by the crash.
  - The identified workflows are then recovered automatically, minimising disruption to business processes.
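The two methods above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `Instance`, `RestartOptions`, `restart` and `restart_all_on_startup` are hypothetical names, and the `run_sync` / `run_async` callables stand in for whatever the host application uses to execute a workflow inline or hand it to a background worker.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical persisted workflow instance.
    instance_id: str
    status: str  # "Running", "Suspended", "Finished" or "Faulted"
    input: dict = field(default_factory=dict)


@dataclass
class RestartOptions:
    # Alterations a caller may apply when restarting manually (assumed shape).
    extra_input: dict = field(default_factory=dict)
    run_async: bool = False


def restart(instance, run_sync, run_async, options=None):
    """Restart one interrupted instance, optionally with altered settings."""
    if instance.status != "Running":
        # Suspended instances are resumed, not restarted.
        raise ValueError(f"{instance.instance_id} is {instance.status}, not interrupted")
    opts = options or RestartOptions()
    instance.input.update(opts.extra_input)
    runner = run_async if opts.run_async else run_sync
    runner(instance)


def restart_all_on_startup(store, run_sync, run_async):
    """Scan the store once at startup and restart every interrupted instance."""
    restarted = []
    for instance in store:
        if instance.status == "Running":
            restart(instance, run_sync, run_async)
            restarted.append(instance.instance_id)
    return restarted


def finish(instance):
    # Stand-in for real synchronous execution: just mark the instance done.
    instance.status = "Finished"


queued = []  # stand-in for a background work queue
store = [
    Instance("a", "Running"),
    Instance("b", "Suspended"),
    Instance("c", "Running"),
]

# Alteration-based recovery: restart one instance manually with extra input.
restart(store[0], finish, queued.append, RestartOptions(extra_input={"retry": True}))

# Automatic recovery: everything still marked Running is restarted.
recovered = restart_all_on_startup(store, finish, queued.append)
```

In this sketch the manual call handles instance "a", so the startup scan only picks up "c", and the suspended instance "b" is untouched. Passing `run_async=True` would route the instance to `queued` instead of running it inline.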
Conclusion:
This feature is a significant step towards enhancing the resilience and reliability of our workflow management. By accurately identifying and efficiently restarting interrupted workflows, we ensure continuity and reduce the impact of unexpected application crashes.