Skip to content

Backfill dagrun mistakenly evaluated as deadlocked #26044

@wolfier

Description

@wolfier

Apache Airflow version

Other Airflow 2 version

What happened

I used a bash operator to run a backfill command. The dagrun was marked as failed and was alerted for a deadlock even though the task instances themselves were still ran normally. This happens occasionally.

[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:585} ERROR - Deadlock; marking run <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False> failed
...
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 4 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:879} INFO - Backfill done. Exiting.

Here is full backfill log.

[2022-08-23, 10:54:00 UTC] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'cd $AIRFLOW_HOME && airflow dags backfill -s "2022-08-15 00:00:00" -e "2022-08-16 00:00:00"  -c \'{"start_val":"1","end_val":"4"}\' --rerun-failed-tasks --reset-dagruns --yes load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental']
...
[2022-08-23, 10:54:21 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:21 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.source.extract_withdrawals_venmo_withdrawal_aud_incremental_load backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:54:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:19 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'source.extract_withdrawals_venmo_withdrawal_aud_incremental_load', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmp92x61y3k']
[2022-08-23, 10:54:24 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:24 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:29 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:29 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:34 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:34 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:39 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:39 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:44 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:44 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:49 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:49 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:54 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:54 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 8 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:585} ERROR - Deadlock; marking run <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False> failed
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:609} INFO - DagRun Finished: dag_id=load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental, execution_date=2022-08-15 08:00:00+00:00, run_id=backfill__2022-08-15T08:00:00+00:00, run_start_date=None, run_end_date=2022-08-23 10:54:59.121952+00:00, run_duration=None, state=failed, external_trigger=False, run_type=backfill, data_interval_start=2022-08-15 08:00:00+00:00, data_interval_end=2022-08-16 08:00:00+00:00, dag_hash=None
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {dagrun.py:795} WARNING - Failed to record duration of <DagRun load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental @ 2022-08-15 08:00:00+00:00: backfill__2022-08-15T08:00:00+00:00, externally triggered: False>: start_date is not set.
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 8 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 8
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_staging_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpd1nq6xe2']
[2022-08-23, 10:54:59 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:54:59 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmps6ah6zww']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmps6ah6zww']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'destination.post_marker_staging_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpd1nq6xe2']
[2022-08-23, 10:55:04 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:04 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 3 | succeeded: 1 | running: 2 | failed: 0 | skipped: 3 | deadlocked: 0 | not ready: 3
[2022-08-23, 10:55:06 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:06 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.destination.post_marker_fdg_pii_fact_aw_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:06 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:06 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.destination.post_marker_staging_withdrawals_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:09 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:09 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 3 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 1
[2022-08-23, 10:55:09 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:09 UTC] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpkve4mv_q']
[2022-08-23, 10:55:14 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:14 UTC] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental', 'post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud', 'backfill__2022-08-15T08:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/withdrawals_venmo_withdrawal_aud_jdbc_to_redshift_incremental_load.py', '--cfg-path', '/tmp/tmpkve4mv_q']
[2022-08-23, 10:55:14 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:14 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 3 | running: 1 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:15 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:15 UTC] {task_command.py:371} INFO - Running <TaskInstance: load_withdrawals_venmo_withdrawal_aud_to_redshift_withdrawals_venmo_withdrawal_aud_incremental.post_execution.rotate_checkpoint_withdrawals_venmo_withdrawal_aud backfill__2022-08-15T08:00:00+00:00 [queued]> on host a8870cb5a3e0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 4 | running: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
[2022-08-23, 10:55:19 UTC] {subprocess.py:92} INFO - [2022-08-23, 10:55:19 UTC] {backfill_job.py:879} INFO - Backfill done. Exiting.

What you think should happen instead

The DAG is not deadlocked but still somehow was still evaluated as deadlocked.

How to reproduce

No response

Operating System

N/A

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:backfillSpecifically for backfill relatedkind:bugThis is a clearly a bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions