
limit-active-task: 2 tasks can land on the same worker #6206

@aliculPix4D

Description

Summary

Two (or more) tasks can land on the same worker even though the limit-active-tasks container placement strategy is selected and max-active-tasks-per-worker is set to 1.

This bug was probably introduced by this refactoring commit:
11a0216

As I understand it, two conditions must both hold for the bug to occur
(see the chooseTaskWorker method in atc/worker/client.go):

  1. For the second task, chosenWorker = client.pool.FindOrChooseWorkerForContainer has to run before the first task increases the number of active tasks on the same worker. In this case chosenWorker is not nil for the second task.
  2. The second task tries to acquire the lock only after the first task has already released it. Otherwise, the second task would sleep for 1 second, the first task would already have increased the number of active tasks, and the bug would not occur.
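
The interleaving described by the two conditions above can be sketched in Go as a check-then-act race. This is a simplified illustration, not the actual Concourse code: the worker type and the hasCapacity, addTask, and tryAddTask names are hypothetical, and the two tasks are replayed sequentially in the problematic order so that the outcome is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// maxActiveTasks mirrors CONCOURSE_MAX_ACTIVE_TASKS_PER_WORKER: 1 from this report.
const maxActiveTasks = 1

// worker is a hypothetical stand-in, not the real Concourse worker type.
type worker struct {
	mu          sync.Mutex
	activeTasks int
}

// hasCapacity plays the role of the "choose worker" step: it only reads the counter.
func (w *worker) hasCapacity() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.activeTasks < maxActiveTasks
}

// addTask plays the role of the later "increase active tasks" step.
// It trusts the earlier check and does not look at the counter again.
func (w *worker) addTask() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.activeTasks++
}

// tryAddTask checks the limit and increments within a single lock acquisition.
func (w *worker) tryAddTask() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.activeTasks >= maxActiveTasks {
		return false
	}
	w.activeTasks++
	return true
}

// racyInterleaving replays the order from conditions 1 and 2:
// both tasks pass the capacity check before either one increments.
func racyInterleaving() int {
	w := &worker{}
	firstChosen := w.hasCapacity()
	secondChosen := w.hasCapacity() // condition 1: runs before the first increment
	if firstChosen {
		w.addTask() // first task takes the lock, increments, releases
	}
	if secondChosen {
		w.addTask() // condition 2: second task gets the lock afterwards
	}
	return w.activeTasks
}

// checkedInterleaving uses the combined check-and-increment instead.
func checkedInterleaving() int {
	w := &worker{}
	w.tryAddTask()
	w.tryAddTask() // rejected: the limit is re-checked under the lock
	return w.activeTasks
}

func main() {
	fmt.Println("racy:", racyInterleaving())       // prints "racy: 2"
	fmt.Println("checked:", checkedInterleaving()) // prints "checked: 1"
}
```

Re-checking the limit while holding the lock, as tryAddTask does, is one possible way to close the window; whether that fits how chooseTaskWorker is structured is an assumption on my side.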

The bug occurs very rarely. For example, in the last three months we have experienced it only about 15 times in our infrastructure.

[Image: bug_concourse]

Steps to reproduce

Add:

CONCOURSE_CONTAINER_PLACEMENT_STRATEGY: limit-active-tasks
CONCOURSE_MAX_ACTIVE_TASKS_PER_WORKER: 1

to your docker-compose.yml and run:

docker-compose \
  -f ./docker-compose.yml \
  -f ./hack/overrides/prometheus.yml \
  up -d

Create a small pipeline template (run_job.yml), for example:

---
jobs:
- name: loop
  plan:
  - task: loop
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: 
          repository: busybox
      run:
        path: sh
        args:
          - -c
          - |
            echo "Executing on worker: `hostname`"
            for i in `seq 1 30`; do sleep 1; echo "Slept for ${i} seconds."; done

Then execute the following shell script:

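#!/bin/bash -ex

# Create and unpause 60 identical pipelines.
for i in $(seq 1 60); do
  fly -t ci sp -p "parallel${i}-second-linux" -c run_job.yml -n && fly -t ci up -p "parallel${i}-second-linux"
done

# Trigger the loop job of pipeline number $1.
task() {
  fly -t ci tj -j "parallel${1}-second-linux/loop"
}

# Trigger all jobs concurrently, then wait for the background triggers to finish.
for i in $(seq 1 60); do
  task "$i" &
done
wait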

Expected results

At any time, at most one task should be running on a given worker (when max-active-tasks-per-worker is set to 1).

Actual results

Two (or more) tasks can land on the same worker.

Triaging info

  • Concourse version: 6.5.1
  • Browser (if applicable):
  • Did this used to work? Yes
