Description
Summary
We are running a Concourse environment with 1 web node and 3 workers, using a docker-compose config for the web node that limits the number of active tasks per worker to 1. One of the workers uses a tag to restrict which tasks can run on it.
CONCOURSE_CONTAINER_PLACEMENT_STRATEGY: limit-active-tasks
CONCOURSE_MAX_ACTIVE_TASKS_PER_WORKER: 1
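To illustrate what these two settings ask the web node to do, here is a simplified Python model of a limit-active-tasks filter. It assumes each worker carries a count of its active task steps and that a worker stays a placement candidate only while that count is below CONCOURSE_MAX_ACTIVE_TASKS_PER_WORKER. The function name and the worker names are hypothetical, and this is not Concourse's actual (Go) implementation.

```python
def limit_active_tasks(active_tasks_by_worker, max_active_tasks_per_worker):
    """Return the workers still below the per-worker active-task limit.

    Simplified model of the limit-active-tasks placement strategy: a worker
    is a candidate only while its active task count is under the limit.
    An empty result corresponds to "All workers are busy at the moment".
    """
    return sorted(
        name
        for name, active in active_tasks_by_worker.items()
        if active < max_active_tasks_per_worker
    )


if __name__ == "__main__":
    # Hypothetical counts for three workers with CONCOURSE_MAX_ACTIVE_TASKS_PER_WORKER=1.
    counts = {"worker-a": 1, "worker-b": 0, "worker-c": 1}
    print(limit_active_tasks(counts, 1))  # ['worker-b']
    print(limit_active_tasks({"worker-a": 1}, 1))  # []
```

In this model, a worker that is still counted as running one task is excluded from placement, regardless of whether any build for that task is visible in the UI.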
We are experiencing an issue that did not occur before updating to version 7.0.0: workers can randomly become busy with unknown tasks (nothing is visible in the web UI), and no other job can be started:
All workers are busy at the moment, please stand-by.
This can happen on any worker. When it happens on the tagged worker, every job that requires that tag starves with the message shown above.
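The tag starvation described above can be modeled by combining a tag check with the active-task limit. This sketch assumes Concourse's tag behavior as we understand it (a tagged step only runs on workers that have all of its tags, and tagged workers are not used for untagged steps). The worker names are taken from the fly output in this issue; the Worker class and helper functions are hypothetical, and this is a simplified model, not Concourse's scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical, simplified view of a registered worker.
    name: str
    active_tasks: int
    tags: frozenset = field(default_factory=frozenset)


def tags_match(worker_tags, step_tags):
    # Assumption: untagged steps only go to untagged workers, and tagged
    # steps need a worker that carries every requested tag.
    if not step_tags:
        return not worker_tags
    return set(step_tags) <= set(worker_tags)


def candidates(workers, step_tags, max_active_tasks_per_worker):
    """Names of workers that match the step's tags and are below the limit."""
    return [
        w.name
        for w in workers
        if tags_match(w.tags, step_tags)
        and w.active_tasks < max_active_tasks_per_worker
    ]


if __name__ == "__main__":
    workers = [
        Worker("26a773549c67", 0),
        Worker("f0e17b5530aa", 0),
        # Tagged worker counted as busy although no build is visible.
        Worker("Jans-Mac-mini.local", 1, frozenset({"jan"})),
    ]
    print(candidates(workers, [], 1))       # ['26a773549c67', 'f0e17b5530aa']
    print(candidates(workers, ["jan"], 1))  # [] -> tagged jobs starve
```

Untagged jobs still find a worker, but a step tagged `jan` has no candidate left as soon as the single tagged worker is counted as busy.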
We have checked that the required workers are indeed online (state: running):
fly -t ci workers
name containers platform tags team state version age
26a773549c67 46 linux none none running 2.3 9d
Jans-Mac-mini.local 0 darwin jan none running 2.3 7d
f0e17b5530aa 39 linux none none running 2.3 10d
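The check above can be scripted. The following Python sketch parses the plain-text table printed by `fly -t ci workers` (copied from this issue) and lists any worker whose state is not `running`. It assumes the whitespace-separated column layout shown here and that no field contains spaces; the function names are hypothetical.

```python
FLY_WORKERS_OUTPUT = """\
name containers platform tags team state version age
26a773549c67 46 linux none none running 2.3 9d
Jans-Mac-mini.local 0 darwin jan none running 2.3 7d
f0e17b5530aa 39 linux none none running 2.3 10d
"""


def parse_fly_workers(text):
    """Turn the whitespace-separated `fly workers` table into dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def not_running(workers):
    # Names of workers whose state column is anything other than "running".
    return [w["name"] for w in workers if w["state"] != "running"]


if __name__ == "__main__":
    workers = parse_fly_workers(FLY_WORKERS_OUTPUT)
    print(not_running(workers))  # []
    print([w["name"] for w in workers if w["tags"] == "jan"])  # ['Jans-Mac-mini.local']
```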
We have also checked with fly -t ci builds that there are no other blocking jobs. Only a single build is shown, and it is not a resource check:
1379786 some-project/some-step/114 started 2021-03-08@17:24:57+0100 n/a 7m27s+ main user
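A similar sketch can confirm that only one build is in the `started` state. It parses the `fly -t ci builds` line shown above; the column labels in `COLUMNS` are our own assumed names for the fields in that output, not names taken from fly itself.

```python
FLY_BUILDS_OUTPUT = """\
1379786 some-project/some-step/114 started 2021-03-08@17:24:57+0100 n/a 7m27s+ main user
"""

# Assumed labels for the whitespace-separated fields of each build line.
COLUMNS = ["id", "name", "status", "start", "end", "duration", "team", "created_by"]


def running_builds(text):
    """Return (id, name) for every build whose status is 'started'."""
    builds = [dict(zip(COLUMNS, line.split())) for line in text.strip().splitlines()]
    return [(b["id"], b["name"]) for b in builds if b["status"] == "started"]


if __name__ == "__main__":
    print(running_builds(FLY_BUILDS_OUTPUT))  # [('1379786', 'some-project/some-step/114')]
```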
Steps to reproduce
Run 1 web node and 1 worker node, using the limit-active-tasks strategy with 1 task per worker. After some time the worker becomes locked, and nothing can be run on it. Restarting the web node and/or the worker node does not help.
Expected results
When no job is running on a given worker, it should be possible to run a job on it with limit-active-tasks and 1 task per worker.
Actual results
The web node reports "All workers are busy at the moment, please stand-by.", and no job can be run on the given worker.
Additional context
We are using Garden because containerd caused the issues described in #6613.
Triaging info
- Concourse version: 7.0.0
- Browser (if applicable): N/A
- Did this used to work? Yes, it worked on all previous 6.x.x versions