Wait for worker matching strategy when scheduling build steps #6635
Conversation
Hi @multimac, thanks for the PR! Seems very useful. I do have some concerns about removing the locking around `limit-active-tasks`, though - it would introduce a regression to #6206. While it's true that it's more consistent with `limit-active-{containers,volumes}`, I think it's likely more impactful since, depending on the workload, having a couple more containers around is much less impactful than having a couple more very expensive tasks. If needed, I think we'd be better off just adding locking to `limit-active-{containers,volumes}` rather than removing it from `limit-active-tasks`.

One alternative to the approach taken here that would allow for specialized behaviour for task steps (while still allowing other steps to wait for a worker) is to add a method like:

```go
func (d *taskDelegate) WaitForWorker(...) (worker.Client, error) {
	if !strategy.ModifiesActiveTasks() {
		return d.BuildStepDelegate.WaitForWorker(...)
	}
	...
}
```

What do you think about this approach? More than happy to chat about alternative approaches.
Force-pushed from d5dbc56 to 302fe0d
Hey @aoldershaw, thanks for the feedback. That's fair. What do you think of this commit, multimac@302fe0d? I was thinking about it a bit more overnight and thought we could make

I'm not sure how container and volume counts are tracked, but theoretically we could increment those counts in
@multimac I'm a bit worried about acquiring a global lock for every step - on large deployments, this would result in a lot of lock contention. We can possibly remove the need for the lock with something like:

```sql
UPDATE workers
SET active_tasks = active_tasks + 1
WHERE name IN (...candidates...)
  AND active_tasks < ...max_active_tasks...
```

and then after a worker is selected:

```sql
UPDATE workers
SET active_tasks = active_tasks - 1
WHERE name IN (...candidates...)
  AND name <> ...selected worker...
```

Re: container and volume counts, that's a bit different - they aren't incremented/decremented in the same way.

Oh, and side note - I realize this PR changes the behaviour of

Now, it's only doing 2, which I think makes more sense honestly. However, maybe we should add in a
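Editor's note: the two-phase UPDATE idea above can be sketched in plain Go. The `workerTable` type and method names below are hypothetical in-memory stand-ins for the `workers` table, not Concourse code; this only illustrates the claim-then-release flow.

```go
package main

import "fmt"

// workerTable is a hypothetical in-memory stand-in for the workers table.
type workerTable struct {
	activeTasks map[string]int
}

// claimCandidates mirrors the first UPDATE: bump active_tasks for every
// candidate still under the limit, returning the workers that were claimed.
func (t *workerTable) claimCandidates(candidates []string, max int) []string {
	var claimed []string
	for _, name := range candidates {
		if t.activeTasks[name] < max {
			t.activeTasks[name]++
			claimed = append(claimed, name)
		}
	}
	return claimed
}

// releaseOthers mirrors the second UPDATE: decrement active_tasks for
// every claimed worker except the one that was actually selected.
func (t *workerTable) releaseOthers(claimed []string, selected string) {
	for _, name := range claimed {
		if name != selected {
			t.activeTasks[name]--
		}
	}
}

func main() {
	table := &workerTable{activeTasks: map[string]int{"w1": 2, "w2": 0, "w3": 1}}
	claimed := table.claimCandidates([]string{"w1", "w2", "w3"}, 2) // w1 is already at the limit
	fmt.Println(claimed)                                           // [w2 w3]
	table.releaseOthers(claimed, "w2")
	fmt.Println(table.activeTasks["w2"], table.activeTasks["w3"]) // 1 1
}
```

Because the claim happens before selection, a worker at the limit can never be handed another task, without any global lock being held.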
Yeah, that does sound a bit gross now you mention it. Instead of bumping the active task count for all containers, could we use

I've pushed up a new commit, 5f24ab2, which tries that out. I had to add a

Regarding the sorting of active containers, I think I removed this with the intent to add it back in but must have forgotten. I've put it back in the commit mentioned above, though. In that commit I also removed the logic from the

I imagine there could be a scenario currently where, if you configured a strategy chain like
I think your approach with `Order` and `Pick` makes things a lot nicer! I've left some suggestions
Awesome! Thanks for all the feedback. I don't know why my Google-fu didn't stumble over

One quick note: I noticed this comment (by you, coincidentally)...

Lines 114 to 118 in 5f24ab2

I think I could restructure the code a bit to check for a valid team worker first and then, if all team-specific workers are rejected by the strategy, fall back to checking the general worker pool. Do you have any thoughts on this? Would it maybe be out of scope for this PR? I can imagine a scenario where a team is assigned a single worker and, when that worker fails the configured strategy, their jobs start failing (or stalling once this PR is merged).
I think it'd be best to leave that alone for now, at least for this PR - I don't totally follow the reasoning for the current behaviour, but I know it was done deliberately.
Can use this pipeline to test that the steps wait for a worker to be returned by the configured strategy, instead of erroring out like before. Tested using the following environment variables to set a limit on max active tasks...
The "limit-active-tasks" placement strategy requires some dedicated code (specifically, the "ModifiesActiveTasks" method) and special handling to ensure that there is a hard limit on active tasks in Concourse. Because of this, I'm temporarily removing this placement strategy to simplify the refactoring of container placement. I will add this functionality back in a later commit, though it may need to be altered slightly. Signed-off-by: David Symons <david@symons.io>
This commit takes the logic in "task_delegate.go" around waiting for the configured placement strategy to find a worker to schedule work on, and moves it to "pool.go" so that it can be reused across all the different kinds of steps. The new method, "WaitForWorker", will execute the configured placement strategy periodically (currently every 5 seconds) until a worker is found. Signed-off-by: David Symons <david@symons.io>
This adds a new metric for tracking the number of steps currently waiting in "WaitForWorker" for the configured placement strategy to find a worker to schedule containers on. This should be a decent metric to determine when new worker nodes need to be created to handle pending jobs. Signed-off-by: David Symons <david@symons.io>
The tracking of active tasks was removed in 2446d09 to allow for the refactoring of the container placement logic. This commit adds that logic back into "task_step.go". Signed-off-by: David Symons <david@symons.io>
The "limit-active-tasks" placement strategy was temporarily removed during refactoring, and this commit adds it back. NOTE: The "limit-active-tasks" strategy is a bit less precise now, similar to "limit-active-containers" and "limit-active-volumes", as there is a small period of time between checking the active task count for a worker and picking / incrementing the active task count for the worker (if chosen). Should multiple tasks attempt to be scheduled at the same time, they may all pick the same worker and push the active task count for that worker above the configured limit. Signed-off-by: David Symons <david@symons.io>
This adds a new event, "waiting-for-worker", to notify the user when there is a delay in finding an appropriate worker to schedule the step on. Signed-off-by: David Symons <david@symons.io>
This adds a lock back to ensure that the "limit-active-tasks" strategy will not allow the number of active tasks to go beyond the configured limit. Signed-off-by: David Symons <david@symons.io>
…ctive-tasks" This removes the lock in worker.Pool. Instead of using a lock to ensure accurate tracking of active tasks, the "limit-active-tasks" strategy optimistically increases the active task count in "Pick" and, if that bumps the number of active tasks over the configured limit, returns an error there (while also decrementing the task count back). Signed-off-by: David Symons <david@symons.io>
Updates the placement and pool tests to test the new waiting behaviour. Signed-off-by: David Symons <david@symons.io>
Signed-off-by: David Symons <david@symons.io>
Force-pushed from c01751c to 973c19b
I really like what you've done, but have a few small comments! Thanks for your patience!
Signed-off-by: David Symons <david@symons.io>
Thanks again (again)! Hopefully the review process wasn't too painful, but I really appreciate all the work you put into addressing my concerns in a timely manner. Let's get this merged and see if we can weed out any other issues
FYI @multimac the latest
No worries. Thanks for your help reviewing the changes. And the review process wasn't painful at all; it was actually a very good experience. Hopefully I'll be able to do up some more PRs for Concourse in the future. And awesome! I didn't get a chance to throw them in our cluster at work today, but will hopefully get the chance to tomorrow. Really keen to try out this functionality and hopefully be able to downscale our pool of workers a little bit.
Awesome! By the way, no pressure, but if you are interested in submitting more PRs in the future, you may want to add yourself to the contributors team by submitting a PR to concourse/governance (for example, this one). It doesn't change too much, but grants you access to the PRs pipeline (so you can rerun flaky builds, etc.), and access to the
Signed-off-by: Bohan Chen <bochen@pivotal.io> Co-authored-by: Esteban Foronda <eforonda@vmware.com>
the implementation is pretty much identical to #6635 with a couple exceptions:

* Order for LimitActive{Containers,Volumes} now partitions the workers into two groups - those that satisfy the limit and those that don't. Those that don't are given lower priority, which should result in doing less work when Pick'ing a worker (since the workers that don't exceed the limit are attempted first)
* I got rid of Name() and NoWorkerFitContainerPlacementStrategyError because they were unused - since Order doesn't filter out workers, that error would never be returned

Signed-off-by: Aidan Oldershaw <aoldershaw@pivotal.io>
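Editor's note: a stable partition like the one described for `Order` can be achieved with `sort.SliceStable`. The `worker` struct and `orderByLimit` function here are simplified stand-ins for illustration, not the real implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// worker is a hypothetical stand-in for a candidate worker.
type worker struct {
	name             string
	activeContainers int
}

// orderByLimit keeps workers under the limit first (preserving their
// existing order within each group) and pushes workers at or over the
// limit to the back, so Pick tries them last instead of filtering them out.
func orderByLimit(workers []worker, limit int) []worker {
	ordered := append([]worker(nil), workers...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].activeContainers < limit && ordered[j].activeContainers >= limit
	})
	return ordered
}

func main() {
	workers := []worker{{"w1", 30}, {"w2", 5}, {"w3", 40}, {"w4", 10}}
	for _, w := range orderByLimit(workers, 20) {
		fmt.Print(w.name, " ")
	}
	// w2 w4 w1 w3
}
```

Using a stable sort matters here: it preserves whatever ordering an earlier strategy in the chain produced within each partition.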
What does this PR accomplish?
Bug Fix | Feature | Documentation
Fixes #6648
Changes proposed by this PR:
Similar to how the `task` step would wait for a worker to become available when using the `limit-active-tasks` strategy, this expands that functionality to all strategies (like the newly introduced `limit-active-containers` and `limit-active-volumes`), as well as applying it to all build step types (i.e. `check`/`get`/`put`/`task`). Additionally, this PR adds a `waiting-for-worker` event to notify the user when a build is stuck waiting for a worker to become available.

Notes to reviewer:
The existing `limit-active-tasks` strategy was a bit tricky to keep around in its current form, so I had to rewrite it, and that has altered its behavior slightly. Before, it would lock and ensure that the number of active tasks never went above the configured limit. After the rewrite, `limit-active-tasks` now behaves more like `limit-active-containers`/`limit-active-volumes` in that it tries to cap the number of active tasks at the configured limit but may go over slightly due to race conditions.

Also, I still need to add some additional tests around the `WaitForWorker` method, but wanted to open this PR to get some initial feedback on the implementation and also the tweaks to `limit-active-tasks` mentioned above.

Release Note
* Except for `task` steps when using the `limit-active-tasks` placement strategy, a step would previously simply error the build when no worker was available; steps now wait for a worker
* The `concourse_tasks_waiting` metric was removed and replaced with `concourse_steps_waiting{type="task"}`