Container jobs do not get distributed correctly #1586

@sharms

Bug Report

[Screenshot: 2017-09-12, 6:07:18 PM]

If any single worker node in the cluster reaches 250 containers, we see occasional 'max containers reached' errors. We are running 3.4.1.

To test, we added 2 more worker nodes (for a total of 7), and the error still occurs:

fly workers

name       containers  platform  tags  team  state    version
sanitized  103         linux     none  none  running  1.2
sanitized  123         linux     none  none  running  1.2
sanitized  98          linux     none  none  running  1.2
sanitized  104         linux     none  none  running  1.2
sanitized  115         linux     none  none  running  1.2
sanitized  133         linux     none  none  running  1.2
sanitized  250         linux     none  none  running  1.2
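
To make the imbalance easier to see, here is a minimal sketch that parses `fly workers`-style output and summarizes the container counts. It assumes the whitespace-separated column layout shown above, and it hard-codes 250 as the limit because that is the value observed in this report, not something read from the cluster. The names `parse_workers` and `summarize` are hypothetical helpers, not part of fly.

```python
# Limit observed in this report (assumption: not queried from any API).
MAX_CONTAINERS = 250

# The `fly workers` output from this report, pasted as-is.
SAMPLE = """\
name       containers  platform  tags  team  state    version
sanitized  103         linux     none  none  running  1.2
sanitized  123         linux     none  none  running  1.2
sanitized  98          linux     none  none  running  1.2
sanitized  104         linux     none  none  running  1.2
sanitized  115         linux     none  none  running  1.2
sanitized  133         linux     none  none  running  1.2
sanitized  250         linux     none  none  running  1.2
"""


def parse_workers(text):
    """Return (name, container_count) pairs from `fly workers`-style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its second field is not a number).
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        rows.append((fields[0], int(fields[1])))
    return rows


def summarize(rows, limit=MAX_CONTAINERS):
    """Summarize how containers are spread across workers."""
    counts = [count for _, count in rows]
    return {
        "total": sum(counts),
        "mean": sum(counts) / len(counts),
        "max": max(counts),
        "at_limit": sum(1 for count in counts if count >= limit),
    }


workers = parse_workers(SAMPLE)
summary = summarize(workers)
print(summary)
```

For the table above this reports 926 containers in total, roughly 132 per worker on average, with one worker at the limit. In other words, an even spread would leave every worker well below 250.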

Interestingly, most of the containers appear to be check containers (the output below is truncated; the full list runs to thousands of lines). This is what led us to increase the worker count to 7: with 5 workers, almost all of them hit 250 over time.

fly containers

handle     worker     pipeline  job   build #  build id  type   name  attempt
sanitized  sanitized  none      none  none     none      check  none  n/a
sanitized  sanitized  none      none  none     none      check  none  n/a
sanitized  sanitized  none      none  none     none      check  none  n/a
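
Because the full listing is thousands of lines long, a tally per worker and container type is easier to read than the raw output. The following is a rough sketch, assuming each data row has exactly nine whitespace-separated fields in the column order shown above (so names containing spaces would break it). `count_by_worker_and_type` is a hypothetical helper, and the sample rows are the sanitized ones from this report.

```python
from collections import Counter

# Sanitized rows from this report; real output would have distinct handles and workers.
SAMPLE = """\
handle     worker     pipeline  job   build #  build id  type   name  attempt
sanitized  sanitized  none      none  none     none      check  none  n/a
sanitized  sanitized  none      none  none     none      check  none  n/a
sanitized  sanitized  none      none  none     none      check  none  n/a
"""

# Column positions in a data row (assumption based on the header above).
WORKER_COL = 1
TYPE_COL = 6
EXPECTED_FIELDS = 9


def count_by_worker_and_type(text):
    """Count containers per (worker, type) in `fly containers`-style text."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # The header splits into 11 tokens ("build #", "build id"), so it is
        # skipped here along with blank or malformed lines.
        if len(fields) != EXPECTED_FIELDS:
            continue
        counts[(fields[WORKER_COL], fields[TYPE_COL])] += 1
    return counts


counts = count_by_worker_and_type(SAMPLE)
for (worker, ctype), n in counts.most_common():
    print(worker, ctype, n)
```

Run against the complete output, this would show whether check containers dominate on the worker that reaches 250.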
