Deadlock when job in serial_group fails #163

@mdelillo

We have three jobs in the same serial group. The second job triggers once the first passes, and the third triggers once the second passes. On our first run, the first job failed, and the other two jobs each showed a pending build. When we tried to start the first job again, the new build got stuck in the pending state. We tried aborting it and creating a new build, but that one also got stuck in pending. We were only able to get it to start by aborting the pending builds of the other two jobs.

jobs:
- name: superman-cf-deploy
  serial: true
  serial_groups:
    - superman-deployment-group
  plan:
  - aggregate:
    - get: deployments-routing
    - get: cf-release-rc
      trigger: true
    - get: bosh-stemcell
...

- name: superman-diego-deploy
  serial: true
  serial_groups:
    - superman-deployment-group
  plan:
  - aggregate:
    - get: bosh-stemcell
    - get: deployments-routing
    - get: cf-release-rc
      passed:
      - superman-cf-deploy
      trigger: true
      params:
        submodules:
          - src/loggregator
    - get: diego-release
      trigger: true
...

- name: superman-routing-deploy
  serial: true
  serial_groups:
    - superman-deployment-group
  plan:
  - aggregate:
    - get: deployments-routing
    - get: cf-routing-release-rc
      trigger: true
    - get: diego-release
      trigger: true
      params:
        submodules: none
      passed:
      - superman-diego-deploy
...
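
To make the symptom easier to reason about, here is a toy model in Python of how a serial group could end up stuck like this. It is only a sketch of one possible explanation, not Concourse's actual scheduler. The admission rule in `can_start` (a build may start only when its `passed` inputs are satisfied and no earlier-queued build in the group is still pending or running) is an assumption, and the `Build` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Build:
    job: str
    needs_passed: tuple = ()  # upstream jobs that must have succeeded first
    status: str = "pending"   # pending | running | succeeded | failed | aborted


def can_start(build, queue, succeeded_jobs):
    """Assumed rule, not taken from Concourse: a pending build starts only if
    its passed constraints are met AND no earlier-queued build in the same
    serial group is still pending or running."""
    if build.status != "pending":
        return False
    if any(job not in succeeded_jobs for job in build.needs_passed):
        return False
    for other in queue:
        if other is build:
            break
        if other.status in ("pending", "running"):
            return False
    return True


def startable(queue, succeeded_jobs):
    """Names of the jobs whose builds could start right now."""
    return [b.job for b in queue if can_start(b, queue, succeeded_jobs)]


# Scenario from the report: the first superman-cf-deploy build failed, so no
# job has succeeded yet. The two downstream builds are already queued, and the
# retried superman-cf-deploy build is queued behind them.
diego = Build("superman-diego-deploy", needs_passed=("superman-cf-deploy",))
routing = Build("superman-routing-deploy", needs_passed=("superman-diego-deploy",))
cf_retry = Build("superman-cf-deploy")
queue = [diego, routing, cf_retry]
succeeded = set()

print(startable(queue, succeeded))  # [] : nothing can start

# Workaround from the report: abort the other two pending builds.
diego.status = "aborted"
routing.status = "aborted"
print(startable(queue, succeeded))  # ['superman-cf-deploy']
```

Under these assumptions the downstream builds wait on inputs that only the first job can produce, while the first job waits behind them in the group, which would match both the stuck state and the workaround we found.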
