Crashing worker node #9257

@nekrondev

Summary

The worker node crash-loops on startup after upgrading to 7.14.0 because the guardian backend fails to initialize.

Steps to reproduce

Start the worker node container with this docker-compose file:

services:
  # Concourse CI worker container
  worker:
    image: concourse/concourse:7.14.0
    container_name: worker
    restart: always
    command: worker
    privileged: true
    volumes: ["./keys/worker:/concourse-keys"]
    stop_signal: SIGUSR2
    environment:
      CONCOURSE_TSA_HOST: mytsahost.com:2222
      CONCOURSE_GARDEN_DNS_SERVER: 8.8.8.8
      CONCOURSE_NAME: vm-ciworker
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "10m"

Expected results

The worker shouldn't crash on startup when using the guardian container manager.

Actual results

Issue 1: guardian cannot create /var/run/garden-iptables.lock (it looks like the container image doesn't contain /var/run, so creating the file fails) and aborts startup.

worker  | {"timestamp":"2025-08-15T08:53:40.236974440Z","level":"error","source":"guardian","message":"guardian.starting-guardian-backend","data":{"error":"bulk starter: setting up default chains: creating lock file for path `/var/run/garden-iptables.lock`: open /var/run/garden-iptables.lock: no such file or directory"}}
worker  | bulk starter: setting up default chains: creating lock file for path `/var/run/garden-iptables.lock`: open /var/run/garden-iptables.lock: no such file or directory
worker  | bulk starter: setting up default chains: creating lock file for path `/var/run/garden-iptables.lock`: open /var/run/garden-iptables.lock: no such file or directory
worker  | {"timestamp":"2025-08-15T08:53:40.248574811Z","level":"error","source":"worker","message":"worker.garden.gdn-runner.logging-runner-exited","data":{"error":"exit status 1","session":"1.2"}}

Issue 2: I worked around issue 1 by adding "./var/run:/var/run" as a local mount point inside the container. Guardian now creates the lock file, but it then fails while setting up the iptables chains: the container ships BusyBox, and BusyBox xargs rejects the --max-lines=1 option used by the shell script that guardian executes.

worker  | {"timestamp":"2025-08-15T08:58:09.260486906Z","level":"info","source":"guardian","message":"guardian.create-global-iptables-chains.create-started","data":{"session":"3"}}
worker  | {"timestamp":"2025-08-15T08:58:09.382024461Z","level":"info","source":"guardian","message":"guardian.start.completed","data":{"session":"6"}}
worker  | {"timestamp":"2025-08-15T08:58:09.382080162Z","level":"error","source":"guardian","message":"guardian.starting-guardian-backend","data":{"error":"bulk starter: setting up default chains: iptables: setup-global-chains: + set -o nounset\n+ set -o errexit\n+ shopt -s nullglob\n+ filter_input_chain=w--input\n+ filter_forward_chain=w--forward\n+ filter_default_chain=w--default\n+ filter_instance_prefix=w--instance-\n+ nat_prerouting_chain=w--prerouting\n+ nat_postrouting_chain=w--postrouting\n+ nat_instance_prefix=w--instance-\n+ iptables_bin=/var/gdn/assets/linux/sbin/iptables\n+ case \"${ACTION}\" in\n+ setup_filter\n+ teardown_filter\n+ teardown_deprecated_rules\n++ /var/gdn/assets/linux/sbin/iptables -w -S INPUT\n+ rules='-P INPUT ACCEPT'\n+ xargs --no-run-if-empty --max-lines=1 /var/gdn/assets/linux/sbin/iptables -w\nxargs: unrecognized option '--max-lines=1'\n+ sed -e 's/--icmp-type any/--icmp-type 255\\/255/'\n+ sed -e s/-A/-D/ -e 's/\\s\\+$//'\n+ grep ' -j garden-dispatch'\nBusyBox v1.37.0 (2025-08-06 14:24:35 UTC)+ echo '-P INPUT ACCEPT'\n multi-call binary.\n\nUsage: xargs [OPTIONS] [PROG ARGS]\n\nRun PROG on every item given by stdin\n\n\t-0\tNUL terminated input\n\t-a FILE\tRead from FILE instead of stdin\n\t-o\tReopen stdin as /dev/tty\n\t-r\tDon't run command if input is empty\n\t-t\tPrint the command on stderr before execution\n\t-p\tAsk user whether to run each command\n\t-E STR,-e[STR]\tSTR stops input processing\n\t-I STR\tReplace STR within PROG ARGS with input line\n\t-n N\tPass no more than N args to PROG\n\t-s N\tPass command line of no more than N bytes\n\t-P N\tRun up to N PROGs in parallel\n\t-x\tExit if size is exceeded\n"}}
worker  | bulk starter: setting up default chains: iptables: setup-global-chains: + set -o nounset
worker  | + set -o errexit
worker  | + shopt -s nullglob
worker  | + filter_input_chain=w--input
worker  | + filter_forward_chain=w--forward
worker  | + filter_default_chain=w--default
worker  | + filter_instance_prefix=w--instance-
worker  | + nat_prerouting_chain=w--prerouting
worker  | + nat_postrouting_chain=w--postrouting
worker  | + nat_instance_prefix=w--instance-
worker  | + iptables_bin=/var/gdn/assets/linux/sbin/iptables
worker  | + case "${ACTION}" in
worker  | + setup_filter
worker  | + teardown_filter
worker  | + teardown_deprecated_rules
worker  | ++ /var/gdn/assets/linux/sbin/iptables -w -S INPUT
worker  | + rules='-P INPUT ACCEPT'
worker  | + xargs --no-run-if-empty --max-lines=1 /var/gdn/assets/linux/sbin/iptables -w
worker  | xargs: unrecognized option '--max-lines=1'
worker  | + sed -e 's/--icmp-type any/--icmp-type 255\/255/'
worker  | + sed -e s/-A/-D/ -e 's/\s\+$//'
worker  | + grep ' -j garden-dispatch'
worker  | BusyBox v1.37.0 (2025-08-06 14:24:35 UTC)+ echo '-P INPUT ACCEPT'
worker  |  multi-call binary.
worker  |
worker  | Usage: xargs [OPTIONS] [PROG ARGS]
worker  |
worker  | Run PROG on every item given by stdin
worker  |
worker  |       -0      NUL terminated input
worker  |       -a FILE Read from FILE instead of stdin
worker  |       -o      Reopen stdin as /dev/tty
worker  |       -r      Don't run command if input is empty
worker  |       -t      Print the command on stderr before execution
worker  |       -p      Ask user whether to run each command
worker  |       -E STR,-e[STR]  STR stops input processing
worker  |       -I STR  Replace STR within PROG ARGS with input line
worker  |       -n N    Pass no more than N args to PROG
worker  |       -s N    Pass command line of no more than N bytes
worker  |       -P N    Run up to N PROGs in parallel
worker  |       -x      Exit if size is exceeded
worker  |
worker  | bulk starter: setting up default chains: iptables: setup-global-chains: + set -o nounset
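
The trace above fails at `xargs --no-run-if-empty --max-lines=1`, which BusyBox xargs rejects. As a rough illustration of the incompatibility (not the fix used by guardian or Concourse), the sketch below shows a probe for the long option and a per-line loop that avoids xargs long options altogether. The function names `xargs_supports_max_lines` and `run_per_line` and the sample rule are hypothetical, made up for this example.

```shell
#!/bin/sh
# Hypothetical probe: succeeds only if the xargs on PATH accepts the
# long option --max-lines (BusyBox xargs in the log above does not).
xargs_supports_max_lines() {
    printf 'x\n' | xargs --max-lines=1 echo >/dev/null 2>&1
}

# Hypothetical helper: run the given command once per non-empty input line,
# appending the words of that line as arguments. This approximates
# "xargs --no-run-if-empty --max-lines=1" without any xargs long options.
# Caveat: unlike xargs, it does not interpret quotes, and the shell also
# applies glob expansion to the words of each line.
run_per_line() {
    while IFS= read -r line; do
        # Skip empty lines so the command never runs without arguments.
        [ -n "$line" ] || continue
        # Intentional word splitting of $line; stdin is redirected so the
        # command cannot consume the remaining input lines.
        # shellcheck disable=SC2086
        "$@" $line </dev/null || return 1
    done
}

# Demo with echo instead of the real iptables binary; the rule is a made-up sample.
printf '%s\n' '-D INPUT -j garden-dispatch' | run_per_line echo iptables -w
```

Running `xargs_supports_max_lines && echo yes || echo no` inside the worker image would show which xargs variant is on the PATH. The demo line prints the command that would be executed rather than touching any firewall rules.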

Additional context

This ran on a legacy host (Ubuntu 20.04 LTS), but as the logs show, the problem is in the container image rather than the host.

Triaging info

  • Concourse version: 7.14.0
  • Browser (if applicable): Firefox 132esr
  • Did this used to work?: Yes, on the previous version
