-
Notifications
You must be signed in to change notification settings - Fork 346
Closed
Labels
Description
Cloud provider: gcp
SCM / CICD: github
/ actions
I have been playing with short --idle-timeout
values and I was a bit baffled when I encountered this. One workflow shutdown earlier before the GitHub action completes the final job, and the other completes as expected, but they are using nearly identical workflows.
Successful workflow:
name: Build and run Inference - DEV
on:
push:
branches: [dev]
jobs:
build:
environment: dev
runs-on: ubuntu-latest
steps:
- uses: dacbd/gcr-build-push@main
with:
tags: latest
project-id: ${{ secrets.GCP_PROJECT_ID }}
GCR-key: ${{ secrets.GCP_GCR_KEY }}
container-name: ***
deploy-runner:
runs-on: ubuntu-latest
environment: dev
steps:
- uses: iterative/setup-cml@v1
- uses: actions/checkout@v2
- name: Deploy runner on GCP
env:
GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GCP_CML_RUNNER_KEY }}
run: |
cml-runner \
--single \
--idle-timeout=360 \
--token=${{ secrets.PAT_DCB }} \
--cloud=gcp \
--cloud-region=us-west \
--cloud-type=e2-standard-16
run-inference:
needs: [build, deploy-runner]
runs-on: [self-hosted, cml]
environment: dev
steps:
- uses: actions/checkout@v2
- name: Setup Container image name
run: echo "ML_CONTAINER=gcr.io/${{ secrets.GCP_PROJECT_ID }}/***" >> "${GITHUB_ENV}"
- name: Docker GCR Login
uses: docker/login-action@v1
with:
registry: gcr.io
username: _json_key
password: ${{ secrets.GCP_GCR_KEY }}
- name: Load container from GCR
run: docker pull ${{ env.ML_CONTAINER }}:latest
- name: Run Inference
run: |
docker run ****
Failed workflow:
name: Build and run Inference - PROD
on:
schedule: # Saturday at 1400 UTC
- cron: '0 14 * * 6'
push:
branches: [prod]
jobs:
build:
environment: prod
runs-on: ubuntu-latest
steps:
- uses: dacbd/gcr-build-push@v1
with:
tags: latest
project-id: ${{ secrets.GCP_PROJECT_ID }}
GCR-key: ${{ secrets.GCP_GCR_KEY }}
container-name: ***
deploy-runner:
runs-on: ubuntu-latest
environment: prod
steps:
- uses: iterative/setup-cml@v1
- uses: actions/checkout@v2
- name: Deploy runner on GCP
env:
GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GCP_CML_RUNNER_KEY }}
run: |
cml-runner \
--single \
--idle-timeout=360 \
--token=${{ secrets.PAT_DCB }} \
--cloud=gcp \
--cloud-region=us-west \
--cloud-type=e2-standard-16
run-inference:
needs: [build, deploy-runner]
runs-on: [self-hosted, cml]
environment: prod
steps:
- uses: actions/checkout@v2
- name: Setup Container image name
run: echo "ML_CONTAINER=gcr.io/${{ secrets.GCP_PROJECT_ID }}/***" >> "${GITHUB_ENV}"
- name: Docker GCR Login
uses: docker/login-action@v1
with:
registry: gcr.io
username: _json_key
password: ${{ secrets.GCP_GCR_KEY }}
- name: Load container from GCR
run: docker pull ${{ env.ML_CONTAINER }}
- name: Run Inference
run: |
docker run ***
Successful cml.service
log
daniel_barnes@cml-z53bz3oz1t:~$ sudo journalctl -u cml.service -f
-- Logs begin at Tue 2021-11-09 20:03:04 UTC. --
Nov 09 20:05:47 cml-z53bz3oz1t systemd[1]: Started cml.service.
Nov 09 20:05:52 cml-z53bz3oz1t cml.sh[17039]: {"level":"info","message":"Preparing workdir /tmp/tmp.7PolL2jgVJ/.cml/cml-z53bz3oz1t..."}
Nov 09 20:05:52 cml-z53bz3oz1t cml.sh[17039]: {"level":"info","message":"Launching github runner"}
Nov 09 20:06:02 cml-z53bz3oz1t cml.sh[17039]: {"level":"warn","message":"SpotNotifier can not be started."}
Nov 09 20:06:03 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:06:03.618Z","level":"info","message":"runner status","repo":"https://github.com/yyyy/xxxx"}
Nov 09 20:06:03 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:06:03.619Z","level":"info","message":"runner status √ Connected to GitHub","repo":"https://github.com/yyyy/xxxx"}
Nov 09 20:06:04 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:06:04.057Z","level":"info","message":"runner status Listening for Jobs","repo":"https://github.com/yyyy/xxxx","status":"ready"}
Nov 09 20:06:16 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:06:15.944Z","job":4157205540,"level":"info","message":"runner status Running job: run-inference","repo":"https://github.com/yyyy/xxxx","status":"job_started"}
Nov 09 20:18:21 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:18:21.355Z","job":"","level":"info","message":"runner status Job run-inference completed with result: Succeeded","repo":"https://github.com/yyyy/xxxx","status":"job_ended","success":true}
Nov 09 20:18:21 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:18:21.595Z","level":"info","message":"runner status √ Removed .credentials","repo":"https://github.com/yyyy/xxxx"}
Nov 09 20:18:21 cml-z53bz3oz1t cml.sh[17039]: {"date":"2021-11-09T20:18:21.596Z","level":"info","message":"runner status √ Removed .runner","repo":"https://github.com/yyyy/xxxx"}
Nov 09 20:18:21 cml-z53bz3oz1t cml.sh[17039]: {"level":"info","message":"runner status","reason":"proc_exit","status":"terminated"}
Nov 09 20:18:21 cml-z53bz3oz1t cml.sh[17039]: {"level":"info","message":"waiting 20 seconds before exiting..."}
Nov 09 20:18:41 cml-z53bz3oz1t cml.sh[17039]: {"level":"info","message":"Unregistering runner cml-z53bz3oz1t..."}
Nov 09 20:18:41 cml-z53bz3oz1t cml.sh[17039]: {"level":"error","message":"\tFailed: Cannot destructure property 'id' of '(intermediate value)' as it is undefined."}
Nov 09 20:18:44 cml-z53bz3oz1t systemd[1]: cml.service: Succeeded.
Failed cml.service
log
daniel_barnes@cml-uawryesbyo:~$ sudo journalctl -u cml.service -f
-- Logs begin at Tue 2021-11-09 19:23:49 UTC. --
Nov 09 19:26:34 cml-uawryesbyo systemd[1]: Started cml.service.
Nov 09 19:26:39 cml-uawryesbyo cml.sh[17135]: {"level":"info","message":"Preparing workdir /tmp/tmp.TAz3gTOugR/.cml/cml-uawryesbyo..."}
Nov 09 19:26:39 cml-uawryesbyo cml.sh[17135]: {"level":"info","message":"Launching github runner"}
Nov 09 19:26:49 cml-uawryesbyo cml.sh[17135]: {"level":"warn","message":"SpotNotifier can not be started."}
Nov 09 19:26:50 cml-uawryesbyo cml.sh[17135]: {"date":"2021-11-09T19:26:50.163Z","level":"info","message":"runner status","repo":"https://github.com/yyyy/xxxx"}
Nov 09 19:26:50 cml-uawryesbyo cml.sh[17135]: {"date":"2021-11-09T19:26:50.163Z","level":"info","message":"runner status √ Connected to GitHub","repo":"https://github.com/yyyy/xxxx"}
Nov 09 19:26:50 cml-uawryesbyo cml.sh[17135]: {"date":"2021-11-09T19:26:50.583Z","level":"info","message":"runner status Listening for Jobs","repo":"https://github.com/yyyy/xxxx","status":"ready"}
Nov 09 19:27:02 cml-uawryesbyo cml.sh[17135]: {"level":"warn","message":"Failed parsing log: Reduce of empty array with no initial value"}
Nov 09 19:27:02 cml-uawryesbyo cml.sh[17135]: {"level":"warn","message":"Original log bytes, as Base64: 2021-11-09 19:27:01Z: Running job: run-inference\n"}
Nov 09 19:32:51 cml-uawryesbyo cml.sh[17135]: {"level":"info","message":"runner status","reason":"timeout:360","status":"terminated"}
Nov 09 19:32:51 cml-uawryesbyo cml.sh[17135]: {"level":"info","message":"waiting 20 seconds before exiting..."}
Nov 09 19:33:11 cml-uawryesbyo cml.sh[17135]: {"level":"info","message":"Unregistering runner cml-uawryesbyo..."}
Nov 09 19:33:12 cml-uawryesbyo cml.sh[17135]: {"level":"error","message":"\tFailed: Bad request - Runner \"cml-uawryesbyo\" is still running a job\""}
Nov 09 19:33:22 cml-uawryesbyo systemd[1]: cml.service: Succeeded.
DavidGOrtega