Skip to content

Switch to GitHub Actions #1736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed

Conversation

crazy-max
Copy link
Member

@crazy-max crazy-max commented Oct 17, 2020

Closes #1729

Switch to GitHub Actions. ATM it does not use the "hack" scripts. Everything is done through the workflows.

Build time is slightly improved using local GitHub Cache ATM:

Without release tag:

With release tag:

TODO

  • Review "hack" scripts
  • Use Docker actions instead of buildctl
  • Use GitHub Cache instead of custom CI caching sticked to travis
  • Build stage
    • binaries build stage
    • integration-tests-base build stage
  • Client integration tests
    • ./client pkg
    • ./cmd/buildctl pkg
    • ./worker/containerd pkg
  • Unit Tests & Lint & Vendor & Proto
    • Lint
    • test ./... pkg (moved to integration-tests job for build time improvement through matrix)
    • validate-vendor
    • validate-generated-files
    • validate-shfmt
    • Run unit tests on all platforms and upload coverage on codecov
      • Windows
      • Fix error on Linux runner. Looks like permission issues with the runner.
      • Fix error on MacOS runner: build github.com/moby/buildkit/util/entitlements/security: cannot load github.com/moby/buildkit/util/entitlements/security: no Go source files
  • Dockerfile integration tests
    • test ./frontend/dockerfile pkg (moved to integration-tests job for build time improvement through matrix)
  • External Dockerfile tests
  • Cross
  • Build image (merged with Deploy image)
  • Deploy
    • ./hack/images master $REPO_SLUG_TARGET push
      • Merged with Build image to reduce build time
      • Use Build push action (buildx)
      • Standard and rootless image built in //
    • ./hack/release-tar $TRAVIS_TAG release-out
      • Upload artifacts locally through actions/upload-artifact
      • Use GitHub Release action to push artifacts on GitHub
    • ./frontend/dockerfile/cmd/dockerfile-frontend/hack/release master mainline $DF_REPO_SLUG_TARGET push
    • ./frontend/dockerfile/cmd/dockerfile-frontend/hack/release master experimental $DF_REPO_SLUG_TARGET push
    • ./frontend/dockerfile/cmd/dockerfile-frontend/hack/release tag $TRAVIS_TAG $DF_REPO_SLUG_TARGET push
    • ./frontend/dockerfile/cmd/dockerfile-frontend/hack/release daily _ $DF_REPO_SLUG_TARGET push

cc. @tonistiigi @coryb

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Add lint and validator tests

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max crazy-max requested review from coryb and tonistiigi October 17, 2020 14:29
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max
Copy link
Member Author

Sometime this head request fails. Looks random, maybe some network issue:

shell: /bin/bash -e {0}
  env:
    REPO_SLUG_ORIGIN: moby/buildkit
    LINUXKIT_BINFMT: v0.8
    BUILDKIT_HOST: tcp://0.0.0.0:1234
    BUILDKIT_PORT: 1234
    PLATFORMS: linux/amd64,linux/arm/v7,linux/arm64,linux/s390x,linux/ppc64le
    PREFER_BUILDCTL: 1
    PROGRESS: plain
    IID: buildkit-tests
    IID_FILE: /tmp/docker-iidfile
    OCI_FILE: /tmp/docker-ocifile
    VOL: /tmp/.buildkit-cache-test
buildctl build --progress $PROGRESS --frontend=dockerfile.v0 \
    --local context=. --local dockerfile=. \
    --opt target=integration-tests \
    --import-cache type=local,src=/tmp/.buildkit-cache/integration-tests \
    --output type=docker,name=$IID,dest=$IID_FILE
#1 [internal] load .dockerignore
#1 transferring context: 56B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 12.90kB done
#2 DONE 0.0s

#3 resolve image config for docker.io/docker/dockerfile:1.1-experimental
#3 ERROR: failed to do request: Head https://registry-1.docker.io/v2/docker/dockerfile/manifests/1.1-experimental: EOF
------
 > resolve image config for docker.io/docker/dockerfile:1.1-experimental:
------
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: failed to do request: Head https://registry-1.docker.io/v2/docker/dockerfile/manifests/1.1-experimental: EOF
Error: Process completed with exit code 1.

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Comment on lines +124 to +125
pkg: ./frontend/dockerfile
flags: -v --parallel=6 --timeout=20m
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not related to the GH migration, but this test seems like the slowest test, I wonder if we could get better parallelization if we break them out by each buildkit worker and let GH parallelize. Something like:

- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=containerd$"
  skip-it: 0
- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=containerd-1.3$"
  skip-it: 0
- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=containerd-snapshotter-stargz$"
  skip-it: 0
- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=oci$"
  skip-it: 0
- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=oci-rootless$"
  skip-it: 0
- pkg: ./frontend/dockerfile
  flags: -v --parallel=6 --timeout=20m -run=".*/.*/worker=oci-snapshotter-stargz$"
  skip-it: 0

Not sure if there is a more graceful way to add the list of workers to the matrix. A similar hack could be applied to all the other packages that run via workers. If GH can spin up parallel containers for each test I suspect it will reduce the test duration by ~6x.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we can add our own matrix options, so was thinking something like this:

matrix:
  pkg:
    - ./client
    - ./cmd/buildctl
    - ./worker/containerd
    - ./frontend
    - ./frontend/dockerfile
  worker: 
    - containerd
    - containerd-1.3
    - containerd-snapshotter-stargz
    - oci
    - oci-rootless
    - oci-snapshotter-stargz
  include:
    - pkg: ./client
      flags: --timeout=20m
    - pkg: ./frontend/dockerfile
      flags: --timeout=20m
steps:
  -  name: Test ${{matrix.worker}}:${{ matrix.pkg }}
     run: |
          ...
          cid=$(docker create --rm -v /tmp --volumes-from=$cacheVolume -e TEST_DOCKERD -e SKIP_INTEGRATION_TESTS=0 -e BUILDKIT_REGISTRY_MIRROR_DIR=$VOL/registry --privileged $IID go test -v -run=".*/.*/worker=${{matrix.worker}}$" ${{ matrix.flags }} ${{ matrix.pkg }})
          ...

We only need to specify the include for the exceptions where we want to add custom flags. This drops the ./... non-integration tests, it seems to me those should be run separately outside of the "integration test" matrix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your input I will make some tests around this.

@crazy-max crazy-max force-pushed the github-actions branch 6 times, most recently from 73d5589 to 769ec43 Compare October 18, 2020 21:15
Add build workflow

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max
Copy link
Member Author

I have created the build workflow and the docker image is now built using our Docker action. Artifacts are upload locally through actions/upload-artifact GitHub action and release on GitHub is done through GitHub Release action.

image

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max crazy-max force-pushed the github-actions branch 5 times, most recently from 0449df4 to 86928d2 Compare October 19, 2020 09:43
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max crazy-max force-pushed the github-actions branch 6 times, most recently from cc6e673 to 9010bec Compare October 20, 2020 21:27
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max
Copy link
Member Author

Now workflows only use our Docker actions instead of buildctl and dockerfile-frontend workflow has been added.

@tonistiigi
Copy link
Member

Leaving summary here, mostly what we discussed in slack already. Again, apologies for not reacting sooner.

I think we should start this work with refactoring the scripts in ./hack directory. They were created before buildx (therefore used buildctl where full BuildKit was needed, eg. multi-platform), and needed to work without BuildKit in Docker as well as BuildKit was just being developed and could not be expected to be installed. The latter we have gradually already removed(we used to have duplicate Dockerfiles) but still exists in hack scripts. It doesn't make sense to migrate this old buildctl logic into new CI.

We should replace it with buildx. Maybe it would be good if main flows also worked with DOCKER_BUILDKIT=1 docker build . but not high priority. Some flows that work well with buildx bake may start using it. For example, vendor, lint, code generation. This is not a requirement, but where it is as easy as refactoring the script we can just skip the script part. In https://github.com/tonistiigi/fsutil I've already written some of the similar targets for a go project.

The goal is to have the dev flow covered with containerized actions that can be run without a special host setup, without much performance overhead. That means that we should avoid docker/build-push-action@v2 in here as that duplicates logic from the steps that can be invoked locally. We can use other actions, of course, setup-buildx, login etc. CI should invoke the same containerized flows that are provided for dev. An advanced user may use their local setup as well if they know how to, eg. go test works fine in this repo and doesn't require a special invocation wrapper.

The above also means that we shouldn't have things using actions/setup-go etc. to install host dependencies. This is also so that we dogfood our own tools and see if there are disadvantages and can improve them.

We should build from git. This improves caching precision and makes sure there are no leaked files.

Regarding caching, as we discussed, it looks like it is possible to use the keys and scope of actions/cache to cache both master and PRs in a similar manner as the current Travis cache, without a special token server and registry. The way the caching works currently is that there are some sequential steps that run and export cache. And then the rest of the steps run in parallel (eg the test matrix) without rebuilding things again in separate vms. As long as buildkit doesn't support fully distributed mode we should replicate this logic and chain steps. Also, no need to test if build fails, or try to build release images if tests fail. Not sure if splitting the workflow files makes sense.

For the cache, using type=local and compressing the directory isn't optimal but seems to be good enough based on your timings. To use a registry backend for cache, we would need to extend https://github.com/tonistiigi/ci-token-server what probably doesn't make sense. Possibly we can make a custom backend for Github cache in the future or something better. The worst part about the current cache is that it can only be used in the CI and not locally, unlike the registry cache that I can use to build locally using the cache from master CI.

We don't need to do this all at once. Can just pick one section, fix the ./hack part, then add GH workflow invocation. We can run Travis and GH actions in parallel for some migration time. Or some steps in travis and other in GHA.

@crazy-max
Copy link
Member Author

crazy-max commented Oct 21, 2020

@tonistiigi

Some flows that work well with buildx bake may start using it.

Maybe we could take a look at https://github.com/crazy-max/ghaction-docker-buildx-bake and move it to docker org?

fix the ./hack part

I will work on it and let you know about my progress.

We can run Travis and GH actions in parallel for some migration time.

Yes that's what I thought when I started to work on it.

Feel free to close this PR and thanks for your report!

Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
@crazy-max
Copy link
Member Author

Closing this one in favor of #1843

@crazy-max crazy-max closed this Nov 26, 2020
@crazy-max crazy-max deleted the github-actions branch November 26, 2020 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Migration] travis-ci.org is moving to travis-ci.com
3 participants