Description
What happened?
I'm using bazel on a darwin/arm64 host, pointing at a linux/amd64 RBE cluster. copy_to_directory fails with "cannot execute binary file" because bazel is trying to run a linux/amd64 binary locally.
Here is one example failure, where copy_to_directory is invoked as part of rules_oci's oci_pull:
ERROR: /private/var/tmp/_bazel_gus/798cdab9ca916cc0b6f69f7c876590eb/external/distroless_static_linux_arm64/BUILD.bazel:47:18: Copying files to directory distroless_static_linux_arm64/blobs/sha256 failed: (Exit 126): copy_to_directory failed: error executing command (from target @distroless_static_linux_arm64//:blobs)
(cd /private/var/tmp/_bazel_gus/798cdab9ca916cc0b6f69f7c876590eb/execroot/com_canva_infrastructure && \
exec env - \
external/copy_to_directory_linux_amd64/copy_to_directory bazel-out/darwin_arm64-fastbuild/bin/external/distroless_static_linux_arm64/blobs_config.json)
# Configuration: 552dda68697b5bb17c41b4bef2af5919b102e5481624a5bb19489b4de2a07a0c
# Execution platform: //tools/build/bazel/toolchains/remote:ubuntu-act-22-04-platform
external/copy_to_directory_linux_amd64/copy_to_directory: external/copy_to_directory_linux_amd64/copy_to_directory: cannot execute binary file
Note that bazel thinks the execution platform is my RBE platform (ubuntu-act-22-04-platform) and has selected copy_to_directory_linux_amd64 appropriately, but the action is actually executed locally on my darwin/arm64 mac and fails ("Exit 126" and "cannot execute binary file").
After much head scratching, I discovered COPY_EXECUTION_REQUIREMENTS in @aspect_bazel_lib//lib/private/copy_common.bzl, which forces copy commands to be performed locally. This effectively discards all the toolchain resolution hard work, and results in the above error.
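To illustrate the mechanism, here is a minimal sketch of a rule that hits the same mismatch. It is not the aspect_bazel_lib implementation: the rule name `local_copy`, its attributes, and the exact set of tags are hypothetical, and the real contents of COPY_EXECUTION_REQUIREMENTS may differ. The tool is built for the execution platform that toolchain resolution picked, while the `local` and `no-remote` execution requirements make bazel run the action on the machine invoking the build.

```starlark
# demo.bzl: hypothetical rule for illustration only, not the bazel-lib code.

def _local_copy_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.run(
        # Built in the exec configuration, i.e. for the selected execution
        # platform (linux/amd64 when the RBE platform is chosen).
        executable = ctx.executable.tool,
        arguments = [ctx.file.src.path, out.path],
        inputs = [ctx.file.src],
        outputs = [out],
        # Assumed tags: these keep the action off the remote executor, so it
        # runs on the host (darwin/arm64 here) regardless of the exec platform.
        execution_requirements = {
            "no-remote": "1",
            "local": "1",
        },
    )
    return [DefaultInfo(files = depset([out]))]

local_copy = rule(
    implementation = _local_copy_impl,
    attrs = {
        "src": attr.label(allow_single_file = True, mandatory = True),
        "tool": attr.label(executable = True, cfg = "exec", mandatory = True),
    },
)
```

With a darwin/arm64 host and a linux/amd64 execution platform selected, a rule shaped like this would be expected to fail the same way: the platform decision and the place the action actually runs are made independently.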
Version
Development (host) and target OS/architectures:
host = darwin/arm64
exec = linux/amd64
target = linux/amd64
Output of bazel --version:
bazel 6.1.2
Version of the Aspect rules, or other relevant rules from your WORKSPACE or MODULE.bazel file:
Aspect rules v1.32.1
Language(s) and/or frameworks involved:
How to reproduce
No response
Any other information?
I think the fix is either:
- Remove all of COPY_EXECUTION_REQUIREMENTS. This is what I've done locally. I appreciate the optimisation goal, but the benefit is dubious at best given a sufficiently large and fast RBE cache, the remote asset API, and a slow network link from my client to the RBE cluster. If performance is an issue, it's certainly easier in my case to expand the CAS disk than to expand any other part of the system. I have not confirmed whether the caveat about source TreeArtifacts not working over the remote execution API still applies, but it hasn't affected my use cases in ways that I've noticed, yet.
- Not use toolchain resolution for copy_to_directory. If we're forcing it to execute locally, then we also want to force it to use the host platform's copy_to_directory executable.
- Lean further into local execution, and add exec_compatible_with=HOST_CONSTRAINTS on all relevant rules (or actions, with exec_groups)
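As a rough sketch of the third option at the target level, the snippet below pins a single target to an execution platform matching the host. It assumes that `copy_to_directory` accepts (or forwards) the common `exec_compatible_with` attribute, that the host platform is among the registered execution platforms, and that the copy_to_directory toolchains are keyed on the execution platform. The target name and source file are hypothetical. It does not help for targets generated inside external repositories such as oci_pull, which is why the fix would need to live in the rules themselves.

```starlark
# BUILD.bazel (sketch; target and file names are hypothetical)
load("@aspect_bazel_lib//lib:copy_to_directory.bzl", "copy_to_directory")
load("@local_config_platform//:constraints.bzl", "HOST_CONSTRAINTS")

copy_to_directory(
    name = "blobs",
    srcs = ["blobs_config.json"],
    # Only consider execution platforms that match the machine running bazel,
    # so toolchain resolution should pick the host's copy_to_directory binary
    # for an action that is going to run locally anyway.
    exec_compatible_with = HOST_CONSTRAINTS,
)
```

The trade-off is that the target can no longer be built on the remote executor at all, which is consistent with the current forced-local behaviour but makes it explicit to toolchain resolution.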