Description
I am using a combination of zig c++, Go and Bazel to cross-compile a cgo program to Darwin. The build compiles the Go stdlib in parallel and sometimes hangs. In a container with a hung build, ps auxf looks as follows:
root@4ba08cf5c898:/x# ps auxf | cat
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 5093 0.4 0.0 3968 3472 pts/1 Ss 04:03 0:00 bash
root 5190 0.0 0.0 6700 2952 pts/1 R+ 04:03 0:00 \_ ps auxf
root 5191 0.0 0.0 2468 516 pts/1 S+ 04:03 0:00 \_ cat
root 1 0.0 0.0 709728 17392 pts/0 Ssl+ Jun16 0:01 ./bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root 2423 0.0 0.0 343176 22712 pts/0 Sl+ Jun16 0:12 /root/.cache/bazelisk/downloads/bazelbuild/bazel-4.1.0-linux-x86_64/bin/bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root 2428 0.5 3.1 14181376 1017740 ? Ssl Jun16 2:11 bazel(x) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/jli:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/server:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/ -Dfile.encoding=ISO-8859-1 -jar /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 --output_user_root=/root/.cache/bazel/_bazel_root --install_base=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220 --install_md5=f95ca91ebc34d56aa0f8ad499de91220 --output_base=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --workspace_directory=/x --default_system_javabase= --failure_detail_out=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/failure_detail.rawproto --expand_configs_in_place --idle_server_tasks --write_command_log --nowatchfs --nofatal_event_bus_exceptions --nowindows_enable_symlinks --client_debug=false --product_name=Bazel --noincompatible_enable_execution_transition --option_sources=
root 2709 0.0 0.0 5796 1632 ? S Jun16 0:00 \_ /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/process-wrapper --timeout=0 --kill_delay=15 bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root 2710 0.0 0.0 707676 13572 ? Ssl Jun16 0:01 \_ bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root 2715 0.0 0.0 1159684 31992 ? Sl Jun16 0:03 \_ external/go_sdk/bin/go install -toolexec /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/host/bin/external/go_sdk/builder filterbuildid -gcflags=all= -ldflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc -asmflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_ std runtime/cgo
root 5087 0.0 0.0 152932 30948 ? Sl Jun16 0:00 \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/net -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b087=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b087/_cgo_.o /tmp/go-build2201875783/b087/_cgo_main.o /tmp/go-build2201875783/b087/_x001.o /tmp/go-build2201875783/b087/_x002.o /tmp/go-build2201875783/b087/_x003.o /tmp/go-build2201875783/b087/_x004.o /tmp/go-build2201875783/b087/_x005.o -target aarch64-macos-gnu
root 5088 0.0 0.0 152932 31092 ? Sl Jun16 0:00 \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/os/user -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b036=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b036/_cgo_.o /tmp/go-build2201875783/b036/_cgo_main.o /tmp/go-build2201875783/b036/_x001.o /tmp/go-build2201875783/b036/_x002.o /tmp/go-build2201875783/b036/_x003.o /tmp/go-build2201875783/b036/_x004.o -target aarch64-macos-gnu
Both zig c++ processes are waiting on a futex (the PIDs differ from the listing above because I am running strace from outside the container):
motiejus ~/code/bazel-zig-cc $ sudo strace -p 812414
strace: Process 812414 attached
futex(0x7f3743028b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812414 detached
<detached ...>
motiejus ~/code/bazel-zig-cc $ sudo strace -p 812415
strace: Process 812415 attached
futex(0x7f12064f8b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812415 detached
<detached ...>
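Without -f, strace -p attaches only to the one thread whose ID is given, so the futex wait above may describe just the main thread. A minimal sketch for listing every thread of a hung process with its state and kernel wait channel, assuming a Linux /proc filesystem; the function name thread_states is hypothetical and not part of any tool mentioned here:

```shell
#!/bin/sh
# Hypothetical helper: print "TID STATE WCHAN" for each thread of a process.
# Reads only /proc, so it can be run from outside the container on the host PID.
thread_states() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        tid=${task##*/}
        # The "State:" line of status looks like "State:  S (sleeping)".
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        # wchan names the kernel function the thread sleeps in; it may be
        # "0" or unreadable without sufficient privileges.
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:-?}"
    done
}
```

For example, thread_states 812414 would print one line per thread; sudo strace -f -p 812414 then attaches to all of those threads rather than only the first.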
kill -USR1 did not produce a stack trace. Is there any more information I can provide?
Steps to reproduce on an x86_64-linux machine with a working Docker installation:
$ git clone https://git.sr.ht/~motiejus/bazel-zig-cc -b hangzig
$ cd bazel-zig-cc
$ for i in $(seq 1000); do date; echo $i; time ./hangzig; done
It fails more often on builds.sr.ht (hence the test script passes --cpuset-cpus=0-1, because builds.sr.ht allocates 2 CPUs), e.g. https://builds.sr.ht/~motiejus/job/526372. On my laptop it failed on the 15th iteration; one iteration takes ~90 seconds.
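Since a hang never returns, the reproduction loop can be wrapped so that it stops at the first iteration that exceeds a time limit. This is only a sketch, assuming GNU coreutils timeout (which exits with status 124 when the limit is hit); run_until_hang and its arguments are hypothetical names. Note that timeout terminates the command, so this detects a hang but does not preserve the hung processes for inspection.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly, stop at the first timeout.
# Usage: run_until_hang LIMIT_SECONDS MAX_ITERATIONS COMMAND [ARGS...]
run_until_hang() {
    limit=$1
    max=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        timeout "$limit" "$@"
        status=$?
        # GNU timeout reports 124 when the command was still running at the limit.
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: no exit within ${limit}s, treating as a hang"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max iterations"
}
```

With iterations of roughly 90 seconds, something like run_until_hang 600 1000 ./hangzig would be one possible invocation; the 600-second limit is an arbitrary choice.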
zig version: 0.9.0-dev.137+86ebd4b97. I know having Bazel in the loop is cumbersome, but I wasn't able to find an easy way to reproduce the hang without it.