zig c++ hanging when invoked in parallel #9139

@motiejus

I am using a combination of zig c++, Go, and Bazel to cross-compile a cgo program to Darwin. The build compiles the Go stdlib in parallel and sometimes hangs. In a container with a hung build, ps auxf looks as follows:

root@4ba08cf5c898:/x# ps auxf | cat
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        5093  0.4  0.0   3968  3472 pts/1    Ss   04:03   0:00 bash
root        5190  0.0  0.0   6700  2952 pts/1    R+   04:03   0:00  \_ ps auxf
root        5191  0.0  0.0   2468   516 pts/1    S+   04:03   0:00  \_ cat
root           1  0.0  0.0 709728 17392 pts/0    Ssl+ Jun16   0:01 ./bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2423  0.0  0.0 343176 22712 pts/0    Sl+  Jun16   0:12 /root/.cache/bazelisk/downloads/bazelbuild/bazel-4.1.0-linux-x86_64/bin/bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2428  0.5  3.1 14181376 1017740 ?    Ssl  Jun16   2:11 bazel(x) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/jli:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/server:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/ -Dfile.encoding=ISO-8859-1 -jar /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 --output_user_root=/root/.cache/bazel/_bazel_root --install_base=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220 --install_md5=f95ca91ebc34d56aa0f8ad499de91220 --output_base=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --workspace_directory=/x --default_system_javabase= --failure_detail_out=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/failure_detail.rawproto --expand_configs_in_place --idle_server_tasks --write_command_log --nowatchfs --nofatal_event_bus_exceptions --nowindows_enable_symlinks --client_debug=false --product_name=Bazel --noincompatible_enable_execution_transition --option_sources=
root        2709  0.0  0.0   5796  1632 ?        S    Jun16   0:00  \_ /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/process-wrapper --timeout=0 --kill_delay=15 bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        2710  0.0  0.0 707676 13572 ?        Ssl  Jun16   0:01      \_ bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        2715  0.0  0.0 1159684 31992 ?       Sl   Jun16   0:03          \_ external/go_sdk/bin/go install -toolexec /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/host/bin/external/go_sdk/builder filterbuildid -gcflags=all= -ldflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc -asmflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_ std runtime/cgo
root        5087  0.0  0.0 152932 30948 ?        Sl   Jun16   0:00              \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/net -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b087=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b087/_cgo_.o /tmp/go-build2201875783/b087/_cgo_main.o /tmp/go-build2201875783/b087/_x001.o /tmp/go-build2201875783/b087/_x002.o /tmp/go-build2201875783/b087/_x003.o /tmp/go-build2201875783/b087/_x004.o /tmp/go-build2201875783/b087/_x005.o -target aarch64-macos-gnu
root        5088  0.0  0.0 152932 31092 ?        Sl   Jun16   0:00              \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/os/user -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b036=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b036/_cgo_.o /tmp/go-build2201875783/b036/_cgo_main.o /tmp/go-build2201875783/b036/_x001.o /tmp/go-build2201875783/b036/_x002.o /tmp/go-build2201875783/b036/_x003.o /tmp/go-build2201875783/b036/_x004.o -target aarch64-macos-gnu

Both zig c++ processes are blocked in a futex wait, i.e. waiting on some lock (the PIDs differ from the ones above because I am running strace outside the container):

motiejus ~/code/bazel-zig-cc $ sudo strace -p 812414
strace: Process 812414 attached
futex(0x7f3743028b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812414 detached
 <detached ...>

motiejus ~/code/bazel-zig-cc $ sudo strace -p 812415
strace: Process 812415 attached
futex(0x7f12064f8b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812415 detached
 <detached ...>
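Since ps reports both processes as multi-threaded (the l in Sl), a plain strace -p only shows the thread whose ID equals the PID; strace -f -p attaches to all threads of the process. A minimal sketch for listing what each thread is doing, assuming a Linux /proc filesystem. thread_states is a hypothetical helper name, not part of any tool, and wchan may read as 0 or be unreadable without sufficient privileges:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<tid> <state> <wchan>" for every thread of a PID.
# Assumes Linux /proc; run as root (or as the owning user) to read wchan.
thread_states() {
    local pid="$1" task tid state wchan
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid="${task##*/}"
        # The State: line looks like "State:  S (sleeping)"; keep the letter only.
        state="$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)"
        # wchan names the kernel function the thread sleeps in, if exposed.
        wchan="$(cat "$task/wchan" 2>/dev/null)"
        printf '%s %s %s\n' "$tid" "$state" "$wchan"
    done
}
```

For example, `thread_states 812414` would show whether every thread is sleeping or only the main one. Attaching gdb and running `thread apply all bt` is another way to get per-thread stacks.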

kill -USR1 did not produce a stack trace. Is there any more information I can provide? Steps to reproduce on an x86_64-linux machine with a working Docker installation:

$ git clone https://git.sr.ht/~motiejus/bazel-zig-cc -b hangzig
$ cd bazel-zig-cc
$ for i in $(seq 1000); do date; echo $i; time ./hangzig; done

It hangs more often on builds.sr.ht, e.g. https://builds.sr.ht/~motiejus/job/526372. builds.sr.ht allocates 2 CPUs, which is why the test script sets --cpuset-cpus=0-1. On my laptop it hung on the 15th iteration; one iteration takes ~90 seconds.
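Because a hung iteration never returns, the loop above has to be watched by hand. A minimal sketch of an unattended variant, assuming GNU coreutils timeout (which exits with status 124 when the limit is hit). The function names and the 600-second limit in the usage note are my assumptions, not part of the hangzig script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit.
# Returns 124 if the limit was hit, otherwise the command's own exit status.
run_with_watchdog() {
    local limit="$1" rc=0
    shift
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out (limit: $limit): $*" >&2
    fi
    return "$rc"
}

# Hypothetical driver: repeat the command until the first timeout
# or until max_iter iterations have completed.
repro_until_hang() {
    local limit="$1" max_iter="$2" i rc
    shift 2
    for i in $(seq "$max_iter"); do
        echo "iteration $i: $(date)"
        rc=0
        run_with_watchdog "$limit" "$@" || rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "hang reproduced on iteration $i"
            return 0
        fi
    done
    echo "no hang in $max_iter iterations"
    return 1
}
```

Usage would be something like `repro_until_hang 600 1000 ./hangzig`. Note that timeout sends SIGTERM to the command it started, so the hung state is lost unless you attach to the processes before the limit expires, and a container started by the script may need separate cleanup.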

zig version: 0.9.0-dev.137+86ebd4b97. I know bazel in the loop is cumbersome, but I wasn't able to find an easy way to reproduce it without it.

Labels: bug, frontend
