Closed
Labels
area/engine (About dagger core engine), kind/bug (Something isn't working)
Description
@rajatjindal and I started seeing this in v0.16.2, and @gmile has also reported it in #8830.
Repeatedly running
dagger function
succeeds most of the time, but occasionally fails with one of these two errors:
✘ load module 1.3s
! failed to get configured module: Post "http://dagger/query": unexpected EOF
│ ✘ finding module configuration 1.3s
│ ! Post "http://dagger/query": unexpected EOF
│ │ ∅ moduleSource(refString: "."): ModuleSource! 2.5s
✘ load module 1.9s
! failed to get configured module: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
│ ✘ finding module configuration 1.9s
│ ! Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
│ │ ∅ moduleSource(refString: "."): ModuleSource! 3.1s
Error: failed to get configured module: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
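Since the failure is intermittent, a small retry loop may make it easier to reproduce. The following is only a sketch: the helper name `run_until_failure`, the `RUN_LOG` variable, and the default log path are assumptions made for illustration and are not part of dagger. It reruns an arbitrary command until the first non-zero exit status and keeps the output of the last attempt in a log file.

```shell
# Hypothetical helper (not part of dagger): rerun a command until it fails.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Prints which attempt failed and returns the command's exit status,
# or returns 0 if every attempt succeeded.
run_until_failure() {
  max_attempts=$1
  shift
  # Output of the most recent attempt is kept here (path is an assumption).
  log=${RUN_LOG:-${TMPDIR:-/tmp}/run_until_failure.log}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "attempt $attempt failed with exit status $status"
      return "$status"
    fi
    attempt=$((attempt + 1))
  done
  echo "all $max_attempts attempts succeeded"
  return 0
}
```

To use it, pass an attempt budget followed by the dagger invocation that triggers the failure, for example `run_until_failure 200 dagger functions`, and inspect the log file after a failing run.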
Essentially, we seem to be hitting a full-on SIGSEGV. On my machine I can get a coredump:
❯ sudo coredumpctl info
PID: 2320 (dagger-engine)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Mon 2025-03-03 13:02:21 GMT (20min ago)
Command Line: /usr/local/bin/dagger-engine --config /etc/dagger/engine.toml --debug
Executable: /usr/local/bin/dagger-engine
Control Group: /system.slice/docker-78c867c4f854049c676bbc66e21ab369d526987de2d90ddb18b00e70804b4015.scope/init
Unit: docker-78c867c4f854049c676bbc66e21ab369d526987de2d90ddb18b00e70804b4015.scope
Slice: system.slice
Boot ID: f564ae99bbda49528f6444c690454a84
Machine ID: 03592b6b1309487b920cb5fcb6c06e31
Hostname: 78c867c4f854
Storage: /var/lib/systemd/coredump/core.dagger-engine.0.f564ae99bbda49528f6444c690454a84.2320.1741006941000000.zst (present)
Size on Disk: 49.8M
Message: Process 2320 (dagger-engine) of user 0 dumped core.
Stack trace of thread 2648:
#0 0x0000000000418f78 n/a (n/a + 0x0)
#1 0x00000000004779d9 n/a (n/a + 0x0)
#2 0x000000000047cc49 n/a (n/a + 0x0)
#3 0x00000000004745b6 n/a (n/a + 0x0)
#4 0x00000000004483c5 n/a (n/a + 0x0)
#5 0x000000000044807f n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
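The client-side error reports exit status 137 for the docker exec process, while the coredump above records signal 11 (SEGV) for dagger-engine. By the common shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 would correspond to signal 9 (KILL) and a direct SIGSEGV would show up as 139. A minimal sketch of that arithmetic follows; the helper name `signal_from_exit_status` is made up for illustration, and reading 137 as "the exec process was killed when the engine went away" is an assumption, not something confirmed in this report.

```shell
# Hypothetical helper: map a shell-style exit status to a signal number.
# By convention, a status above 128 encodes 128 + signal number.
# Prints 0 when the status does not indicate death by signal.
signal_from_exit_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    echo $((status - 128))
  else
    echo 0
  fi
}

signal_from_exit_status 137   # prints 9  (SIGKILL)
signal_from_exit_status 139   # prints 11 (SIGSEGV)
```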
Reading the coredump, I don't get anything particularly informative:
(dlv) bt
0 0x0000000000418f78 in runtime.mallocgcSmallNoscan
at /usr/lib/go/src/runtime/malloc.go:1280
1 0x00000000004779d9 in runtime.mallocgc
at :0
2 0x000000000047cc49 in runtime.growslice
at :0
3 0x00000000004745b6 in runtime.vgetrandomPutState
at :0
4 0x00000000004483c5 in runtime.mexit
at :0
5 0x000000000044807f in runtime.mstart0
at :0
error: error while reading spliced memory at 0x8: EOF
(truncated)
(dlv) disassemble
Sending output to pager...
TEXT runtime.mallocgcSmallNoscan(SB) /usr/lib/go/src/runtime/malloc.go
malloc.go:1254 0x418ea0 493b6610 cmp rsp, qword ptr [r14+0x10]
malloc.go:1254 0x418ea4 0f86d0020000 jbe 0x41917a
malloc.go:1254 0x418eaa 55 push rbp
malloc.go:1254 0x418eab 4889e5 mov rbp, rsp
malloc.go:1254 0x418eae 4883ec48 sub rsp, 0x48
It looks like something memory-related in the Go runtime is crashing, probably caused by our PR #9673.
I've also seen this reported by the moby folks in moby/moby#49513 (comment). Potentially we both consume some logic that's good at triggering this?