SIGSEGV in dagger-engine on linux/amd64 after upgrade to go1.24 #9759

@jedevc

@rajatjindal and I started seeing this in v0.16.2, and @gmile has also reported it in #8830.

Repeatedly running a dagger function succeeds most of the time, but occasionally fails with one of these two errors:

✘ load module 1.3s
! failed to get configured module: Post "http://dagger/query": unexpected EOF
│ ✘ finding module configuration 1.3s
│ ! Post "http://dagger/query": unexpected EOF
│ │ ∅ moduleSource(refString: "."): ModuleSource! 2.5s
✘ load module 1.9s
! failed to get configured module: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
│ ✘ finding module configuration 1.9s
│ ! Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
│ │ ∅ moduleSource(refString: "."): ModuleSource! 3.1s
Error: failed to get configured module: Post "http://dagger/query": command [docker exec -i dagger-engine-v0.16.2 buildctl dial-stdio] has exited with exit status 137, make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=
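Since the failure is intermittent, it can help to run the same call many times and tally how often each of the two errors shows up. The following is a minimal sketch of such a loop, not part of dagger itself: the helper names `classify` and `run_repeatedly` are hypothetical, and the two substrings it matches are taken from the error messages quoted above.

```python
import subprocess
import sys
from collections import Counter

# Substrings copied from the two error messages quoted in this issue.
SIGNATURES = {
    "unexpected EOF": "unexpected_eof",
    "exit status 137": "exit_137",
}


def classify(returncode, output):
    """Map one run to a coarse outcome label (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    for needle, label in SIGNATURES.items():
        if needle in output:
            return label
    return "other_failure"


def run_repeatedly(cmd, attempts):
    """Run cmd `attempts` times and count the outcomes by label."""
    outcomes = Counter()
    for _ in range(attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        outcomes[classify(proc.returncode, proc.stdout + proc.stderr)] += 1
    return outcomes
```

For example, `run_repeatedly(["dagger", "functions"], 50)` would tally 50 runs, assuming `dagger functions` is the call being exercised; substitute whichever command fails for you.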

Essentially, we seem to get a full-on SIGSEGV. On my machine I can get a coredump:

❯ sudo coredumpctl info
           PID: 2320 (dagger-engine)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Mon 2025-03-03 13:02:21 GMT (20min ago)
  Command Line: /usr/local/bin/dagger-engine --config /etc/dagger/engine.toml --debug
    Executable: /usr/local/bin/dagger-engine
 Control Group: /system.slice/docker-78c867c4f854049c676bbc66e21ab369d526987de2d90ddb18b00e70804b4015.scope/init
          Unit: docker-78c867c4f854049c676bbc66e21ab369d526987de2d90ddb18b00e70804b4015.scope
         Slice: system.slice
       Boot ID: f564ae99bbda49528f6444c690454a84
    Machine ID: 03592b6b1309487b920cb5fcb6c06e31
      Hostname: 78c867c4f854
       Storage: /var/lib/systemd/coredump/core.dagger-engine.0.f564ae99bbda49528f6444c690454a84.2320.1741006941000000.zst (present)
  Size on Disk: 49.8M
       Message: Process 2320 (dagger-engine) of user 0 dumped core.
                
                Stack trace of thread 2648:
                #0  0x0000000000418f78 n/a (n/a + 0x0)
                #1  0x00000000004779d9 n/a (n/a + 0x0)
                #2  0x000000000047cc49 n/a (n/a + 0x0)
                #3  0x00000000004745b6 n/a (n/a + 0x0)
                #4  0x00000000004483c5 n/a (n/a + 0x0)
                #5  0x000000000044807f n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64
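Note that the coredump reports signal 11 (SEGV), while the CLI error reports exit status 137. Under the common shell convention that a process killed by signal N exits with status 128 + N, 137 maps to signal 9 rather than 11. One plausible reading (an assumption, not something confirmed here) is that the `docker exec` process is killed when the engine container goes down, so the 137 is a side effect of the crash rather than the crash signal itself. A small sketch of the decoding, with a hypothetical helper name:

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name for a 128+N exit status, or None if it is not one."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None
```

On Linux, `signal_from_exit_status(137)` gives `SIGKILL`, whereas a process that itself died from the segfault would typically report 139 (`SIGSEGV`).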

Reading the coredump, I don't get anything particularly informative:

(dlv) bt
0  0x0000000000418f78 in runtime.mallocgcSmallNoscan
   at /usr/lib/go/src/runtime/malloc.go:1280
1  0x00000000004779d9 in runtime.mallocgc
   at :0
2  0x000000000047cc49 in runtime.growslice
   at :0
3  0x00000000004745b6 in runtime.vgetrandomPutState
   at :0
4  0x00000000004483c5 in runtime.mexit
   at :0
5  0x000000000044807f in runtime.mstart0
   at :0
   error: error while reading spliced memory at 0x8: EOF
(truncated)

(dlv) disassemble 
Sending output to pager...
TEXT runtime.mallocgcSmallNoscan(SB) /usr/lib/go/src/runtime/malloc.go
	malloc.go:1254	0x418ea0	493b6610		cmp rsp, qword ptr [r14+0x10]
	malloc.go:1254	0x418ea4	0f86d0020000		jbe 0x41917a
	malloc.go:1254	0x418eaa	55			push rbp
	malloc.go:1254	0x418eab	4889e5			mov rbp, rsp
	malloc.go:1254	0x418eae	4883ec48		sub rsp, 0x48

It looks like something memory-related in the Go runtime is crashing, probably caused by our PR #9673.

I've also seen this reported by the moby folks in moby/moby#49513 (comment). Potentially we both consume some logic that's good at triggering this?

Labels: area/engine, kind/bug