Concurrent access of CT map might lead to early closure of CT map while doing GC? #37383

@aanm

Description

This race condition was caught by Go's race detector.

After analyzing the code a little, it looks like the deferred m.Close() on line 858 of pkg/maps/ctmap/ctmap.go:

for _, m := range allMaps {
	path, err := OpenCTMap(m)
	if err != nil {
		msg := "Skipping CT map pressure calculation"
		scopedLog := log.WithError(err).WithField(logfields.Path, path)
		if os.IsNotExist(err) {
			scopedLog.Debug(msg)
		} else {
			scopedLog.Warn(msg)
		}
		continue
	}
	defer m.Close()

could close the fd, causing DumpReliablyWithCallback to fail while it is iterating over the same map.

func (m *Map) DumpReliablyWithCallback(cb DumpCallback, stats *DumpStats) error {

The following stack trace shows this happening during a GetModel call, which is triggered by a cilium sysdump, for example. I believe the same can theoretically happen during a GC of the CT map: the CalculateCTMapPressure controller runs every 30 seconds, so if the GC takes longer than 30 seconds, the cleanup will be interrupted midway.

I didn't find any locking mechanism to prevent this from happening.

2025-01-30T21:27:22.005325276Z Read at 0x00c0099535a8 by goroutine 6096710:
2025-01-30T21:27:22.005373476Z   github.com/cilium/ebpf/internal/sys.(*FD).Uint()
2025-01-30T21:27:22.005379677Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:71 +0x144
2025-01-30T21:27:22.005446912Z   github.com/cilium/ebpf.(*Map).nextKey()
2025-01-30T21:27:22.005454006Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:993 +0x112
2025-01-30T21:27:22.005553101Z   github.com/cilium/ebpf.(*Map).NextKey()
2025-01-30T21:27:22.005560745Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:950 +0xae
2025-01-30T21:27:22.005640643Z   github.com/cilium/cilium/pkg/bpf.(*Map).NextKey()
2025-01-30T21:27:22.005647146Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:606 +0x11e
2025-01-30T21:27:22.005712027Z   github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback()
2025-01-30T21:27:22.005718629Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:768 +0x71b
2025-01-30T21:27:22.005815760Z   github.com/cilium/cilium/pkg/bpf.(*Map).GetModel()
2025-01-30T21:27:22.005822322Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:1413 +0x511
2025-01-30T21:27:22.005880481Z   github.com/cilium/cilium/pkg/bpf.GetOpenMaps()
2025-01-30T21:27:22.005886152Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_register_linux.go:64 +0x2ad
2025-01-30T21:27:22.006024930Z   github.com/cilium/cilium/pkg/maps.(*getMapHandler).Handle()
2025-01-30T21:27:22.006034067Z       /go/src/github.com/cilium/cilium/pkg/maps/api.go:146 +0x24
2025-01-30T21:27:22.006242771Z   github.com/cilium/cilium/api/v1/server/restapi/daemon.(*GetMap).ServeHTTP()
2025-01-30T21:27:22.006254163Z       /go/src/github.com/cilium/cilium/api/v1/server/restapi/daemon/get_map.go:56 +0x289
2025-01-30T21:27:22.006259241Z   github.com/go-openapi/runtime/middleware.(*Context).RoutesHandler.NewOperationExecutor.func1()
2025-01-30T21:27:22.006263830Z       /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/operation.go:28 +0x8d
2025-01-30T21:27:22.006268328Z   net/http.HandlerFunc.ServeHTTP()
2025-01-30T21:27:22.006272597Z       /usr/local/go/src/net/http/server.go:2220 +0x47
2025-01-30T21:27:22.006277075Z   github.com/go-openapi/runtime/middleware.NewRouter.func1()
2025-01-30T21:27:22.006281593Z       /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/router.go:80 +0x316
2025-01-30T21:27:22.006285891Z   net/http.HandlerFunc.ServeHTTP()
2025-01-30T21:27:22.006290169Z       /usr/local/go/src/net/http/server.go:2220 +0x47
2025-01-30T21:27:22.006302292Z   github.com/go-openapi/runtime/middleware.Redoc.serveUI.func1()
2025-01-30T21:27:22.006306871Z       /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/ui_options.go:164 +0x119
2025-01-30T21:27:22.006311239Z   net/http.HandlerFunc.ServeHTTP()
2025-01-30T21:27:22.006315867Z       /usr/local/go/src/net/http/server.go:2220 +0x47
2025-01-30T21:27:22.006320435Z   github.com/go-openapi/runtime/middleware.Spec.func1()
2025-01-30T21:27:22.006325495Z       /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/spec.go:72 +0x230
2025-01-30T21:27:22.006330965Z   net/http.HandlerFunc.ServeHTTP()
2025-01-30T21:27:22.006336205Z       /usr/local/go/src/net/http/server.go:2220 +0x47
2025-01-30T21:27:22.006341785Z   github.com/cilium/cilium/pkg/metrics.(*APIEventTSHelper).ServeHTTP()
2025-01-30T21:27:22.006346464Z       /go/src/github.com/cilium/cilium/pkg/metrics/middleware.go:63 +0x3c4
2025-01-30T21:27:22.006351363Z   github.com/cilium/cilium/pkg/api.(*APIPanicHandler).ServeHTTP()
2025-01-30T21:27:22.006357275Z       /go/src/github.com/cilium/cilium/pkg/api/apipanic.go:49 +0xf4
2025-01-30T21:27:22.006362744Z   net/http.(*ServeMux).ServeHTTP()
2025-01-30T21:27:22.006368355Z       /usr/local/go/src/net/http/server.go:2747 +0x255
2025-01-30T21:27:22.006373885Z   net/http.serverHandler.ServeHTTP()
2025-01-30T21:27:22.006379416Z       /usr/local/go/src/net/http/server.go:3210 +0x2a1
2025-01-30T21:27:22.006385236Z   net/http.(*conn).serve()
2025-01-30T21:27:22.006391047Z       /usr/local/go/src/net/http/server.go:2092 +0x12a4
2025-01-30T21:27:22.006395776Z   net/http.(*Server).Serve.gowrap3()
2025-01-30T21:27:22.006400966Z       /usr/local/go/src/net/http/server.go:3360 +0x4f
2025-01-30T21:27:22.006406075Z 
2025-01-30T21:27:22.006410985Z Previous write at 0x00c0099535a8 by goroutine 4182:
2025-01-30T21:27:22.006418068Z   github.com/cilium/ebpf/internal/sys.(*FD).Disown()
2025-01-30T21:27:22.006423969Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:93 +0x8e
2025-01-30T21:27:22.006429549Z   github.com/cilium/ebpf/internal/sys.(*FD).Close()
2025-01-30T21:27:22.006435180Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:84 +0x47
2025-01-30T21:27:22.006870411Z   github.com/cilium/ebpf.(*Map).Close()
2025-01-30T21:27:22.006879989Z       /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:1371 +0x1e6
2025-01-30T21:27:22.006884858Z   github.com/cilium/cilium/pkg/bpf.(*Map).Close()
2025-01-30T21:27:22.006890338Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:591 +0x1ac
2025-01-30T21:27:22.006895778Z   github.com/cilium/cilium/pkg/maps/ctmap.CalculateCTMapPressure.func1.deferwrap1()
2025-01-30T21:27:22.006901959Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:858 +0x33
2025-01-30T21:27:22.006907119Z   runtime.deferreturn()
2025-01-30T21:27:22.006957073Z       /usr/local/go/src/runtime/panic.go:605 +0x5d
2025-01-30T21:27:22.006962683Z   github.com/cilium/cilium/pkg/controller.(*controller).runController()
2025-01-30T21:27:22.007064754Z       /go/src/github.com/cilium/cilium/pkg/controller/controller.go:251 +0xa2
2025-01-30T21:27:22.007072437Z   github.com/cilium/cilium/pkg/controller.(*Manager).createControllerLocked.gowrap1()
2025-01-30T21:27:22.007077677Z       /go/src/github.com/cilium/cilium/pkg/controller/manager.go:111 +0xbc
2025-01-30T21:27:22.007081986Z 
2025-01-30T21:27:22.007089579Z Goroutine 6096710 (running) created at:
2025-01-30T21:27:22.007095280Z   net/http.(*Server).Serve()
2025-01-30T21:27:22.007100720Z       /usr/local/go/src/net/http/server.go:3360 +0x8ec
2025-01-30T21:27:22.007114326Z   github.com/cilium/cilium/api/v1/server.(*Server).Start.func1()
2025-01-30T21:27:22.007119435Z       /go/src/github.com/cilium/cilium/api/v1/server/server.go:460 +0xd1
2025-01-30T21:27:22.007124324Z   github.com/cilium/cilium/api/v1/server.(*Server).Start.gowrap1()
2025-01-30T21:27:22.007129814Z       /go/src/github.com/cilium/cilium/api/v1/server/server.go:464 +0x4f
2025-01-30T21:27:22.007134142Z 
2025-01-30T21:27:22.007138921Z Goroutine 4182 (running) created at:
2025-01-30T21:27:22.007144712Z   github.com/cilium/cilium/pkg/controller.(*Manager).createControllerLocked()
2025-01-30T21:27:22.007150583Z       /go/src/github.com/cilium/cilium/pkg/controller/manager.go:111 +0x777
2025-01-30T21:27:22.007156023Z   github.com/cilium/cilium/pkg/controller.(*Manager).updateController()
2025-01-30T21:27:22.007161153Z       /go/src/github.com/cilium/cilium/pkg/controller/manager.go:84 +0x50f
2025-01-30T21:27:22.007166964Z   github.com/cilium/cilium/pkg/controller.(*Manager).UpdateController()
2025-01-30T21:27:22.007175199Z       /go/src/github.com/cilium/cilium/pkg/controller/manager.go:52 +0x257
2025-01-30T21:27:22.007180209Z   github.com/cilium/cilium/pkg/maps/ctmap.CalculateCTMapPressure()
2025-01-30T21:27:22.007185198Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:840 +0x75
2025-01-30T21:27:22.007222557Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.(*GC).Enable.func3()
2025-01-30T21:27:22.007230763Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:257 +0x12a

Labels

area/agent: Cilium agent related.
area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
feature/conntrack
needs/triage: This issue requires triaging to establish severity and next steps.
