Description
Summary
We upgraded from 6.7.6 to 7.8.3.
Our Concourse web node serves a number of pipelines and teams, with dedicated workers per team.
Since the upgrade, memory usage has been growing until the process gets killed by the OOM killer.
The Prometheus metrics show the number of goroutines increasing as well.
And something interesting
Steps to reproduce
Not sure
Expected results
Actual results
Additional context
This Concourse server runs only one web component. The instance has been running since Concourse version 4 and has quite a lot of teams and pipelines.
On this Concourse, several workers are attached to specific teams (there are no global/default workers).
The database is around 8 GB; we have quite a lot of resource versions since this cluster is quite old.
The database CPU usage is quite high (3 vCPUs, 4 GB RAM).
Slow db logs
2022-11-24 01:34:55 UTC [6129]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 LOG: 00000: process 6129 acquired ExclusiveLock on tuple (40,125) of relation 1733315 of database 1730368 after 1028.441 ms
2022-11-24 01:34:55 UTC [6129]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 LOCATION: ProcSleep, proc.c:1495
2022-11-24 01:34:55 UTC [6129]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 STATEMENT:
UPDATE resource_config_scopes
SET last_check_start_time = now(), last_check_build_id = $1, last_check_build_plan = $2
WHERE id = $3
2022-11-24 01:34:55 UTC [6150]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 LOG: 00000: process 6150 acquired ShareLock on transaction 2106307433 after 1119.776 ms
2022-11-24 01:34:55 UTC [6150]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 CONTEXT: while updating tuple (241,88) in relation "resource_config_scopes"
2022-11-24 01:34:55 UTC [6150]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 LOCATION: ProcSleep, proc.c:1495
2022-11-24 01:34:55 UTC [6150]: user=concourse,db=concourse,app=[unknown],client=51.159.74.196 STATEMENT:
UPDATE resource_config_scopes
SET last_check_start_time = now(), last_check_build_id = $1, last_check_build_plan = $2
WHERE id = $3
Traces
Some goroutine traces I managed to get with curl http://localhost:$CONCOURSE_DEBUG_BIND_PORT/debug/pprof/goroutine?debug=1
goroutine profile: total 353596
349824 @ 0x43ca56 0x44d99e 0x44d975 0x46a585 0x478945 0x941e2e 0x941e0e 0x941cf7 0x941b26 0xc4e96d 0xe966ef 0xe8a89e 0x46e781
# 0x46a584 sync.runtime_SemacquireMutex+0x24 runtime/sema.go:77
# 0x478944 sync.(*Mutex).lockSlow+0x164 sync/mutex.go:171
# 0x941e2d sync.(*Mutex).Lock+0x6d sync/mutex.go:90
# 0x941e0d github.com/concourse/concourse/atc/db/lock.(*lock).Acquire+0x4d github.com/concourse/concourse/atc/db/lock/lock.go:217
# 0x941cf6 github.com/concourse/concourse/atc/db/lock.(*lockFactory).Acquire+0x156 github.com/concourse/concourse/atc/db/lock/lock.go:175
# 0x941b25 github.com/concourse/concourse/atc/db/lock.lockFactories.Acquire+0xa5 github.com/concourse/concourse/atc/db/lock/lock.go:161
# 0xc4e96c github.com/concourse/concourse/atc/db.(*inMemoryCheckBuild).AcquireTrackingLock+0xec github.com/concourse/concourse/atc/db/build_in_memory_check.go:406
# 0xe966ee github.com/concourse/concourse/atc/engine.(*engineBuild).Run+0x16e github.com/concourse/concourse/atc/engine/engine.go:115
# 0xe8a89d github.com/concourse/concourse/atc/builds.(*Tracker).trackBuild.func1+0x31d github.com/concourse/concourse/atc/builds/tracker.go:109
820 @ 0x43ca56 0x44c8dc 0xe96de5 0xe8a89e 0x46e781
# 0xe96de4 github.com/concourse/concourse/atc/engine.(*engineBuild).Run+0x864 github.com/concourse/concourse/atc/engine/engine.go:218
# 0xe8a89d github.com/concourse/concourse/atc/builds.(*Tracker).trackBuild.func1+0x31d github.com/concourse/concourse/atc/builds/tracker.go:109
Workaround
We found a workaround: increasing component-runner-interval from 10s to 30s by setting
CONCOURSE_COMPONENT_RUNNER_INTERVAL: 30s
The topic has also been raised on Discord: https://discord.com/channels/219899946617274369/413770960089382922/1045614009317077022
Triaging info
- Concourse version: 7.8.3
- Browser (if applicable):
- Did this used to work?