
Conversation

taylorsilva
Member

Changes proposed by this PR

closes #8638

Previously we could spawn any number of in-memory checks, and they wouldn't be
checked for duplication until atc/db/lock acquired a Mutex and tried to get a
lock from the database.

I'm guessing, based on the info from #8638, that contention for this lock
would get really high. I'm also guessing that clusters seeing this issue are
high-usage clusters; how high, I don't know, but I'd expect they have a large
number of resource checks occurring at all times. It's possible that lidar
sends off multiple checks for the same resource, which would result in
multiple goroutines running and fighting over the same atc/db/lock. Lidar
would send off multiple in-memory checks if, every time it runs, it still
finds that a resource has exceeded its check interval.
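To illustrate the idea (this is a hypothetical sketch, not the actual Concourse code or the change in this PR): one way to drop duplicate in-memory checks before they ever contend for the database lock is to track in-flight checks in a process-local map keyed by resource ID. The `checkTracker` type and the resource ID `42` below are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// checkTracker records which resources already have an in-memory check
// in flight, so duplicate requests can be dropped before they reach the
// database lock. (Hypothetical sketch, not Concourse's implementation.)
type checkTracker struct {
	mu       sync.Mutex
	inFlight map[int]struct{} // keyed by resource ID
}

func newCheckTracker() *checkTracker {
	return &checkTracker{inFlight: make(map[int]struct{})}
}

// TryStart reports whether a check for resourceID may begin. It returns
// false if one is already running, so the caller can skip spawning a
// duplicate goroutine.
func (t *checkTracker) TryStart(resourceID int) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, running := t.inFlight[resourceID]; running {
		return false
	}
	t.inFlight[resourceID] = struct{}{}
	return true
}

// Finish marks the check for resourceID as done, allowing a new one.
func (t *checkTracker) Finish(resourceID int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.inFlight, resourceID)
}

func main() {
	tracker := newCheckTracker()

	// Simulate lidar asking for the same resource check twice.
	if tracker.TryStart(42) {
		fmt.Println("first check for resource 42 started")
	}
	if !tracker.TryStart(42) {
		fmt.Println("second check for resource 42 skipped as a duplicate")
	}

	// The running check would call Finish when it completes.
	tracker.Finish(42)
}
```

With something like this, duplicate requests are rejected cheaply in memory instead of each spawning a goroutine that blocks on the db/lock Mutex.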

I'm not 100% sure this is the problem and I don't have an easy way to test
it; I don't have reproducible steps for the issue users are reporting. This
is all based on my reading of the code as it is today and the pprof memory
graph reported by users, which showed a lot of in-memory checks waiting on
the db/lock Mutex.

Release Note

  • Avoid creating duplicate in-memory checks

Signed-off-by: Taylor Silva <dev@taydev.net>
This code isn't being used anywhere. Dead code.

Signed-off-by: Taylor Silva <dev@taydev.net>
@taylorsilva taylorsilva added the bug label Mar 8, 2025
@taylorsilva taylorsilva requested a review from a team as a code owner March 8, 2025 17:14
@taylorsilva taylorsilva merged commit 8c0fcad into master Mar 9, 2025
12 checks passed
@taylorsilva taylorsilva deleted the issue/8638 branch March 9, 2025 00:24