web: avoid duplicate in-memory checks #9103
Merged
Changes proposed by this PR
closes #8638
Previously we could spawn any number of in-memory checks, and duplicates
weren't detected until atc/db/lock acquired a Mutex and tried to get a lock
from the database.
I'm guessing, based on the info from #8638, that contention for this lock
gets really high. I'm also guessing that the clusters seeing this issue are
high-usage clusters, though I don't know how high; I'd guess they always have
a large number of resource checks running. It's possible that lidar sends off
multiple checks for the same resource, which would result in multiple
goroutines running and fighting over the same atc/db/lock. Lidar would send
off multiple in-memory checks if, every time it runs, it still finds that a
resource has exceeded its check interval.
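To illustrate the idea in the title, here is a minimal sketch of an in-process guard that drops a duplicate in-memory check before it ever contends for the lock. It assumes checks can be keyed by a resource ID; the names inFlightChecks, TryStart, and Finish are hypothetical and are not Concourse's actual API. It uses sync.Map.LoadOrStore so that exactly one caller per key wins. A guard like this only deduplicates within a single ATC process, so coordination across ATCs would presumably still rely on the database lock.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightChecks is a hypothetical guard recording which resources
// currently have an in-memory check running in this process.
type inFlightChecks struct {
	running sync.Map // key: resource ID (int), value: struct{}{}
}

// TryStart reports whether the caller may start a check for resourceID.
// It returns false if a check for the same resource is already running,
// so the duplicate is dropped instead of queueing on the lock Mutex.
func (f *inFlightChecks) TryStart(resourceID int) bool {
	_, alreadyRunning := f.running.LoadOrStore(resourceID, struct{}{})
	return !alreadyRunning
}

// Finish marks the check for resourceID as done, allowing a later
// lidar tick to start a new check for that resource.
func (f *inFlightChecks) Finish(resourceID int) {
	f.running.Delete(resourceID)
}

func main() {
	var checks inFlightChecks
	var wg sync.WaitGroup
	var mu sync.Mutex
	started := 0

	// Simulate lidar firing ten checks for the same resource at once.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !checks.TryStart(42) {
				return // duplicate: skip rather than fight over the lock
			}
			mu.Lock()
			started++
			mu.Unlock()
			// Real code would run the check here and defer Finish; this
			// demo keeps the entry until all goroutines have tried, so
			// the count below is deterministic.
		}()
	}
	wg.Wait()
	checks.Finish(42)

	fmt.Println("checks started:", started)
}
```

Running this prints "checks started: 1": only the first goroutine to reach LoadOrStore proceeds, and the other nine return immediately.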
I'm not 100% sure this is the problem, and I don't have an easy way to
test it. I don't have reproducible steps for the issue users are reporting.
This is all based on my reading of the code as it is today
and on the pprof memory graph reported by users, which showed a lot of
in-memory checks waiting on the db/lock Mutex.
Release Note