-
Notifications
You must be signed in to change notification settings - Fork 3.7k
fix container discarding event #9652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @ningmingxiao. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
9affa74
to
d1e0739
Compare
433e0b3
to
0c4810d
Compare
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@ningmingxiao can you add issue link? thanks. |
|
containerEventsDroppedCount.Inc() | ||
log.G(ctx).Debugf("containerEventsChan is full, discarding event %+v", event) | ||
} | ||
eq.enqueue(event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the queue add over just
go func(e runtime.ContainerEventResponse) {
c.containerEventsChan <- e
}(event)
Neither approach seems to guarantee the order if that is the intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If order preservation is important, could also just have a single dequeue go routine which runs when there are items in the queue, like
func (eq *eventQueue) enqueue(c chan<- runtime.ContainerEventResponse, event runtime.ContainerEventResponse) {
eq.lock.Lock()
defer eq.lock.Unlock()
eq.items = append(eq.items, event)
if len(eq.items) == 1 {
go func() {
var event *runtime.ContainerEventResponse
for {
eq.lock.Lock()
if event != nil {
eq.items = eq.items[1:]
}
if len(eq.items) == 0 {
eq.lock.Unlock()
return
}
event = &eq.items[0]
eq.lock.Unlock()
c <- *event
}
}()
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If just generate one event
for {
eq.lock.Lock()
// here len(eq.items)=1 ,event is not nill
if event != nil {
//after next stup len(eq.items) = 0
eq.items = eq.items[1:]
}
//here will return len(eq.items)=0 will lost a event
if len(eq.items) == 0 {
eq.lock.Unlock()
return
}
event = &eq.items[0]
eq.lock.Unlock()
c <- *event
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about add a lock in criService struct
c.mu.lock()
eq.enqueue(event)
c.mu.unlock()
to keep event in order
@dmcgowan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be looked for a better solution overall rather than just add more locking/queuing. There are already TODOs around the publishing of the events and it is using a channel which a large buffer which is not a good pattern to avoid lost messages. It probably makes more sense to use a queue but to avoid leaking memory if there is no one calling GetContainerEvents
the queue should have an expiration if nothing is removing items from the queue. I don't think a quick fix to just add a queue is going to work as it may lead to a memory leak.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
This PR was closed because it has been stalled for 7 days with no activity. |
use crictl create 1000 containers and containerEventsChan will full, then don't call GetContainerEvents to consume events, events will lost. will show "containerEventsChan is full, discarding event ..."
issue:#8892