-
Notifications
You must be signed in to change notification settings - Fork 3.4k
cilium: Small health cleanup improvements #33700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. Have you tested if this fixes CI?
1290aa7
to
02b4ec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. We should also test if this really fixes CI like the previous PR did: dd94c27
/test |
Interestingly, the cleanup in the beginning still did not help:
It looks like sth else must be removing the lxc_health devices in between.. The sysdump itself generally has the lxc_health:
|
logs-cilium-df9gx-cilium-agent-20240711-132748.log This one looks interesting as well which could have caused an unexpected removal in between:
Also this patch is moot fc8074d Imho, the problem is rather some unexpected cleanup in between where suddenly the lxc_health device is gone. I'd think this may be an interesting one.. esp the latter condition ...
... which could create a race with the daemon when endpoints are reloaded. One other interesting log find:
|
2fef02f
to
44a8866
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I see the upgrades test failed with the same original error this is trying to fix, but after downgrading. This means it's actually not the version with this patch/fix that's failing, it's the previous release. Presumably then we can ignore that specific CI result: https://github.com/cilium/cilium/actions/runs/10065195269/job/27826569570 . |
Yeah, that's correct, @tommyp1ckles and I suspect this will be fixed once backported to 1.16. |
OK cool. I'll prepare backport for this + one other outstanding 1.16 blocker shortly. |
The PR #33700 got merged into v1.16 by now. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The PR #33700 got merged into v1.16 by now. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The PR cilium#33700 got merged into v1.16 by now. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
(see commit msg)
Fixes: #32689