Description
Cilium's internal business logic relies on a highly parallel combination of reactive handlers for incoming information, "triggers" that rate-limit requests for processing to ensure Cilium does not over-consume resources, and "controllers" that periodically perform updates or resiliency checks of configured state. While most things are "eventually consistent", the presence of time-based triggers and controllers makes it hard to evaluate how Cilium will behave once that "eventual consistency" is resolved.
Issues have been introduced into the tree where timers do not fire during testing, and the state the agent eventually converges to causes connectivity disruption for users (such as #27210, fixed in #27327). It is quite difficult to systematically identify time-based errors across the entire agent by relying purely on per-package testing. The goal is to provide a more systematic safety net for timer-based issues.
Tasks
- Add time wrapper to test agent delays in CI #27253
- Add a debug log to the time wrapper package to let developers know when times are truncated
- Extend to cover more than just cilium-agent
- Extend to cover more workflows, not just ginkgo (such as https://github.com/cilium/cilium/blob/main/.github/actions/helm-default/action.yaml)
- Re-evaluate the list of excluded packages to reduce them
- Experiment with timers shorter than 5s
- Add support for context.WithTimeout
- Apply time wrapper to controllers
- Evaluate use of stdtime.After in the tree and whether those cases could/should be replaced by pkg/time