operator/pkg/lbipam: Add lifecycle tests for LB-IPAM #38965
Merged
We already have extensive tests for the actual behavior of LB-IPAM. These tests call directly into the event processing code, so everything executes synchronously, which makes them very stable.
The drawback of calling directly into the event processing code is that we don't test the actual lifecycle of the LB-IPAM controller, so bugs in the path between the API client and the event processing logic can go unnoticed.
This commit adds a test that exercises the lifecycle of the LB-IPAM controller. To do so, we create a hive with the smallest setup that still works. LB-IPAM now increments a few atomic counters (only during testing), which we use to glean information about the lifecycle of the controller and whether it is processing events. We start the hive as we would in production. Since everything now runs in its own goroutine, we have no strong guarantees about when the controller will process events. To compensate, the tests use relatively large timeouts when waiting for something to happen, in an attempt to prevent flakiness. However, due to the nature of the tests, flakes might still happen, and we will have to keep an eye on them.
When stress testing it locally, it seems stable.
This should hopefully catch regressions of #37965 or similar cases.