-
Notifications
You must be signed in to change notification settings - Fork 8.1k
[vendor-change] Add back an Eventually test helper, use it across a few tests that have been flaky #4231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Assign the PR to them by writing The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
// Eventually retries until f() returns true, or it times out in error | ||
func Eventually(name string, f func() bool, t *testing.T) { | ||
t.Helper() | ||
interval := 64 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that by default my new Eventually impl fails after 5 seconds, and this function will wait up to 1 minute. I'm curious to see what the tests do with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something in the middle - like 25 seconds ? I'm trying to switch default refresh to 10 seconds (due to load), this will give 2 refresh intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for tests we shouldn't need to wait 10s for changes to propagate - we don't have that many services that 1s shouldn't work - so we can get through tests faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the default 1s poll, 30 second deadline. Probably too generous, but since the goal is to fix flakes, maybe not the worst thing ever.
I think we should be careful with relying on exact timing. Tests that run
in parallel can be suspended indefinitely. Docker time may not be linear,
etc.
…On Mon, Mar 12, 2018 at 6:15 PM Zack Butcher ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pilot/test/util/kubernetes.go
<#4231 (comment)>:
> @@ -210,18 +209,3 @@ func FetchLogs(cl kubernetes.Interface, name, namespace string, container string
}
return string(raw)
}
-
-// Eventually retries until f() returns true, or it times out in error
-func Eventually(name string, f func() bool, t *testing.T) {
- t.Helper()
- interval := 64 * time.Millisecond
Note that by default my new Eventually impl fails after 5 seconds, and
this function will wait up to 1 minute. I'm curious to see what the tests
do with this PR.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#4231 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AJGIxs60Z_38lHJEayf2PE62yVX5rFGFks5tdx3HgaJpZM4Sn7mn>
.
|
pkg/test/eventually.go
Outdated
} | ||
|
||
// Eventual constructs an EventualOpts instance with the provided polling interval and deadline. | ||
func Eventual(interval, deadline time.Duration) EventualOpts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to NewEventualOps
pkg/test/eventually.go
Outdated
"time" | ||
) | ||
|
||
// A Condition is a function that returns true when a test condition is satisfied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand eventually is useful in certain circumstances but I want to add a wary for strictly limiting its use, of no alternative better solution is available.
we should encourage implementing more efficient and customized waiting cycling using channels instead of this generalized backoff logic.
Also factor in that go discourage common libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the whole goal is that we do not encourage customized waiting using channels. That means we wind up with duplicate code that is not necessarily trivial to get correct all over the place. This is exactly the kind of logic that should be abstracted away into a function.
defer close(ach) | ||
defer close(dch) | ||
sad, sdd := 0, 0 | ||
added, deleted := atomic.NewInt64(0), atomic.NewInt64(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't make this change. go discourage shared memory or cond pattern:
"Do not communicate by sharing memory; instead, share memory by communicating."
https://blog.golang.org/share-memory-by-communicating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is generally true in go code, and I wouldn't allow this kind shared memory into code paths that execute in production. However, this is a test, and clarity of code under test far outweighs that "rule". What's going on in the test is far more obvious using these atomics compared to using channels. Look at the final Eventually
, it's very explicit: eventually these two values will both be n
, or the test fails. Compare that to the loop with channels/switch, where that very clear predicate is muddled by handling timeout and writes on the channel. '
Further, your code busy-waits: there's no delay or sleeping, it just compares added and deleted to n as fast as it can in a loop (or, occasionally, gets notified on a change on the other channels). Given our tests execute in parallel, that means you're occupying a core doing nothing while we could be executing another test in parallel.
Finally, which do you think is more maintainable in the long run? Less code that's more explicit about its meaning always wins in my book.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for exactly the same reason go doesn't provide a min/max library. I prefer this not to be unified under one util.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golang does in fact provide max/min in a library: it's right in the standard library.
However, a function that can be written in four lines:
func min(a, b MyNumType) MyNumType {
if a < b {
return a
}
return b
}
is a pretty far cry from a channel based setup. I'm pretty confident I can write min correctly every time, I'm not confident I can write eventual polling via channels correctly every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the change since it's in test. I changed these to channels to remove flakiness issues and race issues. Please make sure it passes flaky test (-count 1000) and -racetest before submission.
@@ -155,18 +166,16 @@ func (s *TestSetup) ReStartEnvoy() { | |||
|
|||
// VerifyCheckCount verifies the number of Check calls. | |||
func (s *TestSetup) VerifyCheckCount(tag string, expected int) { | |||
if s.mixer.check.Count() != expected { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these needs to have an eventual waiting loop, can we notify the verification by blocking on channels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without large changes across the mixer test suite.
|
||
numNotifications := atomic.NewInt64(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use atomic. see the reasons below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.
Why not gomega.Eventually with channels in the eventual function? Or you can re evaluate the whole thing? Gomega is already. Dependency of istio. |
@mandarjog gomega requires using matchers to use Eventually, so you have to adopt gomega for the entire test and not just the portion that needs polling (I did look at using gomega first, or implementing this function on top of gomega's Eventually). The goal is that this is an in-between to consolidate polling logic. If we move everything over to gomega then it should be easy to change tests that use this. |
/retest |
Codecov Report
@@ Coverage Diff @@
## master #4231 +/- ##
=======================================
- Coverage 75% 72% -3%
=======================================
Files 289 295 +6
Lines 24586 25406 +820
=======================================
- Hits 18313 18140 -173
- Misses 5505 6501 +996
+ Partials 768 765 -3
Continue to review full report at Codecov.
|
/retest |
@xiaolanz PTAL. I do not plan to remove the atomics. |
pkg/test/eventually.go
Outdated
|
||
// We didn't get a happy fast-path, so set up timers and wait. | ||
timeout := time.NewTimer(e.deadline).C | ||
poll := time.NewTicker(e.interval).C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous eventually uses exponential backoff, for what reason we don't do it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly because this is way simpler (exponential backoff means we can't use a ticker, have to sleep ourselves). I could add back exponential backoff if we need it; I'm not convinced we do for these usecases. Only one of these usages is actually waiting on/querying an external system where exponential backoff would be valuable.
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this simple library that makes tests both simpler to read and more robust.
We could pull in an incremental backoff library to make this even cleaner...
|
||
numNotifications := atomic.NewInt64(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.
pkg/test/eventually.go
Outdated
defaultDeadline = 30 * time.Second | ||
) | ||
|
||
// DEFAULT is the default EventualOpts instance used for the static test.Eventually function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opts -> Ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.
pkg/test/eventually.go
Outdated
} | ||
|
||
// Eventual constructs an EventualOpts instance with the provided polling interval and deadline. | ||
func NewEventualOps(interval, deadline time.Duration) EventualOpts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be NewEventualOpts? Or should the return value type be named EventualOps?
pkg/test/eventually.go
Outdated
DEFAULT.Eventually(t, name, cond) | ||
} | ||
|
||
// EventualOpts is defines a polling strategy for operations that must eventually succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is defines -> defines
pkg/test/eventually.go
Outdated
} | ||
} | ||
|
||
// Eventually polls cond until it succeeds (returns true) or we exceed the deadline. cond itself can also fail the test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you look at any of the existing exponential backoff libraries out there as potential building blocks for this code? For example: https://godoc.org/github.com/cenkalti/backoff
pkg/test/eventually.go
Outdated
// using t.Fatal to exit early. | ||
// | ||
// Eventually does not perform backoff of its polling, but polls at a constant rate. | ||
func (e EventualOpts) Eventually(t *testing.T, name string, cond Condition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps mention what 'name' is for.
Perhaps mention the fact "cond" does not need to be thread-safe, it is only called from the current goroutine.
My opinion is that these tests should be removed from the unit test suite. If we have to observe eventual consistency, then it's not really a unit test. |
@ZackButcher PR needs rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use exponential backoff, PTAL. I do agree with Kuat that these tests should be moved out of the unit test suite.
pkg/test/eventually.go
Outdated
defaultDeadline = 30 * time.Second | ||
) | ||
|
||
// DEFAULT is the default EventualOpts instance used for the static test.Eventually function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.
(Need to nail down updating the dependencies, too, so it'll be a bit for that.) |
Vendor update in istio/old_vendor-istio_repo#42 |
/retest |
master is broken, you should have waited on merging vendor until this passes the tests after rebase. |
semi urgent ping: cd vendor; git checkout master; git pull; then cd ..; cp vendor/Gopkg.* . ; git add ; commit; update pr |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ldemailly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
pkg/test/eventually.go:69:3:error: should use a simple channel send/receive instead of select with a single case (S1000) (megacheck) can we please get this in - I'd like to move forward with the vendor sync |
Yes, please do something with semi-committed vendor, I'm stuck with my
vendor change. I'm next in line...
…On Wed, Mar 28, 2018 at 9:23 PM Laurent Demailly ***@***.***> wrote:
@ZackButcher <https://github.com/ZackButcher>
pkg/test/eventually.go:69:3:error: should use a simple channel
send/receive instead of select with a single case (S1000) (megacheck)
Makefile:317: recipe for target 'lint' failed
can we please get this in - I'd like to move forward with the vendor sync
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#4231 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AJGIxtH5iGJ829tQAXY_Zd8_lHcLaOVoks5tjGG9gaJpZM4Sn7mn>
.
|
New changes are detected. LGTM label has been removed. |
@ZackButcher: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
…ew tests that have been flaky (istio#4231) * Fix up eventually for latest repo changes * fix imports * update gopkg and vendor ref * typo * fix linter complaints
Introduce Eventually helper that polls an event in tests until success. The Istio codebase has had a few iterations of this type of helper; we choose not to use gomega's Eventually since it requires using gomega's Matchers too (so it's not a drop in replacement).
This also fixes the flaky
TestCheckReportDisable
test (by retrying the condition rather than only checking it a single time). It also reduces the runtime of that test by ~half.