[vendor-change] Add back an Eventually test helper, use it across a few tests that have been flaky #4231

ZackButcher · 2018-03-13T01:13:16Z

Introduce Eventually helper that polls an event in tests until success. The Istio codebase has had a few iterations of this type of helper; we choose not to use gomega's Eventually since it requires using gomega's Matchers too (so it's not a drop in replacement).

This also fixes the flaky TestCheckReportDisable test (by retrying the condition rather than only checking it a single time). It also reduces the runtime of that test by ~half.

istio-merge-robot · 2018-03-13T01:14:22Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: rshriram

Assign the PR to them by writing /assign @rshriram in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

ZackButcher · 2018-03-13T01:15:43Z

pilot/test/util/kubernetes.go

-// Eventually retries until f() returns true, or it times out in error
-func Eventually(name string, f func() bool, t *testing.T) {
-	t.Helper()
-	interval := 64 * time.Millisecond


Note that by default my new Eventually impl fails after 5 seconds, and this function will wait up to 1 minute. I'm curious to see what the tests do with this PR.

How about something in the middle - like 25 seconds ? I'm trying to switch default refresh to 10 seconds (due to load), this will give 2 refresh intervals.

for tests we shouldn't need to wait 10s for changes to propagate - we don't have that many services that 1s shouldn't work - so we can get through tests faster

Made the default 1s poll, 30 second deadline. Probably too generous, but since the goal is to fix flakes, maybe not the worst thing ever.

kyessenov · 2018-03-13T01:22:37Z

I think we should be careful with relying on exact timing. Tests that run in parallel can be suspended indefinitely. Docker time may not be linear, etc.

…

On Mon, Mar 12, 2018 at 6:15 PM Zack Butcher ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In pilot/test/util/kubernetes.go <#4231 (comment)>: > @@ -210,18 +209,3 @@ func FetchLogs(cl kubernetes.Interface, name, namespace string, container string } return string(raw) } - -// Eventually retries until f() returns true, or it times out in error -func Eventually(name string, f func() bool, t *testing.T) { - t.Helper() - interval := 64 * time.Millisecond Note that by default my new Eventually impl fails after 5 seconds, and this function will wait up to 1 minute. I'm curious to see what the tests do with this PR. — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#4231 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJGIxs60Z_38lHJEayf2PE62yVX5rFGFks5tdx3HgaJpZM4Sn7mn> .

xiaolanz · 2018-03-13T04:00:42Z

pkg/test/eventually.go

+}
+
+// Eventual constructs an EventualOpts instance with the provided polling interval and deadline.
+func Eventual(interval, deadline time.Duration) EventualOpts {


rename to NewEventualOps

xiaolanz · 2018-03-13T04:54:21Z

pkg/test/eventually.go

+	"time"
+)
+
+// A Condition is a function that returns true when a test condition is satisfied.


I understand eventually is useful in certain circumstances but I want to add a wary for strictly limiting its use, of no alternative better solution is available.

we should encourage implementing more efficient and customized waiting cycling using channels instead of this generalized backoff logic.

Also factor in that go discourage common libraries.

No, the whole goal is that we do not encourage customized waiting using channels. That means we wind up with duplicate code that is not necessarily trivial to get correct all over the place. This is exactly the kind of logic that should be abstracted away into a function.

xiaolanz · 2018-03-13T05:01:05Z

pilot/test/mock/config.go

-	defer close(ach)
-	defer close(dch)
-	sad, sdd := 0, 0
+	added, deleted := atomic.NewInt64(0), atomic.NewInt64(0)


Don't make this change. go discourage shared memory or cond pattern:
"Do not communicate by sharing memory; instead, share memory by communicating."
https://blog.golang.org/share-memory-by-communicating

Yes, that is generally true in go code, and I wouldn't allow this kind shared memory into code paths that execute in production. However, this is a test, and clarity of code under test far outweighs that "rule". What's going on in the test is far more obvious using these atomics compared to using channels. Look at the final Eventually, it's very explicit: eventually these two values will both be n, or the test fails. Compare that to the loop with channels/switch, where that very clear predicate is muddled by handling timeout and writes on the channel. '

Further, your code busy-waits: there's no delay or sleeping, it just compares added and deleted to n as fast as it can in a loop (or, occasionally, gets notified on a change on the other channels). Given our tests execute in parallel, that means you're occupying a core doing nothing while we could be executing another test in parallel.

Finally, which do you think is more maintainable in the long run? Less code that's more explicit about its meaning always wins in my book.

for exactly the same reason go doesn't provide a min/max library. I prefer this not to be unified under one util.

Golang does in fact provide max/min in a library: it's right in the standard library.

However, a function that can be written in four lines:

func min(a, b MyNumType) MyNumType { if a < b { return a } return b }

is a pretty far cry from a channel based setup. I'm pretty confident I can write min correctly every time, I'm not confident I can write eventual polling via channels correctly every time.

I'm fine with the change since it's in test. I changed these to channels to remove flakiness issues and race issues. Please make sure it passes flaky test (-count 1000) and -racetest before submission.

xiaolanz · 2018-03-13T05:02:44Z

mixer/test/client/env/setup.go

@@ -155,18 +166,16 @@ func (s *TestSetup) ReStartEnvoy() {

 // VerifyCheckCount verifies the number of Check calls.
 func (s *TestSetup) VerifyCheckCount(tag string, expected int) {
-	if s.mixer.check.Count() != expected {


why these needs to have an eventual waiting loop, can we notify the verification by blocking on channels?

Not without large changes across the mixer test suite.

xiaolanz · 2018-03-13T05:03:26Z

pilot/pkg/config/kube/ingress/controller_test.go


+	numNotifications := atomic.NewInt64(0)


please don't use atomic. see the reasons below.

This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.

mandarjog · 2018-03-13T14:44:33Z

Why not gomega.Eventually with channels in the eventual function? Or you can re evaluate the whole thing?

Gomega is already. Dependency of istio.

ZackButcher · 2018-03-14T18:34:14Z

@mandarjog gomega requires using matchers to use Eventually, so you have to adopt gomega for the entire test and not just the portion that needs polling (I did look at using gomega first, or implementing this function on top of gomega's Eventually). The goal is that this is an in-between to consolidate polling logic. If we move everything over to gomega then it should be easy to change tests that use this.

ZackButcher · 2018-03-14T22:05:33Z

/retest

codecov · 2018-03-14T22:15:36Z

Codecov Report

Merging #4231 into master will decrease coverage by 4%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #4231    +/-   ##
=======================================
- Coverage      75%     72%    -3%     
=======================================
  Files         289     295     +6     
  Lines       24586   25406   +820     
=======================================
- Hits        18313   18140   -173     
- Misses       5505    6501   +996     
+ Partials      768     765     -3

Impacted Files	Coverage Δ
mixer/adapter/prometheus/prometheus.go	`89% <0%> (-11%)`	⬇️
mixer/adapter/servicecontrol/reportprocessor.go	`79% <0%> (-1%)`	⬇️
pilot/cmd/pilot-agent/main.go	`38% <0%> (ø)`	⬇️
mixer/adapter/denier/denier.go	`100% <0%> (ø)`	⬆️
mixer/cmd/mixc/cmd/check.go	`0% <0%> (ø)`	⬆️
mixer/adapter/stdio/stdio.go	`100% <0%> (ø)`	⬆️
mixer/adapter/stdio/zap.go	`100% <0%> (ø)`	⬆️
mixer/adapter/dogstatsd/dogstatsd.go	`100% <0%> (ø)`	⬆️
mixer/adapter/memquota/rollingWindow.go	`100% <0%> (ø)`	⬆️
mixer/adapter/list/ipList.go	`100% <0%> (ø)`	⬆️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c13fa4b...9709ff5. Read the comment docs.

ZackButcher · 2018-03-14T22:27:01Z

/retest

ZackButcher · 2018-03-14T22:31:14Z

@xiaolanz PTAL. I do not plan to remove the atomics.

xiaolanz · 2018-03-15T05:49:51Z

pkg/test/eventually.go

+
+	// We didn't get a happy fast-path, so set up timers and wait.
+	timeout := time.NewTimer(e.deadline).C
+	poll := time.NewTicker(e.interval).C


The previous eventually uses exponential backoff, for what reason we don't do it here?

Mostly because this is way simpler (exponential backoff means we can't use a ticker, have to sleep ourselves). I could add back exponential backoff if we need it; I'm not convinced we do for these usecases. Only one of these usages is actually waiting on/querying an external system where exponential backoff would be valuable.

ZackButcher · 2018-03-15T18:59:16Z

/retest

geeknoid

I really like this simple library that makes tests both simpler to read and more robust.

We could pull in an incremental backoff library to make this even cleaner...

geeknoid · 2018-03-15T19:15:58Z

pilot/pkg/config/kube/ingress/controller_test.go


+	numNotifications := atomic.NewInt64(0)


This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.

geeknoid · 2018-03-15T19:17:16Z

pkg/test/eventually.go

+	defaultDeadline = 30 * time.Second
+)
+
+// DEFAULT is the default EventualOpts instance used for the static test.Eventually function.


Opts -> Ops

Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.

geeknoid · 2018-03-15T19:18:44Z

pkg/test/eventually.go

+}
+
+// Eventual constructs an EventualOpts instance with the provided polling interval and deadline.
+func NewEventualOps(interval, deadline time.Duration) EventualOpts {


Should this be NewEventualOpts? Or should the return value type be named EventualOps?

geeknoid · 2018-03-15T19:18:53Z

pkg/test/eventually.go

+	DEFAULT.Eventually(t, name, cond)
+}
+
+// EventualOpts is defines a polling strategy for operations that must eventually succeed.


is defines -> defines

geeknoid · 2018-03-15T19:21:13Z

pkg/test/eventually.go

+	}
+}
+
+// Eventually polls cond until it succeeds (returns true) or we exceed the deadline. cond itself can also fail the test


Have you look at any of the existing exponential backoff libraries out there as potential building blocks for this code? For example: https://godoc.org/github.com/cenkalti/backoff

geeknoid · 2018-03-15T19:25:31Z

pkg/test/eventually.go

+// using t.Fatal to exit early.
+//
+// Eventually does not perform backoff of its polling, but polls at a constant rate.
+func (e EventualOpts) Eventually(t *testing.T, name string, cond Condition) {


Perhaps mention what 'name' is for.
Perhaps mention the fact "cond" does not need to be thread-safe, it is only called from the current goroutine.

kyessenov · 2018-03-15T21:33:47Z

My opinion is that these tests should be removed from the unit test suite. If we have to observe eventual consistency, then it's not really a unit test.

istio-merge-robot · 2018-03-16T00:25:25Z

@ZackButcher PR needs rebase

ZackButcher

Updated to use exponential backoff, PTAL. I do agree with Kuat that these tests should be moved out of the unit test suite.

ZackButcher · 2018-03-15T19:38:35Z

pkg/test/eventually.go

+	defaultDeadline = 30 * time.Second
+)
+
+// DEFAULT is the default EventualOpts instance used for the static test.Eventually function.


Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.

ZackButcher · 2018-03-16T00:31:10Z

(Need to nail down updating the dependencies, too, so it'll be a bit for that.)

ldemailly · 2018-03-28T18:25:17Z

istio/old_vendor-istio_repo#38

ZackButcher · 2018-03-28T18:40:39Z

Vendor update in istio/old_vendor-istio_repo#42

ZackButcher · 2018-03-28T19:28:00Z

/retest

kyessenov · 2018-03-28T19:29:17Z

master is broken, you should have waited on merging vendor until this passes the tests after rebase.

ldemailly · 2018-03-28T21:23:29Z

semi urgent ping:

cd vendor; git checkout master; git pull; then cd ..; cp vendor/Gopkg.* . ; git add ; commit; update pr
(ie sync back your PR to what is actually in vendor's master)

ldemailly · 2018-03-29T01:44:19Z

/lgtm

istio-testing · 2018-03-29T01:44:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ldemailly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ldemailly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ldemailly · 2018-03-29T04:23:17Z

@ZackButcher

pkg/test/eventually.go:69:3:error: should use a simple channel send/receive instead of select with a single case (S1000) (megacheck)
Makefile:317: recipe for target 'lint' failed

can we please get this in - I'd like to move forward with the vendor sync

kyessenov · 2018-03-29T04:49:29Z

Yes, please do something with semi-committed vendor, I'm stuck with my vendor change. I'm next in line...

…

On Wed, Mar 28, 2018 at 9:23 PM Laurent Demailly ***@***.***> wrote: @ZackButcher <https://github.com/ZackButcher> pkg/test/eventually.go:69:3:error: should use a simple channel send/receive instead of select with a single case (S1000) (megacheck) Makefile:317: recipe for target 'lint' failed can we please get this in - I'd like to move forward with the vendor sync — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#4231 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJGIxtH5iGJ829tQAXY_Zd8_lHcLaOVoks5tjGG9gaJpZM4Sn7mn> .

istio-testing · 2018-03-29T17:09:25Z

New changes are detected. LGTM label has been removed.

istio-testing · 2018-03-29T17:50:16Z

@ZackButcher: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
prow/istio-pilot-e2e-v1alpha3.sh	`9709ff5`	link	`/test istio-pilot-e2e-v1alpha3`
prow/istio-pilot-e2e.sh	`9709ff5`	link	`/test istio-pilot-e2e`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

…ew tests that have been flaky (istio#4231) * Fix up eventually for latest repo changes * fix imports * update gopkg and vendor ref * typo * fix linter complaints

ZackButcher requested review from kyessenov, douglas-reid, xiaolanz and a team March 13, 2018 01:13

googlebot added the cla: yes label Mar 13, 2018

ZackButcher commented Mar 13, 2018

View reviewed changes

xiaolanz suggested changes Mar 13, 2018

View reviewed changes

ZackButcher force-pushed the eventually branch from 4af112d to 804768b Compare March 14, 2018 22:16

ZackButcher force-pushed the eventually branch from bae9c68 to a675ab3 Compare March 14, 2018 22:47

xiaolanz reviewed Mar 15, 2018

View reviewed changes

ZackButcher changed the title ~~Add back and Eventually test helper, use it across a few tests that have been flaky~~ Add back an Eventually test helper, use it across a few tests that have been flaky Mar 15, 2018

geeknoid reviewed Mar 15, 2018

View reviewed changes

xiaolanz approved these changes Mar 15, 2018

View reviewed changes

ZackButcher requested a review from a team March 16, 2018 00:24

istio-merge-robot added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 16, 2018

ZackButcher commented Mar 16, 2018

View reviewed changes

ZackButcher force-pushed the eventually branch from 9de18c2 to 38b6b72 Compare March 28, 2018 18:30

istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 28, 2018

ZackButcher mentioned this pull request Mar 28, 2018

Add backoff dep, and missing protos istio/old_vendor-istio_repo#42

Merged

ldemailly mentioned this pull request Mar 28, 2018

Update dependencies #4601

Closed

ZackButcher force-pushed the eventually branch from 775eb46 to de98dcc Compare March 29, 2018 00:05

istio-testing assigned ldemailly Mar 29, 2018

istio-testing added the lgtm label Mar 29, 2018

istio-testing added the approved label Mar 29, 2018

ZackButcher added 5 commits March 29, 2018 10:05

Fix up eventually for latest repo changes

fec718c

fix imports

4f3375e

update gopkg and vendor ref

caf3da4

typo

4938afc

fix linter complaints

9709ff5

ZackButcher force-pushed the eventually branch from f0ec364 to 9709ff5 Compare March 29, 2018 17:09

istio-testing removed the lgtm label Mar 29, 2018

ldemailly merged commit 9146694 into istio:master Mar 29, 2018

ldemailly mentioned this pull request Mar 29, 2018

[vendor-change] CloudWatch Mixer adapter #4617

Merged

JimmyCYJ mentioned this pull request Mar 30, 2018

TestCheckReportDisable - check_report_disable failed in racy test #4653

Closed

[vendor-change] Add back an Eventually test helper, use it across a few tests that have been flaky #4231

[vendor-change] Add back an Eventually test helper, use it across a few tests that have been flaky #4231

Uh oh!

Conversation

ZackButcher commented Mar 13, 2018

Uh oh!

istio-merge-robot commented Mar 13, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyessenov commented Mar 13, 2018 via email

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZackButcher Mar 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mandarjog commented Mar 13, 2018

Uh oh!

ZackButcher commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ZackButcher commented Mar 14, 2018

Uh oh!

codecov bot commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ZackButcher commented Mar 14, 2018

Uh oh!

ZackButcher commented Mar 14, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZackButcher commented Mar 15, 2018

Uh oh!

geeknoid left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZackButcher Mar 15, 2018 •

edited

Loading

ZackButcher commented Mar 14, 2018 •

edited

Loading

codecov bot commented Mar 14, 2018 •

edited

Loading