Skip to content

Conversation

ZackButcher
Copy link
Contributor

Introduce Eventually helper that polls an event in tests until success. The Istio codebase has had a few iterations of this type of helper; we choose not to use gomega's Eventually since it requires using gomega's Matchers too (so it's not a drop in replacement).

This also fixes the flaky TestCheckReportDisable test (by retrying the condition rather than only checking it a single time). It also reduces the runtime of that test by ~half.

@istio-merge-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: rshriram

Assign the PR to them by writing /assign @rshriram in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

// Eventually retries until f() returns true, or it times out in error
func Eventually(name string, f func() bool, t *testing.T) {
t.Helper()
interval := 64 * time.Millisecond
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that by default my new Eventually impl fails after 5 seconds, and this function will wait up to 1 minute. I'm curious to see what the tests do with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something in the middle - like 25 seconds ? I'm trying to switch default refresh to 10 seconds (due to load), this will give 2 refresh intervals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for tests we shouldn't need to wait 10s for changes to propagate - we don't have that many services that 1s shouldn't work - so we can get through tests faster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the default 1s poll, 30 second deadline. Probably too generous, but since the goal is to fix flakes, maybe not the worst thing ever.

@kyessenov
Copy link
Contributor

kyessenov commented Mar 13, 2018 via email

}

// Eventual constructs an EventualOpts instance with the provided polling interval and deadline.
func Eventual(interval, deadline time.Duration) EventualOpts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to NewEventualOps

"time"
)

// A Condition is a function that returns true when a test condition is satisfied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand eventually is useful in certain circumstances but I want to add a wary for strictly limiting its use, of no alternative better solution is available.

we should encourage implementing more efficient and customized waiting cycling using channels instead of this generalized backoff logic.

Also factor in that go discourage common libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the whole goal is that we do not encourage customized waiting using channels. That means we wind up with duplicate code that is not necessarily trivial to get correct all over the place. This is exactly the kind of logic that should be abstracted away into a function.

defer close(ach)
defer close(dch)
sad, sdd := 0, 0
added, deleted := atomic.NewInt64(0), atomic.NewInt64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't make this change. go discourage shared memory or cond pattern:
"Do not communicate by sharing memory; instead, share memory by communicating."
https://blog.golang.org/share-memory-by-communicating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is generally true in go code, and I wouldn't allow this kind shared memory into code paths that execute in production. However, this is a test, and clarity of code under test far outweighs that "rule". What's going on in the test is far more obvious using these atomics compared to using channels. Look at the final Eventually, it's very explicit: eventually these two values will both be n, or the test fails. Compare that to the loop with channels/switch, where that very clear predicate is muddled by handling timeout and writes on the channel. '

Further, your code busy-waits: there's no delay or sleeping, it just compares added and deleted to n as fast as it can in a loop (or, occasionally, gets notified on a change on the other channels). Given our tests execute in parallel, that means you're occupying a core doing nothing while we could be executing another test in parallel.

Finally, which do you think is more maintainable in the long run? Less code that's more explicit about its meaning always wins in my book.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for exactly the same reason go doesn't provide a min/max library. I prefer this not to be unified under one util.

Copy link
Contributor Author

@ZackButcher ZackButcher Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang does in fact provide max/min in a library: it's right in the standard library.

However, a function that can be written in four lines:

func min(a, b MyNumType) MyNumType {
    if a < b {
        return a
    }
    return b
}

is a pretty far cry from a channel based setup. I'm pretty confident I can write min correctly every time, I'm not confident I can write eventual polling via channels correctly every time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the change since it's in test. I changed these to channels to remove flakiness issues and race issues. Please make sure it passes flaky test (-count 1000) and -racetest before submission.

@@ -155,18 +166,16 @@ func (s *TestSetup) ReStartEnvoy() {

// VerifyCheckCount verifies the number of Check calls.
func (s *TestSetup) VerifyCheckCount(tag string, expected int) {
if s.mixer.check.Count() != expected {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these needs to have an eventual waiting loop, can we notify the verification by blocking on channels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without large changes across the mixer test suite.


numNotifications := atomic.NewInt64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use atomic. see the reasons below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.

@mandarjog
Copy link
Contributor

Why not gomega.Eventually with channels in the eventual function? Or you can re evaluate the whole thing?

Gomega is already. Dependency of istio.

@ZackButcher
Copy link
Contributor Author

ZackButcher commented Mar 14, 2018

@mandarjog gomega requires using matchers to use Eventually, so you have to adopt gomega for the entire test and not just the portion that needs polling (I did look at using gomega first, or implementing this function on top of gomega's Eventually). The goal is that this is an in-between to consolidate polling logic. If we move everything over to gomega then it should be easy to change tests that use this.

@ZackButcher
Copy link
Contributor Author

/retest

@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

Merging #4231 into master will decrease coverage by 4%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #4231    +/-   ##
=======================================
- Coverage      75%     72%    -3%     
=======================================
  Files         289     295     +6     
  Lines       24586   25406   +820     
=======================================
- Hits        18313   18140   -173     
- Misses       5505    6501   +996     
+ Partials      768     765     -3
Impacted Files Coverage Δ
mixer/adapter/prometheus/prometheus.go 89% <0%> (-11%) ⬇️
mixer/adapter/servicecontrol/reportprocessor.go 79% <0%> (-1%) ⬇️
pilot/cmd/pilot-agent/main.go 38% <0%> (ø) ⬇️
mixer/adapter/denier/denier.go 100% <0%> (ø) ⬆️
mixer/cmd/mixc/cmd/check.go 0% <0%> (ø) ⬆️
mixer/adapter/stdio/stdio.go 100% <0%> (ø) ⬆️
mixer/adapter/stdio/zap.go 100% <0%> (ø) ⬆️
mixer/adapter/dogstatsd/dogstatsd.go 100% <0%> (ø) ⬆️
mixer/adapter/memquota/rollingWindow.go 100% <0%> (ø) ⬆️
mixer/adapter/list/ipList.go 100% <0%> (ø) ⬆️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c13fa4b...9709ff5. Read the comment docs.

@ZackButcher
Copy link
Contributor Author

/retest

@ZackButcher
Copy link
Contributor Author

@xiaolanz PTAL. I do not plan to remove the atomics.


// We didn't get a happy fast-path, so set up timers and wait.
timeout := time.NewTimer(e.deadline).C
poll := time.NewTicker(e.interval).C
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous eventually uses exponential backoff, for what reason we don't do it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly because this is way simpler (exponential backoff means we can't use a ticker, have to sleep ourselves). I could add back exponential backoff if we need it; I'm not convinced we do for these usecases. Only one of these usages is actually waiting on/querying an external system where exponential backoff would be valuable.

@ZackButcher
Copy link
Contributor Author

/retest

@ZackButcher ZackButcher changed the title Add back and Eventually test helper, use it across a few tests that have been flaky Add back an Eventually test helper, use it across a few tests that have been flaky Mar 15, 2018
Copy link
Contributor

@geeknoid geeknoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this simple library that makes tests both simpler to read and more robust.

We could pull in an incremental backoff library to make this even cleaner...


numNotifications := atomic.NewInt64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a totally legitimate use of atomic ops. Channels would end up considerably more complex, obfuscated, and higher overhead.

defaultDeadline = 30 * time.Second
)

// DEFAULT is the default EventualOpts instance used for the static test.Eventually function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opts -> Ops

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.

}

// Eventual constructs an EventualOpts instance with the provided polling interval and deadline.
func NewEventualOps(interval, deadline time.Duration) EventualOpts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be NewEventualOpts? Or should the return value type be named EventualOps?

DEFAULT.Eventually(t, name, cond)
}

// EventualOpts is defines a polling strategy for operations that must eventually succeed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is defines -> defines

}
}

// Eventually polls cond until it succeeds (returns true) or we exceed the deadline. cond itself can also fail the test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you look at any of the existing exponential backoff libraries out there as potential building blocks for this code? For example: https://godoc.org/github.com/cenkalti/backoff

// using t.Fatal to exit early.
//
// Eventually does not perform backoff of its polling, but polls at a constant rate.
func (e EventualOpts) Eventually(t *testing.T, name string, cond Condition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mention what 'name' is for.
Perhaps mention the fact "cond" does not need to be thread-safe, it is only called from the current goroutine.

@kyessenov
Copy link
Contributor

My opinion is that these tests should be removed from the unit test suite. If we have to observe eventual consistency, then it's not really a unit test.

@ZackButcher ZackButcher requested a review from a team March 16, 2018 00:24
@istio-merge-robot
Copy link

@ZackButcher PR needs rebase

@istio-merge-robot istio-merge-robot added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 16, 2018
Copy link
Contributor Author

@ZackButcher ZackButcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use exponential backoff, PTAL. I do agree with Kuat that these tests should be moved out of the unit test suite.

defaultDeadline = 30 * time.Second
)

// DEFAULT is the default EventualOpts instance used for the static test.Eventually function.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code with "Ops" everywhere, I think I like "Opts" a little better (IMO ops = operations, opts = options). I just wasn't consistent in my own usage due to going back and forth on the name.

@ZackButcher
Copy link
Contributor Author

(Need to nail down updating the dependencies, too, so it'll be a bit for that.)

@ldemailly
Copy link
Member

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 28, 2018
@ZackButcher
Copy link
Contributor Author

Vendor update in istio/old_vendor-istio_repo#42

@ZackButcher
Copy link
Contributor Author

/retest

@kyessenov
Copy link
Contributor

master is broken, you should have waited on merging vendor until this passes the tests after rebase.

@ldemailly
Copy link
Member

semi urgent ping:

cd vendor; git checkout master; git pull; then cd ..; cp vendor/Gopkg.* . ; git add ; commit; update pr
(ie sync back your PR to what is actually in vendor's master)

@ldemailly
Copy link
Member

/lgtm

@istio-testing
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ldemailly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ldemailly
Copy link
Member

@ZackButcher

pkg/test/eventually.go:69:3:error: should use a simple channel send/receive instead of select with a single case (S1000) (megacheck)
Makefile:317: recipe for target 'lint' failed

can we please get this in - I'd like to move forward with the vendor sync

@kyessenov
Copy link
Contributor

kyessenov commented Mar 29, 2018 via email

@istio-testing
Copy link
Collaborator

New changes are detected. LGTM label has been removed.

@istio-testing
Copy link
Collaborator

@ZackButcher: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-pilot-e2e-v1alpha3.sh 9709ff5 link /test istio-pilot-e2e-v1alpha3
prow/istio-pilot-e2e.sh 9709ff5 link /test istio-pilot-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ldemailly ldemailly merged commit 9146694 into istio:master Mar 29, 2018
bianpengyuan pushed a commit to bianpengyuan/istio that referenced this pull request Apr 5, 2018
…ew tests that have been flaky (istio#4231)

* Fix up eventually for latest repo changes

* fix imports

* update gopkg and vendor ref

* typo

* fix linter complaints
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants