inctimer: Fix bug where timer fired immediately #16955

gandro · 2021-07-21T11:48:48Z

This fixes a bug where `IncTimer.After` could fire immediately due
to a rare race. See the comment in the diff for how the race
can occur. This commit also adds a unit test which has a high
likelihood of triggering the bug in the old code.

Discovered this while debugging #15442 (which is not fully fixed by this PR,
but that unit test failure surfaced only sporadically because of
this bug).

Fix bug where timers used for retries sometimes fired immediately

rolinh

Nice catch! And the fix is also extremely well documented 🚀

pkg/inctimer/inctimer_test.go

gandro · 2021-07-21T13:00:38Z

test-me-please

This commit fixes cilium#15442 (and variants), where the `done` channel used to indicate completion to the test driver could be closed twice. This happened because at the end of a test, most mock clients start returning `io.EOF`. Due to cilium#16955, this sometimes caused the peer manager to reconnect immediately and create a new mock client, which would then attempt to run the test logic again.

This commit addresses this issue by ensuring that all mock clients within a test share the same state (i.e. the `i` counter and `once` instance). This way, each mock client instance will continue the work of its predecessor instead of replaying the whole test sequence.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2021-07-21T16:13:06Z

test-1.16-netnext

Edit: net-next hit a variant of #16399
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/1135/

Since this PR fixes another CI flake, this should not block this PR.

christarazi

One minor nit and one curious question. The documentation is, as Michi would say, "amazing man" 🚀

pkg/inctimer/inctimer.go

pkg/inctimer/inctimer_test.go

This fixes a bug where `IncTimer.After` could fire immediately due to a rare race. See the comment in the diff for how the race can occur. This commit also adds a unit test which has a high likelihood of triggering the bug in the old code.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2021-07-22T09:11:57Z

test-me-please

This commit fixes #15442 (and variants), where the `done` channel used to indicate completion to the test driver could be closed twice. This happened because at the end of a test, most mock clients start returning `io.EOF`. Due to #16955, this sometimes caused the peer manager to reconnect immediately and create a new mock client, which would then attempt to run the test logic again.

This commit addresses this issue by ensuring that all mock clients within a test share the same state (i.e. the `i` counter and `once` instance). This way, each mock client instance will continue the work of its predecessor instead of replaying the whole test sequence.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2021-07-22T15:32:44Z

The failure in net-next is unrelated, very similar to #16928: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/1140/testReport/junit/Suite-k8s-1/16/K8sPolicyTest_Basic_Test_Validate_to_entities_policies_Validate_toEntities_All/

2021-07-22T12:09:00.762112917Z level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=40 desiredPolicyRevision=41 endpointID=61 error="Failed to replace Qdisc for lxc92200f0d9c8b: Link not found" identity=33765 ipv4= ipv6= k8sPodName=/ subsys=endpoint

Marking this ready to merge as soon as remaining reviews are in, as this is a bug fix and thus should be exempt from the merge freeze.

[ upstream commit 31176f7 ]

This commit fixes #15442 (and variants), where the `done` channel used to indicate completion to the test driver could be closed twice. This happened because at the end of a test, most mock clients start returning `io.EOF`. Due to #16955, this sometimes caused the peer manager to reconnect immediately and create a new mock client, which would then attempt to run the test logic again.

This commit addresses this issue by ensuring that all mock clients within a test share the same state (i.e. the `i` counter and `once` instance). This way, each mock client instance will continue the work of its predecessor instead of replaying the whole test sequence.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

christarazi · 2021-09-14T20:44:00Z

Marking for backport to v1.9 for the same reasons as #14380 (comment).

gandro added the kind/bug and needs-backport/1.10 labels Jul 21, 2021

gandro requested a review from a team as a code owner July 21, 2021 11:48

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label Jul 21, 2021

gandro added the release-note/bug label Jul 21, 2021

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label label Jul 21, 2021

gandro force-pushed the pr/gandro/fix-inctimer-firing-too-soon branch from 8c28cb6 to 47531c0 July 21, 2021 11:51

rolinh approved these changes Jul 21, 2021

pkg/inctimer/inctimer_test.go (outdated, resolved)

gandro force-pushed the pr/gandro/fix-inctimer-firing-too-soon branch from 47531c0 to 68123f3 July 21, 2021 12:00

gandro mentioned this pull request Jul 21, 2021

hubble/relay: Fix close of closed channel in unit test #16958

Merged

christarazi reviewed Jul 21, 2021

pkg/inctimer/inctimer.go (outdated, resolved)

pkg/inctimer/inctimer_test.go (outdated, resolved)

gandro force-pushed the pr/gandro/fix-inctimer-firing-too-soon branch from 68123f3 to 65cd5fb July 22, 2021 08:19

brb approved these changes Jul 22, 2021

gandro requested a review from christarazi July 22, 2021 15:38

maintainer-s-little-helper bot assigned christarazi Jul 22, 2021

christarazi approved these changes Jul 22, 2021

maintainer-s-little-helper bot unassigned christarazi Jul 22, 2021

gandro added the ready-to-merge label Jul 26, 2021

brb merged commit a315b45 into cilium:master Jul 27, 2021

pchaigno mentioned this pull request Jul 28, 2021

v1.10 backports 2021-07-28 #17011

Merged

pchaigno added backport-pending/1.10 and removed needs-backport/1.10 labels Jul 28, 2021

joestringer added backport-done/1.10 and removed backport-pending/1.10 labels Sep 1, 2021

joestringer mentioned this pull request Sep 1, 2021

Prepare for release v1.10.4 #17287

Merged

christarazi added the needs-backport/1.9 label Sep 14, 2021

christarazi mentioned this pull request Sep 14, 2021

[v1.9] Backport Operator improvements #17398

Merged

christarazi added backport-done/1.9 and removed needs-backport/1.9 labels Oct 4, 2021

joestringer mentioned this pull request Nov 5, 2021

Prepare for release v1.9.11 #17805

Merged
