Skip to content

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Jun 7, 2024

Golang runs testcases concurrently, the test code "list xfrm policies" followed by "delete policies" could fail because other goroutines from other testcases could delete the xfrm policies in between. We fix this issue by adding mutex for testcases that are handling xfrm.

The ci-runtime has run for 10 times and I didn't see a single failure from github.com/cilium/cilium/pkg/datapath/linux/ipsec: https://github.com/cilium/cilium/actions/runs/9442804151

Fixes: #32902

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 7, 2024
@jschwinger233 jschwinger233 force-pushed the pr/gray/main/xfrm-delete-flake branch from 23d3168 to 36509df Compare June 7, 2024 10:51
@jschwinger233 jschwinger233 changed the title ci: dummy ci: Prohibit concurrent execution of cleanIPSecStatesAndPolicies() Jun 7, 2024
@jschwinger233 jschwinger233 force-pushed the pr/gray/main/xfrm-delete-flake branch 3 times, most recently from 71cfa98 to ff709dd Compare June 10, 2024 05:08
@jschwinger233
Copy link
Member Author

/ci-runtime

@jschwinger233
Copy link
Member Author

After running ci-runtime for 10 times, I didn't see a single failure from github.com/cilium/cilium/pkg/datapath/linux/ipsec: https://github.com/cilium/cilium/actions/runs/9442804151

2 out 10 failed due to node_linux_test.go:TestNodeChurnXFRMLeaks, I'll open an issue for it.

@jschwinger233 jschwinger233 changed the title ci: Prohibit concurrent execution of cleanIPSecStatesAndPolicies() ci: Don't handle xfrm concurrently in privileged test Jun 11, 2024
@jschwinger233 jschwinger233 added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-note/ci This PR makes changes to the CI. ci/flake This is a known failure that occurs in the tree. Please investigate me! feature/ipsec Relates to Cilium's IPsec feature labels Jun 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 11, 2024
Golang runs testcases concurrently, the test code "list xfrm policies" followed by "delete policies" could fail because other goroutines from other testcases could delete the xfrm policies in between. We fix this issue by adding mutex for testcases that are handling xfrm.

Fixes: #32902

Signed-off-by: gray <gray.liang@isovalent.com>
@jschwinger233 jschwinger233 force-pushed the pr/gray/main/xfrm-delete-flake branch from ff709dd to 4c99df6 Compare June 11, 2024 05:26
@jschwinger233
Copy link
Member Author

/test

@jschwinger233 jschwinger233 marked this pull request as ready for review June 11, 2024 05:27
@jschwinger233 jschwinger233 requested a review from a team as a code owner June 11, 2024 05:27
@marseel
Copy link
Contributor

marseel commented Jun 11, 2024

I'm pretty sure that testcases within a single package are not run in parallel, from https://pkg.go.dev/cmd/go#hdr-Testing_flags

-parallel n
    Allow parallel execution of test functions that call t.Parallel, and
    fuzz targets that call t.Parallel when running the seed corpus.
    The value of this flag is the maximum number of tests to run
    simultaneously.
    While fuzzing, the value of this flag is the maximum number of
    subprocesses that may call the fuzz function simultaneously, regardless of
    whether T.Parallel is called.
    By default, -parallel is set to the value of GOMAXPROCS.
    Setting -parallel to values higher than GOMAXPROCS may cause degraded
    performance due to CPU contention, especially when fuzzing.
    Note that -parallel only applies within a single test binary.
    The 'go test' command may run tests for different packages
    in parallel as well, according to the setting of the -p flag
    (see 'go help build').

Especially the last part "Note that -parallel only applies within a single test binary.
The 'go test' command may run tests for different packages
in parallel as well, according to the setting of the -p flag
(see 'go help build')."

And default for "-p":

-p n
	the number of programs, such as build commands or
	test binaries, that can be run in parallel.
	The default is GOMAXPROCS, normally the number of CPUs available.

2 out 10 failed due to node_linux_test.go:TestNodeChurnXFRMLeaks, I'll open an issue for it.

That would indicate that TestNodeChurnXFRMLeaks and these test were run in parallel, which would explain why one of them could fail.

I think the simplest solution would be to set -p 1 for tests-privileged in Makefile and also -parallel 1 just in case. After all, these are privileged tests so I would not expect to run them in parallel.

@rgo3
Copy link
Contributor

rgo3 commented Jun 13, 2024

We could execute the IPsec unit tests in separate test namespaces so that global state we create doesn't conflict in parallel runs. We do that in other privileged tests as well, see

func TestAttachXDPWithExistingLink(t *testing.T) {
as an example.

@msune
Copy link
Member

msune commented Jul 9, 2024

What's the status of this one?

@jschwinger233
Copy link
Member Author

Closed as issue has been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. ci/flake This is a known failure that occurs in the tree. Please investigate me! feature/ipsec Relates to Cilium's IPsec feature release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Conformance Runtime - TestUpdateExistingIPSecEndpoint
4 participants