chore: Fix to intermittent E2E test failures in deployment_test.go #20974

jgwest · 2024-11-27T14:30:18Z

This PR is a simple test update to fix flaky E2E tests.

A couple of E2E tests (that I originally wrote some time ago) are intermittently failing due to a race condition in acquisition of the ServiceAccount secret.

Both TestArgoCDSupportsMultipleServiceAccountsWithDifferingRBACOnSameCluster and TestDeployToKubernetesAPIURLWithQueryParameter, in deployment_test.go, attempt to acquire a ServiceAccount bearer token using GetServiceAccountBearerToken from clusterauth.go. However, within that package, createServiceAccountToken can fail with the following error:

failed to patch serviceaccount "e2e-test-user1-serviceaccount" with bearer token secret: Operation cannot be fulfilled on serviceaccounts "e2e-test-user1-serviceaccount": the object has been modified; please apply your changes to the latest version and try again

As you can probably guess, this is because another process is modifying the resource before our code can complete the patch.

The proposed fix is to retry on failure, up to a maximum of 20 seconds.

I have run these tests over and over for several hours, and did not hit the issue, so as best as I can tell the issue is resolved with this PR.

Example of failing run:

=== RUN   TestDeployToKubernetesAPIURLWithQueryParameter
time="2024-08-29T18:33:20Z" level=info msg="kubectl delete ns -l e2e.argoproj.io=true --field-selector status.phase=Active --wait=false" dir= execID=4bf1a
time="2024-08-29T18:33:20Z" level=debug msg="namespace \"argocd-e2e--test-deployment-without-tracking-mode-jlxqv\" deleted\n" duration=280.930217ms execID=4bf1a
time="2024-08-29T18:33:20Z" level=info msg="kubectl delete crd -l e2e.argoproj.io=true --wait=false" dir= execID=26fa6
time="2024-08-29T18:33:21Z" level=debug msg="No resources found\n" duration=736.63983ms execID=26fa6
time="2024-08-29T18:33:21Z" level=info msg="kubectl delete clusterroles -l e2e.argoproj.io=true --wait=false" dir= execID=a6188
time="2024-08-29T18:33:21Z" level=debug msg="No resources found\n" duration=220.337006ms execID=a6188
time="2024-08-29T18:33:22Z" level=warning msg="Failed to invoke grpc call. Use flag --grpc-web in grpc calls. To avoid this warning message, use flag --grpc-web."
time="2024-08-29T18:33:22Z" level=info msg="../../dist/argocd proj create gpg --server argocd-test-server-argocd-e2e.apps.rosa-qe-415.i1qm.p1.openshiftapps.com --auth-token eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhcmdvY2QiLCJzdWIiOiJhZG1pbjpsb2dpbiIsImV4cCI6MTcyNTA0MjgwMiwibmJmIjoxNzI0OTU2NDAyLCJpYXQiOjE3MjQ5NTY0MDIsImp0aSI6IjUxNzU4OWFjLTQxMjktNDNkMS1hOTAwLWUxZmQzNWZkNjdlNyJ9.69GMMnp64BvgJZ7ZJaCRP2mqODctv44QyOMa2oGKgpI --insecure" dir= execID=3f5aa
time="2024-08-29T18:33:22Z" level=debug duration=132.070553ms execID=3f5aa
time="2024-08-29T18:33:22Z" level=info msg="mkdir -p /tmp/argo-e2e" dir= execID=effe0
time="2024-08-29T18:33:22Z" level=debug duration=1.970259ms execID=effe0
time="2024-08-29T18:33:22Z" level=info msg="mkdir -p /tmp/argo-e2e/gpg" dir= execID=f1823
time="2024-08-29T18:33:22Z" level=debug duration=1.607923ms execID=f1823
time="2024-08-29T18:33:22Z" level=info msg="chmod 0700 /tmp/argo-e2e/gpg" dir= execID=431be
time="2024-08-29T18:33:22Z" level=debug duration=1.778624ms execID=431be
time="2024-08-29T18:33:22Z" level=info msg="pkill -9 gpg-agent" dir= execID=0e848
time="2024-08-29T18:33:22Z" level=info msg="gpg --import ../fixture/gpg/signingkey.asc" dir= execID=91772
time="2024-08-29T18:33:22Z" level=debug duration=10.727778ms execID=91772
time="2024-08-29T18:33:22Z" level=info msg="cp -Rf testdata /tmp/argo-e2e/testdata.git" dir= execID=bad2d
time="2024-08-29T18:33:23Z" level=debug duration=265.091727ms execID=bad2d
time="2024-08-29T18:33:23Z" level=info msg="chmod 777 ." dir=/tmp/argo-e2e/testdata.git execID=528cb
time="2024-08-29T18:33:23Z" level=debug duration=1.88798ms execID=528cb
time="2024-08-29T18:33:23Z" level=info msg="git init -b master" dir=/tmp/argo-e2e/testdata.git execID=d7dd0
time="2024-08-29T18:33:23Z" level=debug msg="Initialized empty Git repository in /tmp/argo-e2e/testdata.git/.git/\n" duration=3.832245ms execID=d7dd0
time="2024-08-29T18:33:23Z" level=info msg="git add ." dir=/tmp/argo-e2e/testdata.git execID=90e4d
time="2024-08-29T18:33:23Z" level=debug duration=30.579905ms execID=90e4d
time="2024-08-29T18:33:23Z" level=info msg="git commit -q -m initial commit" dir=/tmp/argo-e2e/testdata.git execID=c1d5c
time="2024-08-29T18:33:23Z" level=debug duration=23.928072ms execID=c1d5c
time="2024-08-29T18:33:23Z" level=info msg="git remote add origin http://127.0.0.1:9081/argo-e2e/testdata.git" dir=/tmp/argo-e2e/testdata.git execID=ff654
time="2024-08-29T18:33:23Z" level=debug duration=2.302404ms execID=ff654
time="2024-08-29T18:33:23Z" level=info msg="git push origin master -f" dir=/tmp/argo-e2e/testdata.git execID=86119
time="2024-08-29T18:33:23Z" level=debug duration=137.342519ms execID=86119
time="2024-08-29T18:33:23Z" level=info msg="kubectl create ns argocd-e2e--test-deploy-to-kubernetes-apiurl-with-query-p-xcrix" dir= execID=d2123
time="2024-08-29T18:33:23Z" level=debug msg="namespace/argocd-e2e--test-deploy-to-kubernetes-apiurl-with-query-p-xcrix created\n" duration=140.73248ms execID=d2123
time="2024-08-29T18:33:23Z" level=info msg="kubectl label ns argocd-e2e--test-deploy-to-kubernetes-apiurl-with-query-p-xcrix e2e.argoproj.io=true" dir= execID=4897e
time="2024-08-29T18:33:23Z" level=debug msg="namespace/argocd-e2e--test-deploy-to-kubernetes-apiurl-with-query-p-xcrix labeled\n" duration=311.18895ms execID=4897e
time="2024-08-29T18:33:23Z" level=info msg="clean state" duration=3.781812414s id=TestDeployToKubernetesAPIURLWithQueryParameter-xcrix name=TestDeployToKubernetesAPIURLWithQueryParameter password=password username=admin
time="2024-08-29T18:33:23Z" level=info msg="ServiceAccount \"e2e-test-user1-serviceaccount\" created in namespace \"e2e-test-user1\""
time="2024-08-29T18:33:24Z" level=info msg="Created bearer token secret for ServiceAccount \"e2e-test-user1-serviceaccount\""
    deployment_test.go:313: 
            Error Trace:    /argocd-e2e/argo-cd/test/e2e/deployment_test.go:313
                                        /argocd-e2e/argo-cd/test/e2e/deployment_test.go:137
            Error:          Received unexpected error:
                            failed to patch serviceaccount "e2e-test-user1-serviceaccount" with bearer token secret: Operation cannot be fulfilled on serviceaccounts "e2e-test-user1-serviceaccount": the object has been modified; please apply your changes to the latest version and try again
            Test:           TestDeployToKubernetesAPIURLWithQueryParameter
--- FAIL: TestDeployToKubernetesAPIURLWithQueryParameter (3.99s)

Signed-off-by: Jonathan West <jonwest@redhat.com>

bunnyshell · 2024-11-27T14:30:27Z

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

🚀 /bns:deploy to deploy the environment

codecov · 2024-11-27T15:21:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.04%. Comparing base (7f6340f) to head (5e0ab01).
Report is 414 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #20974      +/-   ##
==========================================
+ Coverage   54.99%   55.04%   +0.04%     
==========================================
  Files         324      324              
  Lines       55466    55466              
==========================================
+ Hits        30505    30530      +25     
+ Misses      22341    22322      -19     
+ Partials     2620     2614       -6


andrii-korotkov-verkada · 2024-11-27T15:56:13Z

test/e2e/deployment_test.go

+		return (err == nil && token != ""), nil
+	})
+	require.NoError(t, waitErr)
+	require.NotEmpty(t, token)


Do you know what other process can modify it? It might be a sign of some race condition in ArgoCD, which we may need to fix.

jgwest · 2024-11-27

My expectation is that it's a core Kubernetes controller that happens to modify the ServiceAccount at the same time as Argo CD. Resource contention on shared resources is pretty common in cases like this. Retrying/re-reconciling on a resource update error is a pretty standard controller/operator pattern.
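The retry-on-conflict pattern described above can be illustrated without a cluster. The following is a minimal, stdlib-only sketch and not Argo CD or client-go code: fakeStore, serviceAccount, errConflict, and updateWithRetry are hypothetical names, and the version check only imitates how the API server rejects a write that was computed from a stale resourceVersion.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's "the object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// serviceAccount is a hypothetical, trimmed-down stand-in for the real resource.
type serviceAccount struct {
	ResourceVersion int
	Secrets         []string
}

// fakeStore imitates optimistic concurrency: an update is accepted only if it
// was computed from the latest stored version.
type fakeStore struct {
	current serviceAccount
}

// get returns a copy so callers cannot mutate the stored object directly.
func (s *fakeStore) get() serviceAccount {
	c := s.current
	c.Secrets = append([]string(nil), s.current.Secrets...)
	return c
}

// update rejects writes based on a stale version and bumps the version otherwise.
func (s *fakeStore) update(sa serviceAccount) error {
	if sa.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	sa.ResourceVersion++
	s.current = sa
	return nil
}

// updateWithRetry re-reads the object and reapplies the change after a conflict.
// beforeUpdate is a hook that lets a simulated second writer act between the
// read and the update.
func updateWithRetry(s *fakeStore, secret string, maxAttempts int, beforeUpdate func(attempt int)) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		sa := s.get()
		sa.Secrets = append(sa.Secrets, secret)
		if beforeUpdate != nil {
			beforeUpdate(attempt)
		}
		err := s.update(sa)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errConflict) {
			return attempt, err // only conflicts are worth retrying
		}
	}
	return maxAttempts, errConflict
}

// demoConflictThenSuccess lets another writer win the race on the first attempt only.
func demoConflictThenSuccess() (attempts int, secrets []string, err error) {
	store := &fakeStore{}
	attempts, err = updateWithRetry(store, "bearer-token-secret", 5, func(attempt int) {
		if attempt == 1 {
			other := store.get()
			other.Secrets = append(other.Secrets, "added-by-other-controller")
			_ = store.update(other)
		}
	})
	return attempts, store.get().Secrets, err
}

func main() {
	attempts, secrets, err := demoConflictThenSuccess()
	fmt.Println(attempts, secrets, err)
}
```

In this sketch the first attempt loses to the simulated second writer and the second attempt succeeds against the refreshed object. For real clusters, client-go provides retry.RetryOnConflict in k8s.io/client-go/util/retry for the same purpose; this PR instead retries at the test level.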

andrii-korotkov-verkada · 2024-11-27T19:45:38Z

test/e2e/deployment_test.go

+		token, err = clusterauth.GetServiceAccountBearerToken(KubeClientset, ns.Name, serviceAccountName, time.Second*60)
+
+		// Success is no error and a real token, otherwise keep trying
+		return (err == nil && token != ""), nil


Shall we return err here, not nil?

jgwest · 2024-11-29

Returning a non-nil error here is fatal: it stops the loop, as per the PollUntilContextCancel docs. In this case, a non-nil err is only a temporary failure, which is likely to be resolved by a subsequent iteration, so it is not appropriate to return a non-nil error from within the loop.

However, if the loop is not able to complete in the allotted timeframe, the context is canceled and an error is returned from the polling function, which is caught by require.NoError and fails the test.
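The difference between returning err and returning nil from the condition function can be shown with a small stdlib-only sketch. This is an illustration under assumptions, not the code from this PR: pollUntil only imitates the condition-function contract described above (it is not the library's polling function), and flakyTokenSource is a hypothetical stand-in for GetServiceAccountBearerToken.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// conditionFunc follows the contract discussed above: (true, nil) means done,
// (false, nil) means try again, and a non-nil error aborts polling.
type conditionFunc func(ctx context.Context) (done bool, err error)

// pollUntil is a simplified stand-in for a polling helper.
func pollUntil(ctx context.Context, interval time.Duration, cond conditionFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := cond(ctx)
		if err != nil {
			return err // fatal: stop immediately
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // timeout or cancellation
		case <-ticker.C:
		}
	}
}

var errTransient = errors.New("transient failure")

// flakyTokenSource (hypothetical) fails the given number of times, then succeeds.
func flakyTokenSource(failures int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errTransient
		}
		return "fake-token", nil
	}
}

// fetchToken polls getToken until it succeeds or the timeout expires.
// With swallowErrors set, failures are treated as "not done yet";
// otherwise the first failure is returned from the condition and ends the loop.
func fetchToken(getToken func() (string, error), swallowErrors bool, timeout time.Duration) (string, int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var token string
	attempts := 0
	waitErr := pollUntil(ctx, 5*time.Millisecond, func(context.Context) (bool, error) {
		attempts++
		var err error
		token, err = getToken()
		if err != nil && !swallowErrors {
			return false, err
		}
		// Success is no error and a real token, otherwise keep trying.
		return err == nil && token != "", nil
	})
	return token, attempts, waitErr
}

func main() {
	// Transient errors swallowed: succeeds once the source recovers.
	fmt.Println(fetchToken(flakyTokenSource(2), true, time.Second))
	// Errors returned from the condition: aborts on the first failure.
	fmt.Println(fetchToken(flakyTokenSource(2), false, time.Second))
	// Source never recovers: the timeout surfaces as the returned error.
	_, _, err := fetchToken(flakyTokenSource(1<<30), true, 50*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

With errors swallowed, the loop tolerates the two simulated failures and succeeds on the third attempt. With errors returned, it stops after the first attempt, which is the behavior the reply above argues against for a transient failure.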

ishitasequeira

Changes LGTM!! Thanks @jgwest for fixing the flaky tests!!

chore: Fix to intermittent E2E test failures in deployment_test.go

5e0ab01

Signed-off-by: Jonathan West <jonwest@redhat.com>

jgwest requested a review from a team as a code owner November 27, 2024 14:30

andrii-korotkov-verkada reviewed Nov 27, 2024


andrii-korotkov-verkada approved these changes Nov 29, 2024


andrii-korotkov-verkada added the ready-for-review An approver should give a final review and merge the PR label Nov 29, 2024

ishitasequeira approved these changes Dec 3, 2024


ishitasequeira merged commit 9587ec9 into argoproj:master Dec 3, 2024
35 checks passed

adriananeci pushed a commit to adriananeci/argo-cd that referenced this pull request Dec 4, 2024

chore: Fix to intermittent E2E test failures in deployment_test.go (a…

77b9845

…rgoproj#20974) Signed-off-by: Jonathan West <jonwest@redhat.com> Signed-off-by: Adrian Aneci <aneci@adobe.com>

revitalbarletz pushed a commit to revitalbarletz/argo-cd that referenced this pull request Jan 20, 2025

chore: Fix to intermittent E2E test failures in deployment_test.go (a…

54ce072

…rgoproj#20974) Signed-off-by: Jonathan West <jonwest@redhat.com>
