Member

@aanm aanm commented Oct 11, 2024

The changes in this PR significantly decrease the test run time. While testing whether this was feasible, many flakes were caused by the egress-gateway, egress-gateway-with-l7-policy and from-cidr-host-netns tests, which is why those tests run separately without concurrency and all the other tests then run in parallel afterwards.
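The scheduling described above can be sketched as a two-phase runner. This is only an illustration under assumptions, not the actual cilium-cli implementation: `SEQUENTIAL_TESTS`, `run_suite`, `run_test` and `max_workers` are hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Tests that were observed to flake under concurrency (names from the PR
# description); the constant itself is a hypothetical name.
SEQUENTIAL_TESTS = [
    "egress-gateway",
    "egress-gateway-with-l7-policy",
    "from-cidr-host-netns",
]


def run_suite(tests, run_test, max_workers=4):
    """Run the flaky tests one at a time, then the remaining tests in parallel.

    `tests` is a list of test names and `run_test` is a callable that takes a
    name and returns that test's result. Returns a dict of name -> result.
    """
    sequential = [t for t in tests if t in SEQUENTIAL_TESTS]
    parallel = [t for t in tests if t not in SEQUENTIAL_TESTS]

    results = {}
    # Phase 1: no concurrency for the tests that interfere with others.
    for name in sequential:
        results[name] = run_test(name)

    # Phase 2: everything else shares a worker pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, result in zip(parallel, pool.map(run_test, parallel)):
            results[name] = result
    return results
```

The sequential phase finishes completely before the pool starts, so the isolated tests never overlap with any other test.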

You can find 10 successful and unsuccessful runs in https://github.com/cilium/cilium/actions/runs/11496595336

Note: The failed runs from the link above were all caused by short timeouts, which are fixed by the last commit.

| Name      | Before    | After     | Time Decreased |
|-----------|-----------|-----------|----------------|
| Test 1    | 00:49:48  | 00:29:54  | -39.96%        |
| Test 2    | 00:00:23  | 00:00:19  | -17.39%        |
| Test 3    | 00:49:08  | 00:25:04  | -48.98%        |
| Test 4    | 00:00:23  | 00:00:21  | -8.7%          |
| Test 5    | 00:49:18  | 00:26:30  | -46.25%        |
| Test 6    | 00:00:22  | 00:00:23  | 4.55%          |
| Test 7    | 00:44:46  | 00:21:20  | -52.35%        |
| Test 8    | 00:00:19  | 00:00:21  | 10.53%         |
| Test 9    | 00:49:34  | 00:25:33  | -48.45%        |
| Test 10   | 00:00:22  | 00:00:19  | -13.64%        |
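For reference, the percentages in the last column can be reproduced from the Before and After durations. The helper names below (`to_seconds`, `percent_change`) are hypothetical and exist only for this sketch; a negative value means the run got faster.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_change(before: str, after: str) -> float:
    """Relative change in run time, in percent, rounded to two decimals."""
    b = to_seconds(before)
    a = to_seconds(after)
    return round((a - b) / b * 100, 2)


# Test 1 and Test 6 from the table.
print(percent_change("00:49:48", "00:29:54"))
print(percent_change("00:00:22", "00:00:23"))
```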

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 11, 2024
@aanm
Member Author

aanm commented Oct 11, 2024

/ci-ipsec-upgrade

@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label Oct 11, 2024
@aanm aanm force-pushed the pr/sync-upgrade-e2e-concurrency-ipsec branch from 99db928 to 190fab5 Compare October 14, 2024 08:02
@aanm aanm changed the title test CI Add concurrency to test-ipsec-upgrade Oct 14, 2024
@aanm aanm added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Oct 14, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 14, 2024
@aanm
Member Author

aanm commented Oct 14, 2024

/test

@aanm aanm marked this pull request as ready for review October 14, 2024 08:02
@aanm aanm requested review from a team as code owners October 14, 2024 08:02
@aanm aanm force-pushed the pr/sync-upgrade-e2e-concurrency-ipsec branch from 190fab5 to 4731fd4 Compare October 15, 2024 07:05
@aanm
Member Author

aanm commented Oct 15, 2024

/test

Member

@pchaigno pchaigno left a comment

Looks good, but I'm quite confused about how egress gateway tests could be an issue in the IPsec workflow... given we don't run them there.

@aanm
Member Author

aanm commented Oct 16, 2024

> Looks good, but I'm quite confused about how egress gateway tests could be an issue in the IPsec workflow... given we don't run them there.

Ok, if we don't run them here then I can drop this commit. I wasn't sure about it as I copied this commit from a similar concurrent PR (#34806) and it didn't matter which PR would be merged first.

@aanm aanm force-pushed the pr/sync-upgrade-e2e-concurrency-ipsec branch from 4731fd4 to 9fbf7a1 Compare October 16, 2024 21:22
@aanm
Member Author

aanm commented Oct 16, 2024

/test

@aanm aanm requested a review from pchaigno October 16, 2024 21:22
@aanm aanm enabled auto-merge October 16, 2024 21:23
@aanm aanm disabled auto-merge October 16, 2024 21:23
@pchaigno
Member

> > Looks good, but I'm quite confused about how egress gateway tests could be an issue in the IPsec workflow... given we don't run them there.
>
> Ok, if we don't run them here then I can drop this commit. I wasn't sure about it as I copied this commit from a similar concurrent PR (#34806) and it didn't matter which PR would be merged first.

Ack for the egress gateway tests, but I believe the from-cidr-host-netns test is executed in the IPsec workflows.

@pchaigno
Member

Hm, the failure in the IPsec upgrade tests feels a bit suspicious:

🟥 failed to flush ct entries: %w command failed (pod=kube-system/cilium-qq2x7, container=cilium-agent): "time=\"2024-10-16T21:40:39.620403221Z\" level=error msg=\"Unable to delete CT entry\" error=\"unable to delete element 10.244.2.53:61575 --> 10.244.2.84:62751, 6, 1 from map cilium_ct4_global: delete: key does not exist\" key=\"10.244.2.53:61575 --> 10.244.2.84:62751, 6, 1\" subsys=map-ct\n"

I haven't seen this failure before, so I'm wondering whether it could be related to the changes made here. Isn't it possible that we're trying to flush the CT map in parallel and therefore failing?
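The hypothesis above can be modelled in a few lines. This is a toy sketch, not Cilium code: a plain dict stands in for the CT map, and `flush_entries` and `flush_in_parallel` are hypothetical names. It shows how several workers deleting the same snapshot of keys collide, and how a delete that tolerates an already-missing key avoids a "key does not exist" style failure. Whether this is what happened in the failed run is not established by the thread.

```python
import threading

_MISSING = object()  # sentinel: distinguishes "key absent" from a stored None


def flush_entries(table, keys, lock):
    """Delete `keys` from the shared `table`; return how many this caller removed.

    A key that another worker already deleted is skipped instead of raising.
    """
    deleted = 0
    for key in keys:
        with lock:
            if table.pop(key, _MISSING) is not _MISSING:
                deleted += 1
    return deleted


def flush_in_parallel(table, workers=2):
    """Hypothetical driver: several workers flush the same snapshot of keys."""
    lock = threading.Lock()
    keys = list(table)  # every worker sees the same snapshot
    counts = []

    def worker():
        counts.append(flush_entries(table, keys, lock))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

With a strict delete (for example `del table[key]`), the second worker to reach a key would raise `KeyError`, which is the analogue of the error in the log above.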

@aanm aanm marked this pull request as draft October 22, 2024 08:35
@aanm aanm force-pushed the pr/sync-upgrade-e2e-concurrency-ipsec branch from 9fbf7a1 to a2927f7 Compare October 24, 2024 09:33
@aanm
Member Author

aanm commented Oct 24, 2024

/ci-ipsec-upgrade

aanm added 2 commits October 24, 2024 22:14
Adding concurrency to tests-e2e upgrade workflow helps to decrease the
time it takes to run the tests on our CI.

| Name      | Before    | After     | Time Decreased |
|-----------|-----------|-----------|----------------|
| Test 1    | 00:49:48  | 00:29:54  | -39.96%        |
| Test 2    | 00:00:23  | 00:00:19  | -17.39%        |
| Test 3    | 00:49:08  | 00:25:04  | -48.98%        |
| Test 4    | 00:00:23  | 00:00:21  | -8.7%          |
| Test 5    | 00:49:18  | 00:26:30  | -46.25%        |
| Test 6    | 00:00:22  | 00:00:23  | 4.55%          |
| Test 7    | 00:44:46  | 00:21:20  | -52.35%        |
| Test 8    | 00:00:19  | 00:00:21  | 10.53%         |
| Test 9    | 00:49:34  | 00:25:33  | -48.45%        |
| Test 10   | 00:00:22  | 00:00:19  | -13.64%        |

Signed-off-by: André Martins <andre@cilium.io>
With the introduction of concurrent tests, it takes a little bit more
time for all endpoints to be regenerated. Thus we need to increase the
timeout from 5 minutes to 10 minutes.

Signed-off-by: André Martins <andre@cilium.io>
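The effect of the longer timeout in the second commit can be sketched with a generic polling helper. `wait_until` and its parameters are hypothetical names for this example and do not reflect how the workflow or cilium-cli implements the wait; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time


def wait_until(condition, timeout_s, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout_s` seconds elapse.

    Returns True if the condition was met before the deadline, else False.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Under this model, a regeneration that needs, say, 400 seconds (an illustrative number, not a measurement) fails with a 300-second timeout and succeeds with a 600-second one.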
@aanm aanm force-pushed the pr/sync-upgrade-e2e-concurrency-ipsec branch from a2927f7 to a3ce44b Compare October 24, 2024 20:15
@aanm
Member Author

aanm commented Oct 24, 2024

/test

@aanm aanm marked this pull request as ready for review October 24, 2024 20:19
@aanm aanm enabled auto-merge October 24, 2024 20:19
@aanm aanm added this pull request to the merge queue Oct 25, 2024
Merged via the queue into main with commit a7483f5 Oct 25, 2024
258 checks passed
@aanm aanm deleted the pr/sync-upgrade-e2e-concurrency-ipsec branch October 25, 2024 15:30
Labels
area/CI: Continuous Integration testing issue or flake
cilium-cli: This PR contains changes related with cilium-cli
release-note/ci: This PR makes changes to the CI.

3 participants