Add parallel streams throughput tests, and enable them in the EGW workflow #38027

Merged
5 commits merged into main from pr/giorio94/main/scale-test-egw-throughput-multi
Mar 11, 2025

Conversation

Member

@giorio94 giorio94 commented Mar 6, 2025

Extend the CLI performance tests to measure throughput with multiple parallel streams rather than a single one. This makes it possible to bypass single-flow limits caused by resource exhaustion or cloud provider restrictions, such as [1]. Unfortunately, netperf does not support multiple flows out of the box, so we need to be a bit creative: run multiple instances and parse the resulting output. Enable the new test as part of the egress gateway scale test, and change the node instance type to m5n.xlarge, as it supports up to 25 Gbps burst bandwidth.

Please review commit by commit, and refer to the individual commit messages for additional details.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html

@giorio94 giorio94 added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. feature/egress-gateway Impacts the egress IP gateway feature. labels Mar 6, 2025
@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label Mar 6, 2025
Member Author

giorio94 commented Mar 6, 2025

/scale-egw

@giorio94 giorio94 force-pushed the pr/giorio94/main/scale-test-egw-throughput-multi branch from 944b9cf to b5ebc1d Compare March 6, 2025 10:32
Member Author

giorio94 commented Mar 6, 2025

/scale-egw

@giorio94 giorio94 marked this pull request as ready for review March 6, 2025 13:29
@giorio94 giorio94 requested review from a team as code owners March 6, 2025 13:29
Let's not enable the egress gateway functionality at all in the baseline
test, rather than enabling it but not configuring any policy. This
ensures that baseline Cilium CPU/memory usage is not influenced by any
Egress Gateway related tasks.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the pr/giorio94/main/scale-test-egw-throughput-multi branch from b5ebc1d to e8e36e8 Compare March 6, 2025 17:50
Member Author

giorio94 commented Mar 6, 2025

Rebased onto main to drop the commit already merged via #37981

Member Author

giorio94 commented Mar 6, 2025

/scale-egw

Extend the CLI performance tests to measure throughput with multiple
parallel streams rather than a single one. This makes it possible to
bypass single-flow limits caused by resource exhaustion or cloud
provider restrictions, such as [1]. Unfortunately, netperf does not
support multiple flows out of the box, so we need to be a bit creative:
run multiple instances and parse the resulting output.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the pr/giorio94/main/scale-test-egw-throughput-multi branch from e8e36e8 to a4c8dfa Compare March 7, 2025 14:19
giorio94 added 3 commits March 7, 2025 15:29
Following the extension of this logic to additionally support multiple
parallel streams, let's add a unit test exercising the different paths,
for increased confidence in its correctness, and to prevent future
regressions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
So that we circumvent the single-flow limitations imposed by AWS [1].
Additionally, let's change the node instance type to m5n.xlarge, as
it supports up to 25 Gbps burst bandwidth.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Gather the amount of traffic forwarded by the gateway node during the
network performance test, and assert that it is close to 0 in the
baseline test and significantly higher in the EGW test. This serves as
a sanity check that traffic is indeed flowing through the egress
gateway when expected.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
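
The sanity check described in this commit message could look roughly like the Go sketch below. It is an illustration only, not the code from the commit: `checkGatewayTraffic` is a hypothetical helper, and the two thresholds (at most 1% of the sent traffic in the baseline, at least 90% with the egress gateway) are assumed values chosen for the example.

```go
package main

import "fmt"

// Assumed thresholds, expressed as a fraction of the traffic sent by the
// client. The actual test may use different values or absolute byte counts.
const (
	baselineMaxRatio = 0.01 // baseline: gateway should forward (almost) nothing
	egwMinRatio      = 0.90 // EGW: gateway should forward most of the traffic
)

// checkGatewayTraffic compares the bytes forwarded by the gateway node with
// the bytes sent during the performance test, and returns an error if the
// ratio does not match what is expected for the given scenario.
func checkGatewayTraffic(forwardedBytes, sentBytes uint64, egwEnabled bool) error {
	if sentBytes == 0 {
		return fmt.Errorf("no traffic was sent, cannot evaluate gateway usage")
	}
	ratio := float64(forwardedBytes) / float64(sentBytes)

	switch {
	case !egwEnabled && ratio > baselineMaxRatio:
		return fmt.Errorf("baseline: gateway forwarded %.1f%% of traffic, expected close to 0", ratio*100)
	case egwEnabled && ratio < egwMinRatio:
		return fmt.Errorf("egw: gateway forwarded only %.1f%% of traffic", ratio*100)
	}
	return nil
}

func main() {
	// Illustrative numbers only, not measurements from the scale test.
	fmt.Println("baseline:", checkGatewayTraffic(1_000, 10_000_000, false))
	fmt.Println("egw:", checkGatewayTraffic(9_800_000, 10_000_000, true))
}
```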
@giorio94 giorio94 force-pushed the pr/giorio94/main/scale-test-egw-throughput-multi branch from a4c8dfa to f46aeb9 Compare March 7, 2025 14:29
@dlapcevic dlapcevic removed their request for review March 7, 2025 15:23
Member Author

giorio94 commented Mar 7, 2025

/test

Contributor

@derailed derailed left a comment

@giorio94 Nice! Do you have a sense of the before/after throughput gain?

Member Author

giorio94 commented Mar 7, 2025

> @giorio94 Nice! Do you have a sense of the before/after throughput gain?

Single-stream flows are limited to 5 Gbps on AWS [1]. The multi-flow tests instead saturate the 25 Gbps of available bandwidth on these instances (except UDP + egress gateway, which tops out at about 15 Gbps due to CPU saturation).

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html
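
As a back-of-the-envelope illustration of these numbers, the Go sketch below computes the minimum number of parallel streams needed to reach a target bandwidth given a per-flow cap. `minStreams` is a hypothetical helper, not part of the PR, and the result is only a lower bound: it assumes every flow actually reaches its cap.

```go
package main

import (
	"fmt"
	"math"
)

// minStreams returns the smallest number of flows whose combined per-flow
// caps reach targetGbps. It returns 0 if perFlowGbps is not positive.
func minStreams(targetGbps, perFlowGbps float64) int {
	if perFlowGbps <= 0 {
		return 0
	}
	return int(math.Ceil(targetGbps / perFlowGbps))
}

func main() {
	// 25 Gbps of instance bandwidth with a 5 Gbps single-flow limit.
	fmt.Println(minStreams(25, 5)) // prints 5
}
```

In practice a test may run more streams than this minimum to leave headroom for flows that do not hit the cap.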

@giorio94 giorio94 enabled auto-merge March 11, 2025 08:18
@giorio94 giorio94 added this pull request to the merge queue Mar 11, 2025
Merged via the queue into main with commit 01ab5e8 Mar 11, 2025
287 checks passed
@giorio94 giorio94 deleted the pr/giorio94/main/scale-test-egw-throughput-multi branch March 11, 2025 09:09
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 11, 2025
Labels
area/CI Continuous Integration testing issue or flake cilium-cli This PR contains changes related with cilium-cli feature/egress-gateway Impacts the egress IP gateway feature. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
5 participants