
Conversation

pipiland2612
Contributor

@pipiland2612 pipiland2612 commented Jul 26, 2025

Which problem is this PR solving?

Description of the changes

  • Add python script to generate metrics_summary
  • Add action to Generate metrics summary

How was this change tested?

Checklist

@pipiland2612 pipiland2612 requested a review from a team as a code owner July 26, 2025 08:15
@pipiland2612 pipiland2612 requested a review from albertteoh July 26, 2025 08:15

codecov bot commented Jul 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.46%. Comparing base (2bbb1e4) to head (68451d0).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7376       +/-   ##
===========================================
+ Coverage   20.65%   96.46%   +75.81%     
===========================================
  Files         201      375      +174     
  Lines       12834    22958    +10124     
===========================================
+ Hits         2651    22147    +19496     
+ Misses       9995      613     -9382     
- Partials      188      198       +10     
Flag Coverage Δ
badger_v1 9.06% <ø> (-0.06%) ⬇️
badger_v2 1.71% <ø> (-0.02%) ⬇️
cassandra-4.x-v1-manual 11.76% <ø> (?)
cassandra-4.x-v2-auto 1.70% <ø> (?)
cassandra-4.x-v2-manual 1.70% <ø> (?)
cassandra-5.x-v1-manual 11.76% <ø> (?)
cassandra-5.x-v2-auto 1.70% <ø> (?)
cassandra-5.x-v2-manual 1.70% <ø> (?)
elasticsearch-6.x-v1 16.71% <ø> (?)
elasticsearch-7.x-v1 16.75% <ø> (?)
elasticsearch-8.x-v1 16.89% <ø> (?)
elasticsearch-8.x-v2 1.80% <ø> (?)
elasticsearch-9.x-v2 1.71% <ø> (?)
grpc_v1 10.29% <ø> (-0.07%) ⬇️
grpc_v2 1.71% <ø> (-0.02%) ⬇️
kafka-3.x-v1 9.74% <ø> (+0.46%) ⬆️
kafka-3.x-v2 1.71% <ø> (?)
memory_v2 1.71% <ø> (?)
opensearch-1.x-v1 16.80% <ø> (?)
opensearch-2.x-v1 16.80% <ø> (?)
opensearch-2.x-v2 1.71% <ø> (?)
opensearch-3.x-v2 1.71% <ø> (?)
query 1.71% <ø> (-0.02%) ⬇️
tailsampling-processor 0.47% <ø> (?)
unittests 95.44% <ø> (?)

Flags with carried forward coverage won't be shown.

@pipiland2612
Contributor Author

Hi @yurishkuro, do you have any idea how to test whether this action actually works?

@yurishkuro yurishkuro added the changelog:ci Change related to continuous integration / testing label Jul 26, 2025
@pipiland2612
Contributor Author

pipiland2612 commented Jul 26, 2025

I don't know why it's not reporting anything; still debugging.

@pipiland2612
Contributor Author

Let's wait and see if it's actually reporting

@pipiland2612
Contributor Author

Let's wait and see if it's actually reporting

Ok, nothing is happening :-)

@yurishkuro
Member

The run is here https://github.com/jaegertracing/jaeger/actions/runs/16543493108/job/46788110217?pr=7376

It doesn't say whether any commands were executed; if they were run in a script with set -ex, we would see execution logs.
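As a hedged illustration of the set -ex suggestion above (the script body is a placeholder, not the actual e2e script):

```shell
#!/usr/bin/env bash
# -e: fail the step on the first command that exits non-zero.
# -x: print each command (prefixed with '+') to the log before running it,
#     so the Actions log shows exactly what was executed.
set -ex
echo "comparing metrics"
```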

@pipiland2612 pipiland2612 force-pushed the post_metrics_summary branch from 56e5df3 to f3d0172 Compare July 27, 2025 09:27
@pipiland2612 pipiland2612 force-pushed the post_metrics_summary branch from d1604a5 to 1f4f283 Compare July 27, 2025 09:43
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
@pipiland2612
Contributor Author

Why do you try to comment on the issue and not on the PR?

There are some popular gh actions for posting comments, and I recall seeing a notice there that for forks the permissions are insufficient (since our main gh token is not accessible when running CI on a PR from a fork - the PR could use it to steal the secrets). Did you research the proper way of doing that?

Indeed, let me try researching it.

@pipiland2612
Contributor Author

Ah, I see - the current workflow is indeed triggered on pull_request:

on:
  merge_group:
  push:
    branches: [main]

  pull_request:
    branches: [main]

I think we would need another separate workflow to upload the comment

@yurishkuro
Member

Some thoughts

  • https://github.com/thollander/actions-comment-pull-request is designed to comment on pull requests, including using the "upsert" behavior. It probably makes your JS script unnecessary.
  • It also warns about not having token permissions when running on a fork and recommends using the pull_request_target event instead of pull_request. However, it is unsafe to run code from a PR branch in pull_request_target mode, so if we go this route we will need to merge your action to the main branch first.
  • On a related note, I think it would simplify the whole setup if we didn't have the diffing / summary logic spread out across workflows. I can imagine the following setup:
    • The code from PR runs e2e tests and only uploads the metrics files artifacts
    • Then a separate action via pull_request_target event runs with primary repo context (token with permissions to post). This action no longer needs access to code from the PR, it only needs to download the metrics artifacts, also download same ones from the main branch, do the diffs, compose a report, and post a comment.
    • Such a setup will allow us to consolidate most of the diffing / reporting code in one place (e2e workflows only scrape metrics and upload them), and it will make the diffing / reporting code much easier to test locally, since all it needs is two directories with main & PR metric dumps.

BTW, I appreciate you working on this, it's one of those gnarly issues that are not easy to get right but pay off in the long run as a protection against bugs.
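A hedged sketch of what that separate reporting workflow might look like; the workflow name, artifact paths, and placeholder steps below are assumptions for illustration, not files from this PR:

```yaml
# Hypothetical reporting workflow. It runs with the base-repo token
# (pull_request_target), so it must NOT check out or execute PR code;
# it only downloads metrics artifacts, diffs them, and posts a comment.
name: metrics-report
on:
  pull_request_target:
    branches: [main]
permissions:
  pull-requests: write   # needed to post/update the PR comment
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - name: Download PR metrics artifacts
        uses: actions/download-artifact@v4
        with:
          path: ./pr-metrics
      - name: Download baseline metrics from main
        run: echo "fetch main-branch metrics dump here (placeholder)"
      - name: Diff, compose report, post comment
        run: echo "diff ./pr-metrics vs baseline and post (placeholder)"
```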

@pipiland2612
Contributor Author

pipiland2612 commented Jul 27, 2025

  • On a related note, I think it would simplify the whole setup if we didn't have the diffing / summary logic spread out across workflows. I can imagine the following setup:

    • The code from PR runs e2e tests and only uploads the metrics files artifacts
    • Then a separate action via pull_request_target event runs with primary repo context (token with permissions to post). This action no longer needs access to code from the PR, it only needs to download the metrics artifacts, also download same ones from the main branch, do the diffs, compose a report, and post a comment.
    • Such a setup will allow us to consolidate most of the diffing / reporting code in one place (e2e workflows only scrape metrics and upload them), and it will make the diffing / reporting code much easier to test locally, since all it needs is two directories with main & PR metric dumps.

Yup, I agree with this. The current setup is really complex and unintuitive; it will be hard to maintain and extend.

@pipiland2612
Contributor Author

pipiland2612 commented Jul 28, 2025

Hi @yurishkuro, I'd propose we keep the current approach of calculating the diff for every single e2e test, because it is much easier (it is tied to the test id and other things as well). Then we can download and aggregate all the diff artifacts, build a summary, and post it as a PR comment.
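The aggregation step proposed above could look roughly like the sketch below; the diff_&lt;snapshot&gt;.txt naming and the directory layout are assumptions based on the snippets in this PR, not the final implementation:

```python
from pathlib import Path


def summarize_diffs(metrics_dir: str) -> str:
    """Collect per-test diff files (assumed to be named diff_<snapshot>.txt)
    and build a single markdown summary suitable for one PR comment."""
    sections = []
    for diff_file in sorted(Path(metrics_dir).glob("diff_*.txt")):
        body = diff_file.read_text().strip()
        if not body:
            continue  # an empty diff means no metric changes for this test
        snapshot = diff_file.stem.removeprefix("diff_")
        sections.append(f"### {snapshot}\n\n{body}")
    if not sections:
        return "✅ No metric changes detected."
    return "## Metrics diff summary\n\n" + "\n\n".join(sections)
```

Because the function only reads two kinds of local files, it can be exercised locally with nothing more than a directory of fake diff dumps, which is exactly the testability benefit discussed above.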

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
@pipiland2612
Contributor Author

Why is my newly-added workflow not running?

@pipiland2612 pipiland2612 marked this pull request as draft July 28, 2025 08:54
@pipiland2612 pipiland2612 reopened this Jul 28, 2025
@yurishkuro
Member

It's not running for two reasons:

  1. since its trigger is workflow_run, you need to actually call it explicitly from all-e2e
  2. Even if you call it now, the code that workflow syncs is from main, where it does not exist yet.
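For reference, a workflow_run trigger has the shape below; the workflow name is an assumption, and as noted above the workflow file must already exist on the main branch before GitHub will evaluate the trigger:

```yaml
on:
  workflow_run:
    # Fires only after the named workflow completes; the name must match
    # the `name:` field of the triggering workflow (assumed here).
    workflows: ["all-e2e"]
    types: [completed]
```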

@pipiland2612
Contributor Author

It's not running for two reasons:

  1. since its trigger is workflow_run, you need to actually call it explicitly from all-e2e
  2. Even if you call it now, the code that workflow syncs is from main, where it does not exist yet.

Do you have any idea how to test it for now?

@yurishkuro
Member

Do you have any idea how to test it for now?

We have two functions we want to test, one is the processing of the metrics results and generation of a report, and the other is posting that report as a comment. Only the second one requires token permissions. So we can merge the PR with just the first part, still happening in the regular workflows without PR comments. Then you can separately test a comment-posting workflow. And finally we'd add them together.

@pipiland2612
Contributor Author

pipiland2612 commented Jul 30, 2025

Do you have any idea how to test it for now?

We have two functions we want to test, one is the processing of the metrics results and generation of a report, and the other is posting that report as a comment. Only the second one requires token permissions. So we can merge the PR with just the first part, still happening in the regular workflows without PR comments. Then you can separately test a comment-posting workflow. And finally we'd add them together.

Hi @yurishkuro, so I think this pull request will solve the first part, but how can we actually test the first part?

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
@pipiland2612 pipiland2612 marked this pull request as ready for review July 30, 2025 13:31
@pipiland2612 pipiland2612 changed the title [WIP] [CI] Add metrics summary action [CI] Add metrics summary action Jul 30, 2025
@pipiland2612
Contributor Author

pipiland2612 commented Jul 31, 2025

Hi @yurishkuro, can you review this?

Signed-off-by: Yuri Shkuro <github@ysh.us>
@yurishkuro yurishkuro enabled auto-merge August 1, 2025 01:05
Comment on lines +57 to +61
if python3 ./scripts/e2e/compare_metrics.py --file1 ./.metrics/${{ inputs.snapshot }}.txt --file2 ./.metrics/baseline_${{ inputs.snapshot }}.txt --output ./.metrics/diff_${{ inputs.snapshot }}.txt; then
echo "No differences found in metrics"
else
echo "🛑 Differences found in metrics"
echo "has_diff=true" >> $GITHUB_OUTPUT
Contributor

The error handling logic has been changed in this PR. Previously, the script would exit with code 1 when differences were found, and the workflow would fail. Now, the script returns exit code 0 for no differences and exit code 1 for differences found, with the has_diff output variable being set only in the latter case.

This approach works for the current use case, but consider handling other potential failure modes. If the script fails for reasons other than finding differences (e.g., file not found, permission issues), the has_diff output won't be set, but the step will still succeed due to continue-on-error: true. This could mask actual errors in the comparison process.

Consider adding more robust error handling to distinguish between "differences found" and "script execution failed for other reasons".

Suggested change
-if python3 ./scripts/e2e/compare_metrics.py --file1 ./.metrics/${{ inputs.snapshot }}.txt --file2 ./.metrics/baseline_${{ inputs.snapshot }}.txt --output ./.metrics/diff_${{ inputs.snapshot }}.txt; then
-  echo "No differences found in metrics"
-else
-  echo "🛑 Differences found in metrics"
-  echo "has_diff=true" >> $GITHUB_OUTPUT
+exit_code=0
+python3 ./scripts/e2e/compare_metrics.py --file1 ./.metrics/${{ inputs.snapshot }}.txt --file2 ./.metrics/baseline_${{ inputs.snapshot }}.txt --output ./.metrics/diff_${{ inputs.snapshot }}.txt || exit_code=$?
+if [ $exit_code -eq 0 ]; then
+  echo "No differences found in metrics"
+elif [ $exit_code -eq 1 ]; then
+  echo "🛑 Differences found in metrics"
+  echo "has_diff=true" >> $GITHUB_OUTPUT
+else
+  echo "❌ Error executing comparison script (exit code: $exit_code)"
+  echo "has_diff=true" >> $GITHUB_OUTPUT
+  exit $exit_code
+fi

Spotted by Diamond


@yurishkuro yurishkuro added this pull request to the merge queue Aug 1, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 1, 2025
@yurishkuro yurishkuro merged commit 7ffe35d into jaegertracing:main Aug 1, 2025
63 checks passed
@pipiland2612 pipiland2612 deleted the post_metrics_summary branch August 1, 2025 05:21