
fix(outputs): Retrigger batch-available-events correctly #17246


Merged
merged 4 commits into from
Jul 7, 2025

Conversation

Member

@srebhan srebhan commented Jun 26, 2025

Summary

The current code triggers a Write for an output as soon as a batch becomes available. This is done via the BatchReady channel; as a consequence, the agent calls Write, which then writes all available batches to empty the buffer quickly.

However, when metrics are added while a write is in progress, the code becomes "racy": we reset newMetricsCount after the write finishes, which ignores the metrics that were added during the write. As a consequence, you always need to add a complete batch after the write finishes (or wait for the flush interval) to trigger another write. This is an issue if metrics are added in bursts much larger than the batch size.
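
To make the effect concrete, here is a minimal single-goroutine sketch of counter-based triggering. It is not Telegraf's actual code: the `counterTrigger` type, its fields other than `newMetricsCount`, and the `add` and `write` helpers are hypothetical names chosen for illustration, and the exact trigger condition is an assumption.

```go
package main

import "fmt"

// counterTrigger is a hypothetical, simplified model of counter-based
// book-keeping. It is an illustration only, not Telegraf's implementation.
type counterTrigger struct {
	batchSize       int
	buffered        int // metrics currently held in the buffer
	newMetricsCount int // metrics counted since the last reset
	triggers        int // number of batch-ready signals emitted
}

// add stores n metrics one by one and emits a signal when the counter
// reaches the batch size (assumed trigger condition).
func (c *counterTrigger) add(n int) {
	for i := 0; i < n; i++ {
		c.buffered++
		c.newMetricsCount++
		if c.newMetricsCount == c.batchSize {
			c.triggers++
		}
	}
}

// write models a write that drains the metrics present when it started,
// while `during` more metrics arrive before it finishes. The counter is
// reset only after the write, which forgets the metrics that arrived
// in the meantime.
func (c *counterTrigger) write(during int) {
	snapshot := c.buffered
	c.add(during)
	c.buffered -= snapshot
	c.newMetricsCount = 0
}

func main() {
	c := &counterTrigger{batchSize: 10}
	c.add(10)   // first batch complete: one trigger
	c.write(25) // a burst of 25 metrics arrives while the write is running
	fmt.Println("buffered:", c.buffered, "triggers:", c.triggers)
	c.add(9) // still one metric short of a complete batch after the write
	fmt.Println("buffered:", c.buffered, "triggers:", c.triggers)
}
```

In this model, 25 metrics (two full batches) sit in the buffer after the write, yet no new signal is emitted until ten more metrics have been counted.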

This PR removes the redundant book-keeping and relies on the buffer fullness to determine whether a new batch is available. As such, the buffer becomes the single source of truth. To avoid trigger storms, we do not retrigger a write while one is in progress.
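
The idea can be sketched as follows, assuming a signal channel with capacity one and a re-check of the buffer once a write has finished. The `bufferTrigger` type and the `notify`, `add` and `write` helpers are hypothetical names for illustration, not the code in this PR. The sketch runs in a single goroutine, so it leaves out the locking or atomics that concurrent code would need.

```go
package main

import "fmt"

// bufferTrigger is a hypothetical sketch of buffer-based triggering.
// Names are assumptions for illustration, not Telegraf's API.
type bufferTrigger struct {
	batchSize  int
	buffered   int           // the buffer fill level is the only state consulted
	writing    bool          // true while a write is in progress
	batchReady chan struct{} // capacity 1: at most one pending signal
}

// notify signals that a batch is available unless a write is running
// or a signal is already pending.
func (b *bufferTrigger) notify() {
	if b.writing || b.buffered < b.batchSize {
		return
	}
	select {
	case b.batchReady <- struct{}{}:
	default: // a signal is already pending, do not pile up more
	}
}

// add stores n metrics and checks the buffer level.
func (b *bufferTrigger) add(n int) {
	b.buffered += n
	b.notify()
}

// write drains the metrics present when it started, while `during` more
// metrics arrive before it finishes.
func (b *bufferTrigger) write(during int) {
	b.writing = true
	snapshot := b.buffered
	b.add(during) // no signal is sent here because a write is in progress
	b.buffered -= snapshot
	b.writing = false
	b.notify() // re-check the buffer once the write is done
}

func main() {
	b := &bufferTrigger{batchSize: 10, batchReady: make(chan struct{}, 1)}
	b.add(10)
	<-b.batchReady // the consumer picks up the signal and starts a write
	b.write(25)    // a burst of 25 metrics arrives during the write
	fmt.Println("buffered:", b.buffered, "pending signals:", len(b.batchReady))
}
```

Because the decision depends only on the buffer level, the burst that arrived during the write leaves exactly one pending signal afterwards, and further additions do not queue extra signals.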

Checklist

  • No AI generated code was used in this PR

Related issues

resolves #17200

@telegraf-tiger telegraf-tiger bot added the fix label (pr to fix corresponding bug) Jun 26, 2025
@srebhan srebhan self-assigned this Jun 26, 2025
@srebhan srebhan added the plugin/output label (1. Request for new output plugins 2. Issues/PRs that are related to out plugins) Jun 26, 2025
@srebhan srebhan assigned skartikey and mstrandboge and unassigned srebhan Jul 1, 2025
@mstrandboge mstrandboge removed their assignment Jul 4, 2025
@mstrandboge mstrandboge added the ready for final review label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Jul 4, 2025
Contributor

@skartikey skartikey left a comment

@srebhan The architectural change from counter-based to buffer-based triggering is a good design improvement.

@srebhan srebhan requested review from lowjoel and skartikey July 7, 2025 08:52
@telegraf-tiger
Contributor

telegraf-tiger bot commented Jul 7, 2025

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)

Artifact URLs

DEB: amd64.deb, arm64.deb, armel.deb, armhf.deb, i386.deb, mips.deb, mipsel.deb, ppc64el.deb, riscv64.deb, s390x.deb
RPM: aarch64.rpm, armel.rpm, armv6hl.rpm, i386.rpm, ppc64le.rpm, riscv64.rpm, s390x.rpm, x86_64.rpm
TAR.GZ: darwin_amd64.tar.gz, darwin_arm64.tar.gz, freebsd_amd64.tar.gz, freebsd_armv7.tar.gz, freebsd_i386.tar.gz, linux_amd64.tar.gz, linux_arm64.tar.gz, linux_armel.tar.gz, linux_armhf.tar.gz, linux_i386.tar.gz, linux_mips.tar.gz, linux_mipsel.tar.gz, linux_ppc64le.tar.gz, linux_riscv64.tar.gz, linux_s390x.tar.gz
ZIP: windows_amd64.zip, windows_arm64.zip, windows_i386.zip

@lowjoel lowjoel left a comment

Cool!

@skartikey skartikey merged commit 82d0f5c into influxdata:master Jul 7, 2025
26 of 27 checks passed
@github-actions github-actions bot added this to the v1.35.2 milestone Jul 7, 2025
srebhan added a commit that referenced this pull request Jul 7, 2025
@andrev10

Warning: we had an InfluxDB output enabled that wasn't connected to anything. Since the output now doesn't wait for the write to finish, it was filling our logs and using up machine resources with log messages about metrics that couldn't find a host.

This fix might be a problem when the service is having errors.

@lowjoel

lowjoel commented Jul 25, 2025

@andrev10 you're right, I saw it on my machine too. Can you open another issue documenting this? Thanks!

Development

Successfully merging this pull request may close these issues.

Possible metrics buffer overflow when metrics are written faster than output flush completion