fix(outputs): Retrigger batch-available-events correctly #17246

srebhan · 2025-06-26T16:53:23Z

Summary

The current code triggers a Write for an output as soon as a batch becomes available. This is done via the BatchReady channel; as a consequence, the agent calls the write, which then writes all available batches to quickly empty the buffer.

However, when metrics are added while a write is in progress, the code becomes "racy": newMetricsCount is reset after the write finishes, which ignores any metrics added during the write. As a consequence, a complete batch must always be added after the write finishes (or the flush interval must elapse) to trigger another write. This is an issue if metrics are added in bursts much larger than the batch size.
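The race described above can be illustrated with a small, deterministic model. This is only a sketch under assumptions: the type `counterOutput` and its methods are hypothetical and heavily simplified, not the actual code in models/running_output.go. It shows a full batch sitting in the buffer without a new signal being emitted.

```go
package main

import "fmt"

// counterOutput is a hypothetical, simplified model of counter-based
// triggering. It is NOT the Telegraf implementation.
type counterOutput struct {
	batchSize  int
	buffered   int // metrics currently held in the buffer
	newMetrics int // counter used to decide when a batch is ready
	signals    int // number of batch-ready signals emitted
}

// add buffers n metrics and emits a signal whenever the counter
// reaches the batch size.
func (o *counterOutput) add(n int) {
	for i := 0; i < n; i++ {
		o.buffered++
		o.newMetrics++
		if o.newMetrics == o.batchSize {
			o.newMetrics = 0
			o.signals++
		}
	}
}

// write drains the buffer, lets `during` metrics arrive while the write
// is in flight, and then resets the counter as the write finishes.
func (o *counterOutput) write(during int) {
	o.buffered = 0
	o.add(during)
	o.newMetrics = 0 // forgets the metrics that arrived mid-write
}

// strandedAfterWrite replays the problematic sequence and reports the
// buffer fill level and the number of signals emitted.
func strandedAfterWrite() (buffered, signals int) {
	o := &counterOutput{batchSize: 10}
	o.add(10)  // first batch, one signal
	o.write(7) // 7 metrics arrive mid-write, counter is reset afterwards
	o.add(3)   // buffer now holds a full batch of 10
	return o.buffered, o.signals
}

func main() {
	buffered, signals := strandedAfterWrite()
	// The buffer holds a complete batch, but only the initial signal fired.
	fmt.Printf("buffered=%d signals=%d\n", buffered, signals)
}
```

In this model the buffer ends up with 10 metrics while the counter only sees 3, so no second write is triggered until another complete batch arrives or the flush interval elapses.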

This PR removes the redundant book-keeping and relies on the buffer fullness to determine whether a new batch is available, so the buffer becomes the single source of truth. To avoid trigger storms, a write is not retriggered while one is already in progress.
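A minimal sketch of the buffer-based approach, assuming a hypothetical `bufferOutput` type: the fill level alone decides whether a batch-ready signal is sent, and no signal is sent while a write is in progress. The names, the locking scheme, and the re-check at the end of the write are illustrative assumptions, not the code merged in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferOutput is a hypothetical model of buffer-based triggering.
type bufferOutput struct {
	mu         sync.Mutex
	batchSize  int
	buffered   int  // buffer fill level, the only source of truth
	writing    bool // true while a write is in progress
	BatchReady chan struct{}
}

func newBufferOutput(batchSize int) *bufferOutput {
	return &bufferOutput{batchSize: batchSize, BatchReady: make(chan struct{}, 1)}
}

// signalLocked sends a non-blocking batch-ready signal if a full batch is
// buffered and no write is running. The caller must hold o.mu.
func (o *bufferOutput) signalLocked() {
	if o.writing || o.buffered < o.batchSize {
		return
	}
	select {
	case o.BatchReady <- struct{}{}:
	default: // a signal is already pending, avoid a trigger storm
	}
}

// add buffers n metrics and signals based on the fill level only.
func (o *bufferOutput) add(n int) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.buffered += n
	o.signalLocked()
}

// beginWrite marks a write as running and takes the buffered metrics.
func (o *bufferOutput) beginWrite() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.writing = true
	o.buffered = 0
}

// endWrite marks the write as finished and retriggers if metrics that
// arrived in the meantime already form a full batch.
func (o *bufferOutput) endWrite() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.writing = false
	o.signalLocked()
}

// pending reports whether a signal is waiting, consuming it if so.
func (o *bufferOutput) pending() bool {
	select {
	case <-o.BatchReady:
		return true
	default:
		return false
	}
}

// retriggerDemo replays a burst that arrives during a write.
func retriggerDemo() (duringWrite, afterWrite bool) {
	o := newBufferOutput(10)
	o.add(10)
	o.pending() // the agent consumes the first signal
	o.beginWrite()
	o.add(25) // burst arrives while the write is running
	duringWrite = o.pending()
	o.endWrite()
	afterWrite = o.pending()
	return duringWrite, afterWrite
}

func main() {
	during, after := retriggerDemo()
	fmt.Printf("signal during write=%t, after write=%t\n", during, after)
}
```

Because the decision depends only on the fill level, the burst that arrived during the write produces a signal as soon as the write ends, without needing another complete batch.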

Checklist

No AI generated code was used in this PR

Related issues

resolves #17200

models/running_output.go

skartikey

@srebhan The architectural change from counter-based to buffer-based triggering is a good design improvement.

telegraf-tiger · 2025-07-07T09:22:11Z

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)

Artifact URLs

DEB: amd64.deb, arm64.deb, armel.deb, armhf.deb, i386.deb, mips.deb, mipsel.deb, ppc64el.deb, riscv64.deb, s390x.deb
RPM: aarch64.rpm, armel.rpm, armv6hl.rpm, i386.rpm, ppc64le.rpm, riscv64.rpm, s390x.rpm, x86_64.rpm
TAR.GZ: darwin_amd64.tar.gz, darwin_arm64.tar.gz, freebsd_amd64.tar.gz, freebsd_armv7.tar.gz, freebsd_i386.tar.gz, linux_amd64.tar.gz, linux_arm64.tar.gz, linux_armel.tar.gz, linux_armhf.tar.gz, linux_i386.tar.gz, linux_mips.tar.gz, linux_mipsel.tar.gz, linux_ppc64le.tar.gz, linux_riscv64.tar.gz, linux_s390x.tar.gz
ZIP: windows_amd64.zip, windows_arm64.zip, windows_i386.zip

lowjoel

Cool!

andrev10 · 2025-07-24T18:50:33Z

Warning: we had an InfluxDB output enabled that wasn't connected. Since the agent now doesn't wait for the write to finish, it was filling our logs and consuming machine resources with messages about a host that couldn't be found.

This fix might be a problem when the service is returning errors.

lowjoel · 2025-07-25T00:00:02Z

@andrev10 you're right, I saw it on my machine too. Can you open another issue documenting this? Thanks!

fix(outputs): Retrigger batch-available-events correctly

225da31

telegraf-tiger bot added the "fix" label (pr to fix corresponding bug) Jun 26, 2025

srebhan mentioned this pull request Jun 26, 2025

Possible metrics buffer overflow when metrics are written faster than output flush completion #17200

Closed

srebhan self-assigned this Jun 26, 2025

srebhan added the "plugin/output" label (1. Request for new output plugins 2. Issues/PRs that are related to out plugins) Jun 26, 2025

srebhan assigned skartikey and mstrandboge and unassigned srebhan Jul 1, 2025

lowjoel reviewed Jul 2, 2025

models/running_output.go (outdated, resolved)

mstrandboge approved these changes Jul 4, 2025

mstrandboge removed their assignment Jul 4, 2025

mstrandboge added the "ready for final review" label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Jul 4, 2025

skartikey reviewed Jul 4, 2025

models/running_output.go (outdated, resolved)

srebhan added 3 commits July 7, 2025 10:35

chore: Remove debug output

b6db712

fix: Account for metrics dropped during write

db60381

fix: Only retrigger batch write if we are connected to avoid hammering the output

b6ac078

srebhan requested review from lowjoel and skartikey July 7, 2025 08:52

lowjoel approved these changes Jul 7, 2025

skartikey approved these changes Jul 7, 2025

skartikey merged commit 82d0f5c into influxdata:master Jul 7, 2025
26 of 27 checks passed

github-actions bot added this to the v1.35.2 milestone Jul 7, 2025

srebhan added a commit that referenced this pull request Jul 7, 2025

fix(outputs): Retrigger batch-available-events correctly (#17246)

e80e7f8

(cherry picked from commit 82d0f5c)

BrewTestBot mentioned this pull request Jul 7, 2025

telegraf 1.35.2 Homebrew/homebrew-core#229310

Merged

andrev10 mentioned this pull request Jul 25, 2025

influxDb spamming retries #17378

Open
