-
Notifications
You must be signed in to change notification settings - Fork 5.7k
fix(outputs): Retrigger batch-available-events correctly #17246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srebhan The architectural change from counter-based to buffer-based triggering is a good design improvement.
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
(cherry picked from commit 82d0f5c)
Warning, we had influxDb turned in for a output that wasn't connected and since now it doesn't wait for the write to finish was filling our logs and machine resources with metrics logs that couldn't find a host. This fix might be a problem for when you are having errors in the service |
@andrev10 you're right, I saw it in my machine too. Can you open another issue documenting this? Thanks! |
Summary
The current code triggers a
Write
for an output as soon as a batch becomes available. This is done via theBatchReady
channel and as a consequence the agent will call the write which then writes all available batches to quickly empty the buffer.However, when adding metric during the time we write, the code becomes "racy" because we are resetting
newMetricsCount
after finishing the write which will neglect the fact that metrics were added during the write. As a consequence, you always need to add a complete batch after the write finishes (or wait for the flush interval) to trigger another write. This is an issue if metrics are added in bursts much larger than the batch size.This PR removes the redundant book-keeping and relies on the buffer fullness to determine if a new batch is available. As such we use the buffer as a single place of truth. To avoid causing trigger storms we avoid retriggering a write if there is one in progress.
Checklist
Related issues
resolves #17200