Description
Relevant telegraf.conf
[agent]
  collection_jitter = "3s"
  debug = true
  flush_interval = "15s"
  flush_jitter = "0s"
  hostname = "$HOSTNAME"
  interval = "15s"
  logfile = ""
  metric_batch_size = 15000
  metric_buffer_limit = 100000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = true

[[processors.converter]]
  namepass = ["foobar_duration_ms"]
  [processors.converter.tags]
    integer = ["duration"]

[[aggregators.histogram]]
  drop_original = true
  grace = "120s"
  namepass = ["foobar_duration_ms"]
  period = "30s"
  [[aggregators.histogram.config]]
    buckets = [50.0, 100.0, 250.0, 500.0, 750.0, 1000.0, 2000.0, 5000.0, 10000.0, 25000.0]
    fields = ["duration"]
    measurement_name = "foobar_duration_ms"

[[outputs.prometheus_client]]
  collectors_exclude = ["gocollector", "process"]
  listen = ":9273"

[[inputs.opentelemetry]]

[[inputs.internal]]
  collect_memstats = true
Logs from Telegraf
Logs are normally just a mixture of D! [aggregators.histogram] Updated aggregation range...
or D! [outputs.prometheus_client] Buffer fullness: 2380 / 100000 metrics.
When telegraf silently crashes in this case, there is a spike in the number of logs saying both
D! [aggregators.histogram] Metric is outside aggregation window; discarding...
and
W! [inputs.internal] Collection took longer than expected; not complete after interval of 15s
D! [inputs.internal] Previous collection has not completed; scheduled collection skipped
System info
telegraf:1.32-alpine, K8s 1.29.9
Steps to reproduce
I've tried to load test it locally, but sadly I have yet to reproduce this anywhere other than production. I will add detail once I can do so reliably.
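In case it helps anyone else attempting to reproduce this, a synthetic input along the lines below should at least push metrics through the same histogram aggregator and prometheus_client output. The inputs.mock values are illustrative guesses on my part rather than my actual test setup, and it emits duration as a field directly instead of exercising the converter:

[[inputs.mock]]
  ## Hypothetical synthetic source: emit metrics under the same name that the
  ## aggregator's namepass matches
  metric_name = "foobar_duration_ms"
  ## Random "duration" field spread across the configured histogram buckets
  [[inputs.mock.random]]
    name = "duration"
    min = 1.0
    max = 30000.0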
Expected behavior
telegraf continues to aggregate and output metrics, and if there is a problem, the process exits, allowing it to be automatically restarted if the user wishes.
Actual behavior
It "hangs" without restarting after ~12h of ingestion. It indicates that it cannot gather data from the internal
input but doesn't log that it isn't able to gather data from the opentelemetry
input, even though it stops outputting data entirely. It continues to utilise around the same amount of memory and CPU even though it apparently isn't gathering/process/aggregating/outputting any data.
The "fix" for this is to restart telegraf. Some graphs to help illustrate the behaviour:
Something I've noticed is that the rate of metrics written keeps increasing, which I'm guessing is a function of cardinality and of whether telegraf has yet "seen" all the combinations of values for the labels of a given metric. As the output isn't configured to expire metrics, there is no way for this number to ever decrease without restarting the process. Maybe the cardinality is too high here, but I'm not sure how best to measure whether it is. I would perhaps expect CPU and memory to approach the provisioned limit in that case, but so far I haven't seen this.
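For what it's worth, my reading of the prometheus_client output docs is that expiration_interval is the option that would let exported series age out. Something like the sketch below is what I'd try (the 10m value is an arbitrary assumption), though I haven't yet verified that it changes the growth I'm seeing:

[[outputs.prometheus_client]]
  collectors_exclude = ["gocollector", "process"]
  listen = ":9273"
  ## Assumption: expire series that haven't been written to for 10 minutes, so the
  ## number of exported metrics can decrease without restarting the process
  expiration_interval = "10m"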
Additional info
I initially raised this in the #telegraf Slack channel in the InfluxDB workspace and was directed to raise a bug report. Thread link.