Telegraf panics when CloudWatch is configured with a statistic_include but not a statistic_exclude #16779

@alexeiser

Description

Relevant telegraf.conf

[[inputs.cloudwatch]]
  region = "us-west-2"
  period = "1m"
  delay = "5m"
  interval = "1m"
  cache_ttl = "1h"
  namespace = "AWS/ELB"
  ratelimit = 10

  [[inputs.cloudwatch.metrics]]
    names = ["UnHealthyHostCount", "HealthyHostCount"]
    statistic_include = ["average"]
#    statistic_exclude = [] # required to avoid panic in telegraf 1.34

    [[inputs.cloudwatch.metrics.dimensions]]
      name = "LoadBalancerName"
      value = "*"

[[outputs.file]]

Logs from Telegraf

docker run --rm  -e AWS_ACCESS_KEY_ID -e AWS_SESSION_TOKEN -e AWS_SECRET_ACCESS_KEY  -v $PWD/nuna_telegraf.conf:/etc/telegraf/telegraf.conf:ro telegraf
2025-04-12T20:03:47Z I! Loading config: /etc/telegraf/telegraf.conf
2025-04-12T20:03:47Z W! DeprecationWarning: Option "namespace" of plugin "inputs.cloudwatch" deprecated since version 1.25.0 and will be removed in 1.35.0: use 'namespaces' instead
2025-04-12T20:03:47Z I! Starting Telegraf 1.34.1 brought to you by InfluxData the makers of InfluxDB
2025-04-12T20:03:47Z I! Available plugins: 239 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 6 secret-stores
2025-04-12T20:03:47Z I! Loaded inputs: cloudwatch
2025-04-12T20:03:47Z I! Loaded aggregators:
2025-04-12T20:03:47Z I! Loaded processors:
2025-04-12T20:03:47Z I! Loaded secretstores:
2025-04-12T20:03:47Z I! Loaded outputs: file
2025-04-12T20:03:47Z I! Tags enabled: host=29b1102e93a9
2025-04-12T20:03:47Z W! Deprecated inputs: 0 and 1 options
2025-04-12T20:03:47Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"29b1102e93a9", Flush Interval:10s
2025-04-12T20:03:47Z W! [agent] The default value of 'skip_processors_after_aggregators' will change to 'true' with Telegraf v1.40.0! If you need the current default behavior, please explicitly set the option to 'false'!
2025-04-12T20:04:06Z E! FATAL: [inputs.cloudwatch] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 31 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x40003a9800)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x60
panic({0x7bb5520?, 0x100cdc50?})
	/usr/local/go/src/runtime/panic.go:792 +0x124
github.com/influxdata/telegraf/plugins/inputs/cloudwatch.(*CloudWatch).getFilteredMetrics(0x4000d35000)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/cloudwatch/cloudwatch.go:303 +0x594
github.com/influxdata/telegraf/plugins/inputs/cloudwatch.(*CloudWatch).Gather(0x4000d35000, {0x9ec2ce0, 0x4000c94260})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/cloudwatch/cloudwatch.go:176 +0x3c
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x40003a9800, {0x9ec2ce0, 0x4000c94260})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:260 +0x23c
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:590 +0x58
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 50
	/go/src/github.com/influxdata/telegraf/agent/agent.go:588 +0xc0

goroutine 1 [sync.WaitGroup.Wait]:
sync.runtime_SemacquireWaitGroup(0x40000021c0?)
	/usr/local/go/src/runtime/sema.go:110 +0x2c
sync.(*WaitGroup).Wait(0x4000d40070)
	/usr/local/go/src/sync/waitgroup.go:118 +0x70
github.com/influxdata/telegraf/agent.(*Agent).Run(0x40001a20d8, {0x9e773b0, 0x4000e0d0e0})
	/go/src/github.com/influxdata/telegraf/agent/agent.go:208 +0x880
main.(*Telegraf).runAgent(0x4000f126e0, {0x9e773b0, 0x4000e0d0e0}, 0x0?)
	/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:486 +0x12dc
main.(*Telegraf).reloadLoop(0x4000f126e0)
	/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:206 +0x1e4
main.(*Telegraf).Run(0x4000f126e0)
	/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf_posix.go:20 +0xbc
main.runApp.func1(0x400046bd00)
	/go/src/github.com/influxdata/telegraf/cmd/telegraf/main.go:256 +0x978
github.com/urfave/cli/v2.(*Command).Run(0x4000f14160, 0x400046bd00, {0x40001ca080, 0x1, 0x1})
	/g
2025-04-12T20:04:06Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose
2025-04-12T13:04:06-07:00 ERR Executed command returned error: exit status 1 component=exec service=aws version=UNSET

System info

Telegraf 1.34.1

Docker

None. The Telegraf Docker image is used here only as an example; the panic also occurs on standard OS Telegraf installs.

Steps to reproduce

  1. Include a cloudwatch input that sets a customized statistic_include or statistic_exclude, but not both
  2. Wait 30-60 seconds

Expected behavior

Telegraf should fall back to the default statistic_exclude (an empty list), or log a warning about the invalid config. It should not panic with a nil pointer dereference.
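
To illustrate the expected fallback, here is a minimal Go sketch. It assumes the two options are held as optional (possibly nil) list pointers; the actual plugin fields may differ. `metricConfig`, `orEmpty`, and `allowed` are hypothetical names for illustration, not Telegraf's API, and matching is by exact string rather than whatever matching the plugin really uses.

```go
package main

import "fmt"

// metricConfig is a hypothetical stand-in for one [[inputs.cloudwatch.metrics]]
// table. A pointer field is nil when the option is absent from the config.
type metricConfig struct {
	StatisticInclude *[]string
	StatisticExclude *[]string
}

// orEmpty returns the configured list, or an empty list when the option
// was not set, so callers never dereference a nil pointer.
func orEmpty(p *[]string) []string {
	if p == nil {
		return []string{}
	}
	return *p
}

// allowed reports whether stat passes the include/exclude lists.
// An empty include list is treated as "include everything" (an assumption).
func allowed(stat string, include, exclude []string) bool {
	for _, e := range exclude {
		if e == stat {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, i := range include {
		if i == stat {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"average"}
	// Mirrors the config in this report: include is set, exclude is not.
	m := metricConfig{StatisticInclude: &include}

	inc, exc := orEmpty(m.StatisticInclude), orEmpty(m.StatisticExclude)
	for _, s := range []string{"average", "maximum"} {
		fmt.Println(s, allowed(s, inc, exc))
	}
}
```

With only statistic_include set, the missing exclude list behaves like `statistic_exclude = []`, which matches the workaround shown commented out in the config above.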

Actual behavior

Telegraf crashes with a nil pointer dereference panic in getFilteredMetrics (see the stack trace above).

Additional info

Caused by #16337

Labels

bug: unexpected problem or unintended behavior