Description
Describe the bug
The Stackdriver registry overwrites existing metric descriptors when pre-populating the known descriptors during initialization fails, for example due to a timeout. This discards data configured through Terraform, such as user-friendly names and descriptions.
Terraform then attempts to restore the descriptor by deleting and re-creating it, which can time out and fail deployments (related bug ticket: hashicorp/terraform-provider-google#22949).
Environment
- Micrometer version: 1.14.6
- Micrometer registry: Stackdriver / Google Cloud Monitoring
- OS: Linux
- Java version: 21
To Reproduce
Configure a project that reports metrics using Stackdriver.
Create a meter, e.g. a gauge, and set a value.
Eventually, when the value is published, createMetricDescriptorIfNecessary is called. Because this is the first call, verifiedDescriptors is empty, so prePopulateVerifiedDescriptors is called.
prePopulateVerifiedDescriptors attempts to fetch the existing descriptors via the API client. This call needs to fail to trigger the bug (for us, this sometimes happens due to a network timeout). The code explicitly ignores the failed call, and the verifiedDescriptors list remains empty.
The calling createMetricDescriptorIfNecessary then does not find the metric descriptor in the still-empty verifiedDescriptors list, so it issues a call to create the descriptor even though it already exists. This overwrites it with the incomplete metric information present in the service, which lacks e.g. the friendly name and description.
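The sequence above can be modelled with a small, self-contained simulation. This is only a sketch of the control flow as described in this report, not the actual Micrometer code: the class name, the `remote` map standing in for Cloud Monitoring, and the `listCallFails` switch are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the reported flow; none of this is Micrometer API.
public class DescriptorOverwriteSketch {
    // Stand-in for Cloud Monitoring: metric type -> description.
    final Map<String, String> remote = new HashMap<>();
    final Set<String> verifiedDescriptors = new HashSet<>();
    final boolean listCallFails; // simulates the network timeout

    DescriptorOverwriteSketch(boolean listCallFails) {
        this.listCallFails = listCallFails;
    }

    void prePopulateVerifiedDescriptors() {
        try {
            if (listCallFails) {
                throw new RuntimeException("DEADLINE_EXCEEDED (simulated)");
            }
            verifiedDescriptors.addAll(remote.keySet());
        } catch (RuntimeException e) {
            // The failure is swallowed, so verifiedDescriptors stays empty.
        }
    }

    void createMetricDescriptorIfNecessary(String type) {
        if (verifiedDescriptors.isEmpty()) {
            prePopulateVerifiedDescriptors();
        }
        if (!verifiedDescriptors.contains(type)) {
            // The create call carries no description, so it replaces
            // whatever was configured out of band.
            remote.put(type, "");
            verifiedDescriptors.add(type);
        }
    }

    // Returns the description left on the remote side after one publish.
    static String descriptionAfterPublish(boolean listCallFails) {
        DescriptorOverwriteSketch s = new DescriptorOverwriteSketch(listCallFails);
        String type = "custom.googleapis.com/METRIC_NAME";
        s.remote.put(type, "Configured via Terraform");
        s.createMetricDescriptorIfNecessary(type);
        return s.remote.get(type);
    }

    public static void main(String[] args) {
        System.out.println("list ok:    '" + descriptionAfterPublish(false) + "'");
        System.out.println("list fails: '" + descriptionAfterPublish(true) + "'");
    }
}
```

In this model the description survives when the list call succeeds and is blanked when it fails, which matches the CreateMetricDescriptor audit log entry shown under "Additional context".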
Expected behavior
Existing metric descriptors do not get overwritten.
If the original author's intent that failures in prePopulateVerifiedDescriptors should not be showstoppers is to be preserved, then I believe a good resolution would be to make the auto-creation of metric descriptors optional via configuration and, if it is disabled, to just attempt to write the time series.
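A rough sketch of how such an opt-out could behave is shown below. The flag name `autoCreateMetricDescriptors` and the surrounding class are hypothetical and exist only for this illustration; the sketch models the worst case in which pre-population has failed, so every descriptor looks unverified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed option; not an existing Micrometer API.
public class OptionalDescriptorCreationSketch {
    // Stand-in for Cloud Monitoring: metric type -> description.
    final Map<String, String> remote = new HashMap<>();
    final List<String> writtenTimeSeries = new ArrayList<>();
    final boolean autoCreateMetricDescriptors; // hypothetical config flag

    OptionalDescriptorCreationSketch(boolean autoCreateMetricDescriptors) {
        this.autoCreateMetricDescriptors = autoCreateMetricDescriptors;
    }

    void publish(String type, double value) {
        if (autoCreateMetricDescriptors) {
            // Worst case: the descriptor is unverified, so it is (re)created
            // without a description.
            remote.put(type, "");
        }
        // With the flag disabled, only the time series is written and the
        // existing descriptor is left untouched.
        writtenTimeSeries.add(type + "=" + value);
    }

    // Returns the remote description after one publish with the given flag.
    static String descriptionAfterPublish(boolean autoCreate) {
        OptionalDescriptorCreationSketch s = new OptionalDescriptorCreationSketch(autoCreate);
        String type = "custom.googleapis.com/METRIC_NAME";
        s.remote.put(type, "Configured via Terraform");
        s.publish(type, 1.0);
        return s.remote.get(type);
    }

    public static void main(String[] args) {
        System.out.println("auto-create on:  '" + descriptionAfterPublish(true) + "'");
        System.out.println("auto-create off: '" + descriptionAfterPublish(false) + "'");
    }
}
```

With the flag off, descriptors managed elsewhere (for example by Terraform) are never touched by the registry, at the cost that time series for metrics without a pre-existing descriptor rely on whatever the backend does in that case.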
Additional context
Log excerpt (slightly anonymized):
{
"insertId": "ID",
"jsonPayload": {
"logger": "io.micrometer.stackdriver.StackdriverMeterRegistry",
"context": "SERVICE NAME",
"message": "Failed to pre populate verified descriptors for <GCP PROJECT>\ncom.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions will be exceeded in 29.999807724s. \n\tat com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:94)\n\tat com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)\n\tat com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)\n\tat com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)\n\tat com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)\n\tat com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:84)\n\tat com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1125)\n\tat com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)\n\tat com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1004)\n\tat com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:767)\n\tat com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:516)\n\tat io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:651)\n\tat io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:621)\n\tat io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)\n\tat io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)\n\tat io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)\n\tat com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:569)\n\tat io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)\n\tat 
io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)\n\tat io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)\n\tat com.google.api.gax.grpc.GrpcLoggingInterceptor$1$1.onClose(GrpcLoggingInterceptor.java:98)\n\tat io.grpc.internal.DelayedClientCall$CloseListenerRunnable.runInContext(DelayedClientCall.java:432)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n\tSuppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed\n\t\tat com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)\n\t\tat com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)\n\t\tat com.google.cloud.monitoring.v3.MetricServiceClient.listMetricDescriptors(MetricServiceClient.java:912)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.prePopulateVerifiedDescriptors(StackdriverMeterRegistry.java:473)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createMetricDescriptorIfNecessary(StackdriverMeterRegistry.java:427)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:402)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry$Batch.createTimeSeries(StackdriverMeterRegistry.java:368)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry.createCounter(StackdriverMeterRegistry.java:276)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$2(StackdriverMeterRegistry.java:202)\n\t\tat io.micrometer.core.instrument.Meter.match(Meter.java:104)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry.lambda$publish$10(StackdriverMeterRegistry.java:202)\n\t\tat 
java.base/java.util.stream.ReferencePipeline$7$1.accept(Unknown Source)\n\t\tat java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)\n\t\tat java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)\n\t\tat java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)\n\t\tat java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)\n\t\tat java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)\n\t\tat java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)\n\t\tat io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:207)\n\t\tat io.micrometer.core.instrument.push.PushMeterRegistry.publishSafelyOrSkipIfInProgress(PushMeterRegistry.java:64)\n\t\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)\n\t\tat java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)\n\t\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)\n\t\t... 3 common frames omitted\nCaused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions will be exceeded in 29.999807724s. \n\tat io.grpc.Status.asRuntimeException(Status.java:532)\n\t... 14 common frames omitted\n",
"thread": "stackdriver-metrics-publisher"
},
"resource": {
"type": "cloud_run_revision",
"labels": {
"service_name": "SERVICE_NAME",
"revision_name": "SERVICE_NAME-REVISION",
"project_id": "PROJECT_ID",
"location": "europe-west3",
"configuration_name": "SERVICE_NAME"
}
},
"timestamp": "2025-05-21T03:54:20.991929322Z",
"severity": "WARNING",
"labels": {
"instanceId": "ID"
},
"logName": "projects/PROJECT_ID/logs/run.googleapis.com%2Fstdout",
"receiveTimestamp": "2025-05-21T03:54:21.211065676Z"
}
[...] Other errors omitted
{
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "SERVICE_NAME-runner@PROJECT_ID.iam.gserviceaccount.com",
"serviceAccountDelegationInfo": [
{
"firstPartyPrincipal": {
"principalEmail": "service-XXXXXXXXXXX@serverless-robot-prod.iam.gserviceaccount.com"
}
}
],
"principalSubject": "serviceAccount:SERVICE-runner@PROJECT_ID.iam.gserviceaccount.com"
},
"requestMetadata": {
"callerIp": "10.5.3.10",
"callerSuppliedUserAgent": "Spring/6.2.0 spring-cloud-gcp-metrics/6.2.0 grpc-java-netty/1.70.0,gzip(gfe)",
"callerNetwork": "//compute.googleapis.com/projects/PROJECT_ID/global/networks/__unknown__",
"requestAttributes": {
"time": "2025-05-21T03:54:27.605392688Z",
"auth": {}
},
"destinationAttributes": {}
},
"serviceName": "monitoring.googleapis.com",
"methodName": "google.monitoring.v3.MetricService.CreateMetricDescriptor",
"authorizationInfo": [
{
"resource": "projects/PROJECT_ID",
"permission": "monitoring.metricDescriptors.create",
"granted": true,
"resourceAttributes": {},
"permissionType": "ADMIN_WRITE"
}
],
"resourceName": "projects/PROJECT_ID",
"request": {
"name": "projects/PROJECT_ID",
"@type": "type.googleapis.com/google.monitoring.v3.CreateMetricDescriptorRequest",
"metricDescriptor": {
"type": "custom.googleapis.com/METRIC_NAME",
"valueType": "DOUBLE",
"metricKind": "CUMULATIVE"
}
},
"response": {
"valueType": "DOUBLE",
"name": "projects/PROJECT_ID/metricDescriptors/custom.googleapis.com/METRIC_NAME",
"@type": "type.googleapis.com/google.api.MetricDescriptor",
"type": "custom.googleapis.com/METRIC_NAME",
"metricKind": "CUMULATIVE"
}
},
"insertId": "ID",
"resource": {
"type": "audited_resource",
"labels": {
"service": "monitoring.googleapis.com",
"project_id": "PROJECT_ID",
"method": "google.monitoring.v3.MetricService.CreateMetricDescriptor"
}
},
"timestamp": "2025-05-21T03:54:27.597394162Z",
"severity": "NOTICE",
"logName": "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity",
"receiveTimestamp": "2025-05-21T03:54:29.185216918Z"
}