KEDA operator fails with "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)" in v2.15.1 using Azure Event Hub trigger #6084

@chamindac

Description

I have KEDA v2.15.1 deployed on AKS using workload identity. The AKS Kubernetes version is 1.29.7.
My scaled job is triggered by Azure Event Hub. The KEDA operator reports the error "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)".

The same setup worked fine with KEDA v2.14.2 on AKS using workload identity, with the same AKS Kubernetes version (1.29.7).

The scaled job shows the following status:

Status:
  Conditions:
    Message:  Some triggers defined in ScaledJob are not working correctly
    Reason:   PartialTriggerError
    Status:   Unknown
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Status:   Unknown
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
Events:
  Type     Reason              Age                     From           Message
  ----     ------              ----                    ----           -------
  Normal   KEDAScalersStarted  7m16s (x4 over 7m16s)   scale-handler  Scaler azure-eventhub is built.
  Normal   KEDAScalersStarted  7m16s                   scale-handler  Started scalers watch
  Normal   ScaledJobReady      7m16s                   keda-operator  ScaledJob is ready for scaling
  Warning  KEDAScalerFailed    7m16s (x2 over 7m16s)   scale-handler  unable to get runtimeInfo for metrics: context canceled
  Warning  KEDAScalerFailed    2m16s (x61 over 7m14s)  scale-handler  unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)

The KEDA operator pod log shows the following:

2024-08-16T12:57:17Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "mydemo-scaledjob", "scaledJob.Namespace": "avalanche", "Number of running Jobs": 0}
2024-08-16T12:57:17Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "mydemo-scaledjob", "scaledJob.Namespace": "avalanche", "Number of pending Jobs": 0}
2024-08-16T12:57:22Z    ERROR   scale_handler   Error getting scaler metrics and activity, but continue {"scaledJob.Name": "mydemo-scaledjob", "Scaler": "*scalers.azureEventHubScaler:", "error": "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledJobMetrics
        /workspace/pkg/scaling/scale_handler.go:853
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).isScaledJobActive
        /workspace/pkg/scaling/scale_handler.go:897
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
        /workspace/pkg/scaling/scale_handler.go:262
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
        /workspace/pkg/scaling/scale_handler.go:182

If I deploy KEDA v2.14.2 or v2.14.3 on top of v2.15.1 without changing anything else in my setup, everything starts working, and the status of my scaled job returns to normal, as shown below.

Status:
  Conditions:
    Message:  ScaledJob is defined correctly and is ready to scaling
    Reason:   ScaledJobReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Status:   Unknown
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
Events:
  Type     Reason              Age                 From           Message
  ----     ------              ----                ----           -------
  Normal   KEDAScalersStarted  20m (x4 over 20m)   scale-handler  Scaler azure-eventhub is built.
  Normal   KEDAScalersStarted  20m                 scale-handler  Started scalers watch
  Normal   ScaledJobReady      20m                 keda-operator  ScaledJob is ready for scaling
  Warning  KEDAScalerFailed    20m (x2 over 20m)   scale-handler  unable to get runtimeInfo for metrics: context canceled
  Warning  KEDAScalerFailed    19m (x18 over 20m)  scale-handler  unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)
  Normal   KEDAScalersStarted  16m (x2 over 16m)   scale-handler  Scaler azure-eventhub is built.
  Normal   KEDAScalersStarted  16m                 scale-handler  Started scalers watch
  Normal   ScaledJobReady      16m                 keda-operator  ScaledJob is ready for scaling
  Normal   KEDAJobsCreated     16m                 scale-handler  Created 1 jobs
  Normal   KEDAScalersStarted  14m (x2 over 14m)   scale-handler  Scaler azure-eventhub is built.
  Normal   KEDAScalersStarted  14m                 scale-handler  Started scalers watch
  Normal   KEDAJobsCreated     12m (x22 over 14m)  scale-handler  Created 0 jobs

Below is more information about my setup.

I deployed KEDA using the following:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

helm upgrade keda kedacore/keda --install `
  --namespace keda `
  --version 2.15.1 `
  --set serviceAccount.operator.create=true `
  --set serviceAccount.operator.name=keda-operator `
  --set podIdentity.azureWorkload.enabled=true `
  --set podIdentity.azureWorkload.clientId=$(sys_aks_uai_client_id) `
  --set podIdentity.azureWorkload.tenantId=$(tenantid)

The KEDA trigger authentication is set up as follows:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: av-keda-trigger-auth
  namespace: mynamespace
spec:
  podIdentity:
    provider: azure-workload

My scaled job triggers

triggers:
    - type: azure-eventhub
      metadata:
        consumerGroup: largevideogenerator
        unprocessedEventThreshold: "1"
        activationUnprocessedEventThreshold: "0"
        blobContainer: largevideogenerator-largevideogenerationrequired
        eventHubNamespace: myeventhubnamespace
        eventHubName: largevideogenerationrequired
        storageAccountName: mystoragename
        checkpointStrategy: blobMetadata
      authenticationRef:
        name: av-keda-trigger-auth
    - type: azure-eventhub
      metadata:
        consumerGroup: largevideogenerator
        unprocessedEventThreshold: "1"
        activationUnprocessedEventThreshold: "0"
        blobContainer: largevideogenerator-regeneratelargevideo
        eventHubNamespace: myeventhubnamespace
        eventHubName: regeneratelargevideo
        storageAccountName: mystoragename
        checkpointStrategy: blobMetadata
      authenticationRef:
        name: av-keda-trigger-auth

I can provide more information and logs if required.

In summary, this is what happens:

  • In a fresh setup of KEDA v2.15.1 with everything else identical, scaling does not work and the KEDA operator pod reports "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)".
  • In the failing KEDA v2.15.1 setup, if I deploy KEDA v2.14.2 or v2.14.3, the triggers run and scaled jobs are created as expected.
  • In a fresh setup of KEDA v2.14.2 with everything else identical, everything works fine.
