Skip to content

add scaler for temporal #6191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Feb 6, 2025
Merged

add scaler for temporal #6191

merged 34 commits into from
Feb 6, 2025

Conversation

Prajithp
Copy link
Contributor

@Prajithp Prajithp commented Sep 26, 2024

Implement a temporal scaler

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • Tests have been added
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • I have verified that my change is according to the deprecations & breaking changes policy
  • [N/A] A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • Changelog has been updated and is aligned with our changelog requirements
  • A PR is opened to update the documentation on (repo) (if applicable)

Docs: kedacore/keda-docs#1207

Fixes #4724

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@Prajithp Prajithp requested a review from a team as a code owner September 26, 2024 08:17
@febinct febinct mentioned this pull request Sep 26, 2024
7 tasks
@cretz
Copy link

cretz commented Sep 26, 2024

See comment at temporalio/temporal#33 (comment). Temporal's KEDA approach may slightly differ. Will have the engineers review, but we may suggest slight differences.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@febinct
Copy link

febinct commented Sep 30, 2024

any suggestion/comments @cretz

@robholland
Copy link
Contributor

We're currently discussing which use cases we would like the scaler to support, we'll be in a position to give some feedback/direction on Friday 4th.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@jhecking
Copy link

jhecking commented Oct 4, 2024

We rolled out this new scaler to one of our dev clusters. (Rebased on top of the v2.15 release branch.) Activation/deactivation is working as expected. BUT, what I'm seeing is that the kena-operator pod is eating up all its allocated CPU when the temporal trigger is active. When I pause the scaledObject with the temporal trigger, then the CPU utilisation goes back to near zero. There are several other scaledObjects with prometheus triggers which don't cause this problem.

I don't see anything relevant in the keda-operator logs, even on DEBUG log level. I enabled profiling and this is the flame graph I see when this is happening:

Screenshot 2024-10-04 at 5 56 47 PM

[go tool pprof -http=:8081 "http://localhost:8082/debug/pprof/profile?seconds=60"]

For reference, here is a "normal" flame graph when all the Temporal triggers are paused:

Screenshot 2024-10-04 at 5 57 31 PM

One detail that might be relevant is that keda is connecting to the Temporal server via our Consul service mesh, i.e. there is a consul proxy injected into the keda-operator pod and the Temporal scaler is configured to connect to localhost:7233. But Keda is able to connect to the Temporal server, i.e. there are no connection errors. And we use this same configuration for all the Temporal worker services in the same cluster that Keda is supposed to scale, and none of them show this same behaviour.

I'm a bit at a loss as to how to debug this further. Any suggestions?

DeleteKubernetesResources(t, testNamespace, data, templates)
}

func testActivation(t *testing.T, kc *kubernetes.Clientset, data templateData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will work on some test cases here, but I suspect that the Temporal scaler is not going to be able to provide accurate activation (specifically 1->0). It does not have sight of activity levels that are low enough not to cause a backlog. We may need to couple the Temporal scaler with the prometheus scaler for the activation aspect for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add any other test cases? @robholland

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we'll need a build ID test, where a new deployment comes up with a new build ID and the two deployments are shown to scale separately (older one down as tasks on old version complete, newer one up as they start on that queue). I'm also not convinced that activation based on backlog is safe when scaling to 0, it won't account for tasks that are in-flight or low enough throughput to not cause a backlog. Scaling to 0 and then having to scale back up to clear the queue when things get killed and re-scheduled isn't ideal. Not sure yet the best way to test that, or whether we should just remove the activation altogether and recommend users couple the Temporal scaler with a Prometheus scaler, where Prometheus handles the activation and Temporal the scaling. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding activation based on backlog, I believe we should keep this option rather than removing it, as it can be partially controlled by customising scale-down behaviour in the HPA. This feature would be useful for workloads with very short execution times. @robholland

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an option in omes to specify a build ID for the test cases? @robholland

Copy link
Contributor

@robholland robholland Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using the new worker versioning system you don't update a deployment to a new build ID, rather you create a new deployment for the new build ID. The two deployments will both be present until the workflows on the older build ID have completed, then you can tear the old one down. So we'd have a scaled object for each deployment/build id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhecking this is the use case we are currently exploring, I'll be investigating prometheus metrics we might be able to use to tide us over until we can provide some equivalent of tasks_dispatch_rate that covers in-flight (if that's going to be possible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland added a test case for BuildID. Can you please verify?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland

I'll be investigating prometheus metrics we might be able to use to tide us over until we can provide some equivalent of tasks_dispatch_rate that covers in-flight (if that's going to be possible).

I know you've been out for a week, but was wondering whether you've had any time to look into this? Maybe I'm missing something, but wouldn't tasks_dispatch_rate also only be useful for 1->0 scaling, but not for 0->1 scaling? I.e. if all the workers have been shut down, then no new tasks can be dispatched, so the task dispatch rate cannot go back above zero.

I still think a variant of the approximate_backlog_count metric that includes both pending and in-flight tasks would be best suited for 1->0 and 0->1 scaling.

I'm new to KEDA myself. Is it possible to use different metrics for 1->0/0->1 and 1->n/n->1 scaling decisions? Maybe by using two separate triggers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes task dispatch rate would only be useful for de-activation (but not reliable, due to it's caveats as discussed).. approximate_backlog_count should be useable for activation. You can mix different triggers together, I believe you'd use https://keda.sh/docs/2.14/reference/scaledobject-spec/#scalingmodifiers for that.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@jhecking
Copy link

jhecking commented Oct 8, 2024

The issue with the high CPU utilization only happens when Keda is connecting to the Temporal server via the Consul sidecar proxy. In another cluster, without Consul, the Temporal scaler is running fine and Keda is using minimal CPU.

When the Consul sidecar is injected, the proxy itself shows minimal CPU usage; only the keda-operator container itself uses up all its allocated CPU resources. Neither the Consul logs nor the Keda logs show any connection errors. Also, Keda and the Temporal scaler are able to connect to the Temporal server and are activating/deactivating the targeted pod based on those metrics.

But one other thing I could observe is that at times the Temporal scaler's GetMetricsAndActivity is being called much more frequently than every 30 seconds, e.g.:

2024-10-08T08:54:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:54:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:55:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:55:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:56:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:56:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:57Z    INFO    scaleexecutor    Successfully updated ScaleTarget    {"scaledobject.Name": "connector-mysql-2d9c6761", "scaledObject.Namespace": "borneo", "scaleTarget.Name": "connector-mysql-2d9c6761", "Original Replicas Count": 0, "New Replicas Count": 1}
2024-10-08T08:57:58Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:13Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:28Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:43Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:58Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:59:13Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:59:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}

Before the scaleexecutor event above Keda is already running at 100% CPU, but the Temporal scaler only gets queried every 30 seconds, as expected. After that event, the scaler is being queried much more frequently than once every 30 seconds. (I added the "Getting Temporal queue size" log in the scaler's getQueueSize function, right before the s.tcl.DescribeTaskQueueEnhanced call.)

@jhecking
Copy link

jhecking commented Oct 9, 2024

I sniffed the network traffic between the Keda operator and the Consul proxy today, and that finally offered a clue as to the cause of the high CPU utilisation, and – more importantly – a solution. So in order to route the queries from the Temporal scaler to the Temporal server via the Consul proxy, I had configured the endpoint for the scaler to localhost:7233.

Somehow, and I don't quite understand this yet, this is causing Keda to try to connect to the proxy on port 7233 using both 127.0.0.1 as well as using the IPv6 address ::1. The IPv4 connections are successful, but the IPv6 connections are getting rejected by the proxy. What's worse, Keda sends these IPv6 connection attempts at a rate of more than 1M connection attempts in 60 seconds, i.e. ~16,666 connection attempts per second. That's what's causing the high CPU load. It doesn't seem to give up on this nor back off.

Changing the endpoint config for the scaler from localhost:7233 to 127.0.0.1:7233 has resolved the issue completely. Keda has been running stable, and working as expected, with half a dozen scaledObjects with Temporal triggers since this simple config change.

APIKey string `keda:"name=apiKey, order=authParams;resolvedEnv;triggerMetadata, optional"`

UnsafeSsl bool `keda:"name=unsafeSsl, order=triggerMetadata, optional"`
Cert string `keda:"name=cert, order=authParams;resolvedEnv, optional"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more standard in Temporal deployments to have env vars that point to files for the key/cert rather than have them present in the environment directly. It would be good to have CertPath/KeyPath/CaPath or similar.

Copy link
Contributor Author

@Prajithp Prajithp Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland That's currently not possible with KEDA. KEDA can only read from environment variables or secrets. Ideally, in k8s, certificates and keys are stored in secrets, so this can be easily managed using TriggerAuthentication. @JorTurFer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file paths are normally pointed to certs that are mounted into the pods from secrets, so presumably users can just point at the secrets directly for KEDA use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be done with TriggerAuthentication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we must read them from Kubernetes API and not from file. This is because we can't restart KEDA to include new certs. Take into account that KEDA provides a self-service approach where admins can set up KEDA and users can deploy their own resources, so assuming that certs are part of the KEDA's filesystem isn't an option. If the SDK only support reading them from files, the scaler must pull the secret and store in a temporal file within the container

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our mTLS client certificates and keys auto rotate, live on the filesystem, at least every hour. Our worker code handles the rotation without restarting.

I think any code using mTLS to connect to Temporal needs to be able to handle certificates and keys that are rotated frequently. Getting the certificates and keys from the filesystem seems natural and normal, if there is a way for KEDA to support that it would be good.

We use a sidecar to write/rotate the certificates and keys. But there are other Kubernetes systems for mTLS certificate management that also use filesystems and not Kubernetes Secret resources, like cert-manager csi-driver-spiffe.

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@JorTurFer
Copy link
Member

JorTurFer commented Oct 16, 2024

/run-e2e temporal
Update: You can check the progress here

Prajithp and others added 3 commits October 17, 2024 08:24
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Prajith <prajithpalakkuda@gmail.com>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Prajith <prajithpalakkuda@gmail.com>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Prajith <prajithpalakkuda@gmail.com>
@Prajithp
Copy link
Contributor Author

/run-e2e temporal Update: You can check the progress here

@JorTurFer build is failing. Should I include the vendor directory?

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@black-snow
Copy link

Shameful bump.

Sadly, I can't see what Snyk's complaining about.

@atihkin
Copy link

atihkin commented Dec 2, 2024

Checking in here again - hoping we can squeeze this in before the New Year 👀

@zroubalik
Copy link
Member

Checking in here again - hoping we can squeeze this in before the New Year 👀

Hi, FYI, the next release is scheduled to end Jan 2025, maybe a bit later. Let's make sure we are able to merge this in

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some conflict, please fix them

@zroubalik
Copy link
Member

zroubalik commented Dec 4, 2024

/run-e2e temporal
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that all tasks from the list in the PR description are done, before requesting a review. Thanks!

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@Prajithp
Copy link
Contributor Author

Prajithp commented Dec 5, 2024

@zroubalik Please check.

@zroubalik
Copy link
Member

zroubalik commented Jan 2, 2025

/run-e2e temporal
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments on the docs PR, ptal:

kedacore/keda-docs#1207 (review)

Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
Signed-off-by: Prajithp <prajithpalakkuda@gmail.com>
@mattbrandman
Copy link

Does this still look like it's going to get in the January release? Have a very applicable use case for this and very excited to scale on queue depth.

@atihkin
Copy link

atihkin commented Jan 24, 2025

@JorTurFer was there anything else holding this one up or are we mostly good to go?

@JorTurFer
Copy link
Member

JorTurFer commented Jan 24, 2025

@JorTurFer was there anything else holding this one up or are we mostly good to go?

It's almost done, just another (maintainer) approval is left :)

@mattbrandman
Copy link

@zroubalik anything left before you could approve this as well? Currently hacked in this functionality with the prom keda scaler but unsure if there are advantages to native besides ease of use.

@zroubalik
Copy link
Member

@zroubalik anything left before you could approve this as well? Currently hacked in this functionality with the prom keda scaler but unsure if there are advantages to native besides ease of use.

@mattbrandman Sorry for the delay, will try to tackle it this week.

@zroubalik
Copy link
Member

zroubalik commented Feb 6, 2025

/run-e2e temporal
Update: You can check the progress here

@mattbrandman
Copy link

@zroubalik anything left before you could approve this as well? Currently hacked in this functionality with the prom keda scaler but unsure if there are advantages to native besides ease of use.

@mattbrandman Sorry for the delay, will try to tackle it this week.

Absolutely no apologies required, you’re all doing incredible work and it’s so useful. Legitimately one of my favorite projects.

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great job!

@zroubalik zroubalik merged commit c036742 into kedacore:main Feb 6, 2025
20 checks passed
@tomwheeler tomwheeler mentioned this pull request Feb 11, 2025
40 tasks
@JCMais
Copy link

JCMais commented Apr 9, 2025

Yes. Can you please review?

We might eventually need the ability to use other task queue stats besides ApproximateBacklogCount for scaling purposes. Ref. #6191 (comment) and prior discussion on the same thread. But I'd prefer to get this PR merged now and add support for additional metrics later as and when needed.

Other than that, I don't have any further input on this PR. We are successfully using an earlier version of this branch in several production clusters to scale Temporal-based services down to zero. For 1->n scaling we mostly still rely on Prometheus metrics published by the Temporal workers. We are combining the two metrics using scaling modifiers, as was suggested by Rob. (Thanks, @robholland, for that suggestion, btw!)

@jhecking would you be able to share the scaling config you are using where you are mixing both Temporal and Prometheus scalers?

@Yadavanurag13
Copy link

@Prajithp I have been assigned issue #6690. Earlier, there used to be a versionInfo object that included typeinfo and was used to keep count. However, since versionInfo has been deprecated in favor of versioningInfo, how can I now calculate the length of the queue? Can you help with this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Temporal.io Scaler