Skip to content

Conversation

dnr
Copy link
Contributor

@dnr dnr commented May 7, 2025

What changed?

Set MaxWorkers for dynamic config subscription callback to 1e9 so there's no effective limit.

Why?

A batch of subscription callbacks are enqueued (though not called) while holding a lock. If a callback somehow makes a call back into dynamic config subscriptions, then it would have to wait for the batch to finish enqueueing. If there's more than MaxWorkers of those, that would cause a deadlock. We don't need to impose a limit on workers here.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

@dnr dnr requested a review from a team as a code owner May 7, 2025 18:42
@@ -18,7 +18,7 @@ var (
"dynamicconfig.subscriptionCallback",
subscriptionCallbackSettings{
MinWorkers: 10,
MaxWorkers: 100,
MaxWorkers: 1e9, // effectively unlimited
Copy link
Contributor

@prathyushpv prathyushpv May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why don't we remove this config MaxWorkers altogether?

If we don't specify a maximum number of threads, is there value in using AdaptivePool? Can't we just create a new goroutine each time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly: because the pool currently requires some max and I wanted to make a minimal change right now.

There's still some value in AdaptivePool: because of the timeout, in cases where callbacks are fast and don't call back into dynamic config (which is the common case), it won't create that many goroutines. Only when they call back into dynamic config and get stuck, then it'll create more goroutines than it otherwise would.

But yeah, you could argue the whole thing is overkill and we could just create new goroutines. It was an attempt to reduce the impact if we need to call something like 1M callbacks at once.

@dnr dnr merged commit fe82194 into temporalio:main May 8, 2025
53 checks passed
@dnr dnr deleted the dc23 branch May 8, 2025 17:00
ychebotarev pushed a commit that referenced this pull request May 8, 2025
## What changed?
Set MaxWorkers for dynamic config subscription callback to 1e9 so
there's no effective limit.

## Why?
A batch of subscription callbacks are enqueued (though not called) while
holding a lock. If a callback somehow makes a call back into dynamic
config subscriptions, then it would have to wait for the batch to finish
enqueueing. If there's more than MaxWorkers of those, that would cause a
deadlock. We don't need to impose a limit on workers here.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
josesa added a commit to josesa/temporal that referenced this pull request May 12, 2025
* main: (22 commits)
  Add host health metrics gauge (temporalio#7728)
  add rule expiration check (temporalio#7749)
  Add activity options to the pending activity info (temporalio#7727)
  Enable DLQ V2 for replication (temporalio#7746)
  chore: be smarter about when to use Stringer vs String (temporalio#7743)
  versioning entity workflows: enabling auto-restart pt1 (temporalio#7715)
  Refactor code generators (temporalio#7734)
  allow passive to generate replication tasks (temporalio#7713)
  Validate links in completion callbacks (temporalio#7726)
  CHASM: Engine Update/ReadComponent implementation (temporalio#7696)
  Enable transition history in dev env and tests (temporalio#7737)
  chore: Add Stringer tags (temporalio#7738)
  Add internal pod health check to DeepHealthCheck (temporalio#7709)
  Rename internal CHASM task processing interface (temporalio#7730)
  [Frontend] Log slow gRPC requests (temporalio#7718)
  Remove cap for dynamic config callback pool (temporalio#7723)
  Refactor updateworkflowoptions package (temporalio#7725)
  Remove a bunch of redundant utf-8 validation (temporalio#7720)
  [CHASM] Pure task processing - GetPureTasks, ExecutePureTasks (temporalio#7701)
  Send ActivityReset flag to the worker in heartbeat response (temporalio#7677)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants