Remove cap for dynamic config callback pool #7723
Conversation
```diff
@@ -18,7 +18,7 @@ var (
 	"dynamicconfig.subscriptionCallback",
 	subscriptionCallbackSettings{
 		MinWorkers: 10,
-		MaxWorkers: 100,
+		MaxWorkers: 1e9, // effectively unlimited
```
Then why don't we remove this config `MaxWorkers` altogether?
If we don't specify a maximum number of threads, is there value in using `AdaptivePool`? Can't we just create a new goroutine each time?
Mainly: because the pool currently requires some max and I wanted to make a minimal change right now.
There's still some value in AdaptivePool: because of the timeout, when callbacks are fast and don't call back into dynamic config (the common case), it won't create many goroutines. Only when they call back into dynamic config and get stuck does it create more goroutines than it otherwise would.
But yeah, you could argue the whole thing is overkill and we could just create new goroutines. It was an attempt to reduce the impact if we need to call something like 1M callbacks at once.
## What changed?

Set MaxWorkers for dynamic config subscription callback to 1e9 so there's no effective limit.

## Why?

A batch of subscription callbacks is enqueued (though not called) while holding a lock. If a callback somehow makes a call back into dynamic config subscriptions, it has to wait for the batch to finish enqueueing. If more than MaxWorkers of those pile up, that would cause a deadlock. We don't need to impose a limit on workers here.

## How did you test it?

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
* main: (22 commits)
  * Add host health metrics gauge (temporalio#7728)
  * add rule expiration check (temporalio#7749)
  * Add activity options to the pending activity info (temporalio#7727)
  * Enable DLQ V2 for replication (temporalio#7746)
  * chore: be smarter about when to use Stringer vs String (temporalio#7743)
  * versioning entity workflows: enabling auto-restart pt1 (temporalio#7715)
  * Refactor code generators (temporalio#7734)
  * allow passive to generate replication tasks (temporalio#7713)
  * Validate links in completion callbacks (temporalio#7726)
  * CHASM: Engine Update/ReadComponent implementation (temporalio#7696)
  * Enable transition history in dev env and tests (temporalio#7737)
  * chore: Add Stringer tags (temporalio#7738)
  * Add internal pod health check to DeepHealthCheck (temporalio#7709)
  * Rename internal CHASM task processing interface (temporalio#7730)
  * [Frontend] Log slow gRPC requests (temporalio#7718)
  * Remove cap for dynamic config callback pool (temporalio#7723)
  * Refactor updateworkflowoptions package (temporalio#7725)
  * Remove a bunch of redundant utf-8 validation (temporalio#7720)
  * [CHASM] Pure task processing - GetPureTasks, ExecutePureTasks (temporalio#7701)
  * Send ActivityReset flag to the worker in heartbeat response (temporalio#7677)
  * ...