
Conversation

davidkoski (Collaborator)

- fixes #225
- although CompiledFunction was marked `@unchecked Sendable`, it really wasn't
- guard the state with a lock
@davidkoski davidkoski requested review from awni and barronalex April 14, 2025 23:39
@@ -9,6 +9,8 @@ final class CompiledFunction: @unchecked Sendable {
/// unique (for the lifetime of the object) identifier for the compiled function
private var id: UInt!

let lock = NSLock()
davidkoski (Collaborator, Author)

Hopefully this is enough -- I am scoping the lock to the instance of CompiledFunction (e.g. to handle cases like compiledSilu). If the mlx::core side of it isn't thread safe then I can make the lock global instead.
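The lock-per-instance pattern described here can be sketched as follows. This is a minimal illustration, not the actual mlx-swift code: `Counter` and `increment` are made-up names standing in for `CompiledFunction` and its call path.

```swift
import Foundation

// Minimal sketch of an `@unchecked Sendable` class whose mutable state is
// guarded by an instance-scoped NSLock, as done in this PR. The `@unchecked`
// marker is only honest because every access to `value` goes through the lock.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    // Mutate and read the state inside the critical section, mirroring how
    // the PR wraps `innerCall(arguments)` in `lock.withLock`.
    func increment() -> Int {
        lock.withLock {
            value += 1
            return value
        }
    }
}
```

Because the lock lives on the instance, two different compiled functions can still run concurrently; only calls to the *same* instance are serialized.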

Comment on lines +40 to +42
lock.withLock {
innerCall(arguments)
}
@awni (Member) commented Apr 17, 2025:

So here, we grab the lock every time we call the compiled function? Is that because innerCall is not thread safe or something in mlx::core was causing issues?

Indeed I think mlx::core::compile is not entirely thread safe since it puts the compiled function in a cache. I think it probably makes sense to guard reading and writing to the compile cache with a lock. Maybe that will make this lock unnecessary?

davidkoski (Collaborator, Author)

I think it may be two-fold, though this is speculation from observing the behavior:

  • the cache itself may not be thread safe, in which case this lock per compiled function won't be enough
  • the function / tracer arguments are not thread safe (innerCall) as the cached function itself is not thread safe

I think I was observing the latter: it looked like there were two simultaneous calls using the same innerCall and messing with the tracing parameters. That is specifically what this lock guards against, and in the failure case we had it seems to work.

But I think the cache itself being not thread safe could very well be an issue, maybe just a little harder to race on as it is a shorter critical section.
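The cache-side fix being discussed could look roughly like this. It is purely illustrative Swift: the real compile cache lives in C++ inside mlx::core, and `CompileCache` / `function(for:build:)` are hypothetical names.

```swift
import Foundation

// Illustrative sketch of guarding a compile cache's reads and writes with a
// single shared lock, so that the shorter critical section mentioned above
// (cache lookup + insertion) cannot race either.
final class CompileCache: @unchecked Sendable {
    static let shared = CompileCache()

    private let lock = NSLock()
    private var functions: [UInt: ([Int]) -> [Int]] = [:]

    // Return the cached function for `id`, building and inserting it under
    // the lock if it is not present yet. The lock covers both the lookup and
    // the write, so only one caller ever builds a given entry.
    func function(for id: UInt, build: () -> ([Int]) -> [Int]) -> ([Int]) -> [Int] {
        lock.withLock {
            if let cached = functions[id] { return cached }
            let built = build()
            functions[id] = built
            return built
        }
    }
}
```

Note this only protects the cache itself; it would not help with the other failure mode above, where two threads re-enter the *same* cached function and corrupt its tracing state, which is what the per-instance lock in this PR addresses.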

@awni (Member) left a review:

LGTM.. hopefully we can remove this soon!

@davidkoski davidkoski merged commit 1f1f922 into main Apr 28, 2025
1 check passed
@davidkoski davidkoski deleted the compiled-concurrency branch April 28, 2025 20:45
Successfully merging this pull request may close these issues:

Crashes occur when running multiple LLMEvaluators.