-
Notifications
You must be signed in to change notification settings - Fork 111
CompiledFunction is not thread safe #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
davidkoski
commented
Apr 14, 2025
- fixes Crashes occur when running multiple LLMEvaluators. #225
- although CompiledFunction was marked as unchecked(Sendable) it really wasn't
- guard the state with a lock
- fixes #225 - although CompiledFunction was marked as unchecked(Sendable) it really wasn't - guard the state with a lock
@@ -9,6 +9,8 @@ final class CompiledFunction: @unchecked (Sendable) { | |||
/// unique (for the lifetime of the object) identifier for the compiled function | |||
private var id: UInt! | |||
|
|||
let lock = NSLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this is enough -- I am scoping the lock to the instance of CompileFunction (e.g. to handle cases like compiledSilu
). if the mlx::core side of it isn't thread safe then I can make the lock global instead.
lock.withLock { | ||
innerCall(arguments) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here, we grab the lock every time we call the compiled function? Is that because innerCall
is not thread safe or something in mlx::core was causing issues?
Indeed I think mlx::core::compile
is not entirely thread safe since it puts the compiled function in a cache. I think it probably makes sense to guard reading and writing to the compile cache with a lock. Maybe that will make this lock unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be two-fold, though this is speculation from observing the behavior:
- the cache itself may not be thread safe, in which case this lock per compiled function won't be enough
- the function / tracer arguments are not thread safe (innerCall) as the cached function itself is not thread safe
I think I was observing the latter as it looked like there were two simultaneous call using the same innerCall & messing with the tracing parameters. That is specifically what this is guarding against and in the failure case we had it seems to work.
But I think the cache itself being not thread safe could very well be an issue, maybe just a little harder to race on as it is a shorter critical section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.. hopefully we can remove this soon!