-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix a race condition in scoped_timer::finalize #7618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a race condition in scoped_timer::finalize #7618
Conversation
scoped_timer::finalize is called from fork. However, it may race with other threads creating or freeing timer threads. This patch removes the loop in scoped_timer::finalize (because it is not needed and it may spin) and also removes two unlocked assignments. The idle thread is added to "available_workers" in scoped_timer::~scoped_timer destructor. If we call the "finalize" method as a part of total memory cleanup, all the scoped_timers' destructors were already executed and all the worker threads are already on "available_workers" vector. So, we don't need to loop; the first loop iteration will clean all the threads. If the "finalize" method is called from single-threaded program's fork(), then all the scoped timers' destructors are already called and the case is analogous to the previous case. If the "finalize" method is called from multi-threaded program's fork(), then it breaks down - the "num_workers" variable is the total amount of workers (both sleeping and busy), and we loop until we terminated "num_workers" threads - that means that if the number of sleeping workers is less than "num_workers", the function just spins. Then, there is unlocked assignment to "num_workers = 0" and "available_workers.clear()" that can race with other threads doing z3 work and corrupt memory. available_workers.clear() is not needed, because it was already cleared by std::swap(available_workers, cleanup_workers) (and that was correctly locked). Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
You didn't reply to my comment in the other PR. So this patch doesn't seem safe. |
If you have multiple threads running and you fork, it's impossible to do some z3 calculations in the child process. Some compute thread in the parent may be holding a mutex and if you fork, the thread is not replicated in the child, thus no one drops the mutex and the child deadlocks. It wasn't supported even without this patch, this patch doesn't make it worse. In my use-case, I'm not trying to run z3 from the child process. I run multithreaded z3 in the parent process and when I fork, I just setup file handles and call exec in the child process. I want to make sure that the fork()+exec() won't damage z3 calculations running in the parent process - that's what I'm trying to fix with this patch. |
But the othread_atfork waits for the threads to finish before forking. Isnt' that sufficient to avoid deadlocks? |
It waits for timer threads to finish. Not for compute threads to finish. And the mutexes may be held by the compute threads too. |
This passed my performance & stress tests. Merging next. |
scoped_timer::finalize is called from fork. However, it may race with other threads creating or freeing timer threads. This patch removes the loop in scoped_timer::finalize (because it is not needed and it may spin) and also removes two unlocked assignments. The idle thread is added to "available_workers" in scoped_timer::~scoped_timer destructor. If we call the "finalize" method as a part of total memory cleanup, all the scoped_timers' destructors were already executed and all the worker threads are already on "available_workers" vector. So, we don't need to loop; the first loop iteration will clean all the threads. If the "finalize" method is called from single-threaded program's fork(), then all the scoped timers' destructors are already called and the case is analogous to the previous case. If the "finalize" method is called from multi-threaded program's fork(), then it breaks down - the "num_workers" variable is the total amount of workers (both sleeping and busy), and we loop until we terminated "num_workers" threads - that means that if the number of sleeping workers is less than "num_workers", the function just spins. Then, there is unlocked assignment to "num_workers = 0" and "available_workers.clear()" that can race with other threads doing z3 work and corrupt memory. available_workers.clear() is not needed, because it was already cleared by std::swap(available_workers, cleanup_workers) (and that was correctly locked). Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
scoped_timer::finalize is called from fork. However, it may race with other threads creating or freeing timer threads.
This patch removes the loop in scoped_timer::finalize (because it is not needed and it may spin) and also removes two unlocked assignments.
The idle thread is added to "available_workers" in scoped_timer::~scoped_timer destructor.
If we call the "finalize" method as a part of total memory cleanup, all the scoped_timers' destructors were already executed and all the worker threads are already on "available_workers" vector. So, we don't need to loop; the first loop iteration will clean all the threads.
If the "finalize" method is called from single-threaded program's fork(), then all the scoped timers' destructors are already called and the case is analogous to the previous case.
If the "finalize" method is called from multi-threaded program's fork(), then it breaks down - the "num_workers" variable is the total amount of workers (both sleeping and busy), and we loop until we terminated "num_workers" threads - that means that if the number of sleeping workers is less than "num_workers", the function just spins.
Then, there is unlocked assignment to "num_workers = 0" and "available_workers.clear()" that can race with other threads doing z3 work and corrupt memory. available_workers.clear() is not needed, because it was already cleared by std::swap(available_workers, cleanup_workers) (and that was correctly locked).