Skip to content

Conversation

GerHobbelt
Copy link

Describe the changes

fix: give task_done_cv its own mutex: it's for sending signals the other way (workers -> wait_for_tasks(), i.e. from workers to management threads, instead of task_available_cv + task_mutex, which are the pair communicating from management down to the workers collective ;-) ). As these two directions should be independent, so must be their mutexes. This prevents all kinds of possible weird lockups and miscommunication.


From: SHA-1: 12253b6

  • fixes:
  • push_task(): document which (member) variables are under which mutex' overwatch and makee sure the code matches this.
    • Case in point: [submitted in other PR]
    • Second case in point: task_done_cv has an opposing purpose and MUST be wrapped by its own mutex to prevent deadlock between wait_for_tasks() and any worker threads. Introducing task_done_mutex for that.

Testing

This was tested as part of a larger work (other PRs are forthcoming shortly) after hunting down shutdown issues (application lockups, etc.) in a large application.

Tested this code via your provided test code rig; see my own fork and the referenced commits which point into there.

Tested on AMD Ryzen 3700X, 128GB RAM, latest Win10/64, latest MSVC2019 dev environment. Using in-house project files which use a (in-house) standardized set of optimizations.

Additional information

TBD

The patches are hopefully largely self-explanatory. Where deemed useful, the original commit messages from the dev fork have been referenced and included.

…other way (workers -> wait_for_tasks(), i.e. from workers to management threads, instead of `task_available_cv` + `task_mutex`, which are the pair communicating from anagement down to the workers collective ;-) ). As these two directions should be independent, so must be their mutexes. This prevents all kinds of possible weird lockups and miscommunication.

---

From: SHA-1: 12253b6

* fixes:
- push_task(): document which (member) variables are under which mutex' overwatch and makee sure the code matches this.
  + Case in point: [submitted in other PR]
  + Second case in point: `task_done_cv` has an opposing purpose and MUST be wrapped by its own mutex to prevent deadlock between wait_for_tasks() and any worker threads. Introducing `task_done_mutex` for that.
@bshoshany
Copy link
Owner

Thanks for opening this pull request! However, I'm not sure I understand why a second mutex is needed. Can you provide a minimal working example demonstrating a race condition?

@bshoshany bshoshany closed this Sep 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants