Skip to content

Conversation

derMihai
Copy link
Contributor

@derMihai derMihai commented Mar 4, 2025

Contribution description

Note: This PR builds on top of #21254

From core/include/wait_queue.h:

This API is a thin wrapper around thread flags group and requires
"core_thread_flags_group" to be enabled in USEMODULE.

Wait queues enable lock-free, IRQ-safe condition signaling. This API is
inspired from the Linux Kernel.

Wait queues have similar semantics to condition variables, but don't require
setting the condition + signaling to be atomic, hence no mutex is needed to
function properly. In turn, one may safely call queue_wake() from an ISR.
Note, while cond_signal() and cond_broadcast() are safe to call from an ISR
context too, doing so will probably cause a race condition elsewhere.
Consider the following scenario using condition variables:

static uint64_t measurement;
mutex_t cond_lock = MUTEX_INIT;
cond_t cond = COND_INIT;

void measurement_irq(void)
{
    measurement = measure();
    cond_broadcast(&cond);
}

void wait_for_critical_value(void)
{
    mutex_lock(&cond_lock);
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        cond_wait(&cond, cond_lock);
    }
    mutex_unlock(&cond_lock);
}

Note, the mutex is there only for the cond_wait() API call, as we're not
allowed to call mutex_lock() inside the ISR. This alone is a hint that
something isn't right and indeed, the following sequence of events is
possible:

  1. thread sees measurement < THRESHOLD, and is about to call cond_wait()
  2. ISR fires, sets measurement = THRESHOLD and signals the condition
  3. thread calls cond_wait() and goes to sleep, possibly forever

Using a wait queue, we can do this:

static uint64_t measurement;
wq_t wq = WAIT_QUEUE_INIT;

void measurement_irq(void)
{
    measurement = measure();
    queue_wake_all(&wq);
}

void wait_for_critical_value(void)
{
   wait_queue_join(&wq);
   while (atomic_load_u64(&measurement) < THRESHOLD) {
       queue_wait(&wq);
   }
   wait_queue_leave(&wq);
}

This is free of the race condition above because if the ISR fires between
the condition check and the queue_wait() call, queue_wait() will return
because the THREAD_FLAG_WAIT_QUEUE thread flag used in the
implementation must also have been set. In other words, if the condition is
true, then the thread flag is also set.

If you have a simple condition check expression, the waiter boiler plate
code can be eliminated by using the QUEUE_WAIT() macro, which combines
the join, wait, and leave operations:

void wait_for_critical_value(void)
{
    QUEUE_WAIT(&wq, atomic_load_u64(&measurement) >= THRESHOLD);
}

If you don't want to wait indefinitely, you can do it with a timeout:

void wait_for_critical_value(void)
{
    if (QUEUE_WAIT_ZTIMER(ZTIMER_MSEC, 10, &wq, atomic_load_u64(&measurement) >= THRESHOLD)) {
        puts("condition met!");
    }
    else {
        puts("timeout!");
    }
}
Limitations

Be aware that the condition checking is fenced but not atomic w.r. to
signaling, so you have to ensure that by other means. E.g. in the example
code above this is enforced by atomic_load_u64().

When to use?

If you know for sure you're synchronizing between threads only (no ISR),
then the condition variable has the advantage of implicit condition
setting/checking atomicity through the mutex. Otherwise go for the wait
queue, as it's more flexible by allowing the waker to be in ISR context.

Testing procedure

Run the test application on:

  • native
  • samd20-xpro

Issues/PRs references

@github-actions github-actions bot added Area: tests Area: tests and testing framework Area: core Area: RIOT kernel. Handle PRs marked with this with care! labels Mar 4, 2025
@derMihai derMihai force-pushed the mir/tflags_wq branch 2 times, most recently from 320e88f to ef2b82a Compare March 4, 2025 18:33
@derMihai derMihai force-pushed the mir/tflags_wq branch 2 times, most recently from 5c96ba0 to 42f01fa Compare March 20, 2025 09:41
@github-actions github-actions bot added Platform: AVR Platform: This PR/issue effects AVR-based platforms Area: cpu Area: CPU/MCU ports labels Jun 24, 2025
@crasbe crasbe added Type: new feature The issue requests / The PR implemements a new feature for RIOT CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jun 26, 2025
@riot-ci
Copy link

riot-ci commented Jun 26, 2025

Murdock results

✔️ PASSED

897d0c7 core: add wait queue wrapper around thread flags group

Success Failures Total Runtime
10557 0 10560 19m:43s

Artifacts

@crasbe crasbe added the Process: needs >1 ACK Integration Process: This PR requires more than one ACK label Jul 10, 2025
@github-actions github-actions bot added the Process: missing approvals Integration Process: PR needs more ACKS (handled by action) label Jul 10, 2025
@crasbe
Copy link
Contributor

crasbe commented Jul 10, 2025

Now that #21254 is merged, you can rebase this.

@crasbe crasbe added the State: needs rebase State: The codebase was changed since the creation of the PR, making a rebase necessary label Jul 11, 2025
@github-actions github-actions bot removed Platform: AVR Platform: This PR/issue effects AVR-based platforms Area: cpu Area: CPU/MCU ports labels Jul 11, 2025
@derMihai derMihai marked this pull request as ready for review July 11, 2025 15:31
@crasbe crasbe removed the State: needs rebase State: The codebase was changed since the creation of the PR, making a rebase necessary label Jul 11, 2025
* }
* ```
*
* If you don't want to wait indefinitely, you can do it with a timeout:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should mention here that the ztimer has to be acquired to avoid that it's free'd somewhere else and the wait is indefinite again.
Your code currently does not acquire the ztimer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the docs from ztimer_now():

 *   A clock is also guaranteed to be active from the time any timer is set
 *   (the first opportunity to get a "now" value from it is the return value of
 *   @ref ztimer_set) until the time the timer's callback returns.

ztimer_set_timeout_flag() will call ztimer_set() under the hood, which will acuire the timer, if required. In fact, all ztimer API which registers something is acquiring the timer, as that is trivial to do. The timer only has to be acquired if the application needs it to be running, but the clock is not aware of that (as in the case of ztimer_now()).

@derMihai derMihai force-pushed the mir/tflags_wq branch 2 times, most recently from 850eb1c to 4c07d0e Compare July 18, 2025 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: missing approvals Integration Process: PR needs more ACKS (handled by action) Process: needs >1 ACK Integration Process: This PR requires more than one ACK Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants