core/sched_switch: fix crash with no active thread #20878

maribu · 2024-09-29T16:04:25Z

Contribution description

The function sched_switch() was implemented with the assumption that there will always be an active thread. This held until threadless idle was implemented for Cortex-M MCUs: if no thread is runnable and the running thread exits, there is no active thread. If a thread is then unblocked from an ISR, sched_switch() is called without an active thread.

This handles the corner case explicitly when the module core_idle_thread is not used (in other words: for threadless idle).
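The following is a minimal C model of the decision sched_switch() makes, meant only to illustrate the corner case. All `model_*` names are hypothetical stand-ins, not RIOT APIs. The priority comparison and the irq_is_in() split roughly follow RIOT's sched_switch(). What the guard does inside its branch (here: simply requesting a context switch) is an assumption. Without the guard, the first access to the active thread would dereference NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for RIOT's scheduler state; not the
 * real RIOT types. As in RIOT, a lower priority value is more important. */
typedef struct {
    uint8_t priority;
    bool on_runqueue;
} model_thread_t;

static model_thread_t *model_active_thread; /* NULL during threadless idle */
static bool model_in_isr;                   /* stand-in for irq_is_in() */
static int model_switch_request;            /* stand-in for sched_context_switch_request */
static int model_yields;                    /* counts calls to the yield stub */

/* Stand-in for thread_yield_higher(); only counts invocations. */
static void model_thread_yield_higher(void)
{
    model_yields++;
}

/* Simplified model of sched_switch(): decide whether the thread that was
 * just unblocked (priority other_prio) should preempt the current one. */
static void model_sched_switch(uint8_t other_prio)
{
    /* Guard for threadless idle. Assumed behavior: request a switch and let
     * the full scheduler pick a thread. Without this check the accesses
     * below would dereference NULL. */
    if (model_active_thread == NULL) {
        model_switch_request = 1;
        return;
    }

    if (!model_active_thread->on_runqueue
        || model_active_thread->priority > other_prio) {
        if (model_in_isr) {
            model_switch_request = 1;    /* switch on ISR exit */
        }
        else {
            model_thread_yield_higher(); /* switch right away */
        }
    }
}
```

Calling model_sched_switch() with model_active_thread set to NULL and model_in_isr set to true reproduces the threadless-idle scenario described above.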

Implementation choice

I first wanted to avoid adding a check for this very unlikely corner case to this hot path, and instead make cpu_switch_context_exit() of the only affected platform set the active thread to something reasonable. However, I found no elegant, simple and maintainable way to switch to a non-runnable thread without running it. So I decided to add a simple and maintainable check to the hot path instead.
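To illustrate why the extra check in the hot path is cheap, here is a sketch with the same condition shape as the guard added by this PR. It uses hypothetical stand-ins: `MODEL_USE_IDLE_THREAD` plays the role of `IS_USED(MODULE_CORE_IDLE_THREAD)`, and `model_unlikely()` mimics an `unlikely()` hint via the GCC/Clang builtin `__builtin_expect`. When an idle thread is used, the condition folds to a compile-time `0` and the branch can be dropped. Otherwise it costs one NULL check that is hinted as rarely taken.

```c
#include <stddef.h>

/* Hypothetical stand-in for IS_USED(MODULE_CORE_IDLE_THREAD): 1 if an idle
 * thread exists, 0 for threadless idle. RIOT's real IS_USED() differs. */
#ifndef MODEL_USE_IDLE_THREAD
#define MODEL_USE_IDLE_THREAD 0
#endif

/* Branch hint similar in spirit to unlikely(); GCC/Clang builtin. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

typedef struct {
    int priority;
} model_thread_t;

/* Returns 1 if the "no active thread" corner case applies, else 0.
 * With MODEL_USE_IDLE_THREAD == 1 the condition is a compile-time 0, so
 * the compiler can remove the branch; with 0 it is a single, rarely taken
 * NULL check. */
static int model_no_active_thread(const model_thread_t *active)
{
    if (!MODEL_USE_IDLE_THREAD && model_unlikely(active == NULL)) {
        return 1;
    }
    return 0;
}
```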

Testing procedure

See #20812

Issues/PRs references

Fixes #20812


riot-ci · 2024-09-29T16:21:29Z

Murdock results

✔️ PASSED

03081de fixup! core/sched_switch: fix crash with no active thread

Success	Failures	Total	Runtime
10197	0	10197	16m:05s


kaspar030 · 2024-09-30T09:37:43Z

core/sched.c

+    /* If a thread exits and no other thread is runnable, we may end up in
+     * a situation with no active thread. This cannot occur if there is an
+     * idle thread, which is always runnable, though */
+    if (!IS_USED(MODULE_CORE_IDLE_THREAD) && unlikely(active_thread == NULL)) {


This change is correct, but I'm not sure this is holding it right. thread_yield_higher() can always be used and is actually the preferred choice. sched_switch() was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).

If this is actually called from ISRs, that is the only situation in which there can be no active_thread. It might make more sense to encode this properly: if there is no active thread, we are implicitly in an ISR, so the irq_is_in() check below becomes redundant, and in that case sched_context_switch_request = 1 might be the correct thing to do.
Alternatively: if irq_is_in(), always trigger the scheduler; otherwise, assume we are not in an ISR and that there is an active_thread.

(In theory, this function could speed up task switching quite a bit in some cases, because we could switch directly to the receiving thread and skip saving the calling thread's registers. We never implemented that, though.)
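The second alternative from the review ("if irq_is_in(), always trigger the scheduler") can be sketched as follows. This is a simplified C model with hypothetical `model_*` names, not RIOT code: in ISR context it always requests a context switch, which implicitly covers the no-active-thread case, and only in thread context does it compare priorities, where the caller is guaranteed to be the active thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; lower priority value = more important. */
typedef struct {
    uint8_t priority;
    bool on_runqueue;
} model_thread_t;

static model_thread_t *model_active_thread; /* may be NULL only in an ISR */
static bool model_in_isr;                   /* stand-in for irq_is_in() */
static int model_switch_request;            /* stand-in for sched_context_switch_request */
static int model_yields;                    /* counts calls to the yield stub */

/* Stand-in for thread_yield_higher(); only counts invocations. */
static void model_thread_yield_higher(void)
{
    model_yields++;
}

static void model_sched_switch_alt(uint8_t other_prio)
{
    if (model_in_isr) {
        /* Always defer to the full scheduler on ISR exit. This also covers
         * threadless idle, where there is no active thread at all. */
        model_switch_request = 1;
        return;
    }

    /* Thread context: the caller is the active thread, so it exists. */
    assert(model_active_thread != NULL);

    if (!model_active_thread->on_runqueue
        || model_active_thread->priority > other_prio) {
        model_thread_yield_higher();
    }
}
```

The trade-off in this sketch is that an ISR waking a less important thread always causes a scheduler run on ISR exit, even when no switch turns out to be needed.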

maribu · 2024-10-05

> sched_switch() was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).

This makes it sound like an ugly hack / premature optimization. In normal workloads, contention over a mutex is pretty unlikely, and the hot path is the case where mutex_unlock() finds no waiters. As for real-time behavior: adding a call to sched_switch() before the call to thread_yield_higher() increases the worst-case latency in the hope of occasionally omitting the call to thread_yield_higher().

I wonder if simply getting rid of sched_switch() is the better approach here. See #20890 for removing it from mutex_unlock(). There are a few other calls to sched_switch() that I think might happen from IRQ context; those would also need to be addressed to actually fix the issue.
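A simplified C model of the mutex argument, with hypothetical names: the wait queue is reduced to a counter, and `model_thread_yield_higher()` only counts scheduler invocations. The model assumes that unlocking a contended mutex hands it directly to the next waiter, so it stays locked. The uncontended hot path never touches the scheduler, while the contended path calls the yield stub unconditionally instead of first pre-checking priorities the way sched_switch() does.

```c
#include <stdbool.h>

/* Hypothetical mutex model: the list of waiting threads is reduced to a
 * counter. Not RIOT's mutex_t. */
typedef struct {
    bool locked;
    unsigned waiters;
} model_mutex_t;

static unsigned model_scheduler_runs; /* counts calls to the yield stub */

/* Stand-in for thread_yield_higher(); only counts invocations. */
static void model_thread_yield_higher(void)
{
    model_scheduler_runs++;
}

static void model_mutex_unlock(model_mutex_t *m)
{
    if (m->waiters == 0) {
        /* Common, uncontended case: no scheduler involvement at all. */
        m->locked = false;
        return;
    }

    /* Contended case (assumed hand-over semantics): the next waiter now owns
     * the mutex, so it stays locked. Let the scheduler decide unconditionally
     * instead of pre-checking priorities as sched_switch() does. */
    m->waiters--;
    model_thread_yield_higher();
}
```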

benpicco · 2024-10-08T12:34:51Z

I suppose we can close this now?

maribu · 2024-10-08T13:36:56Z

If I recall correctly, the same issue could be triggered by setting thread flags from IRQ context, which also calls sched_switch().

So, not yet. Maybe calling thread_yield_higher() directly is also the better approach there, but I haven't really looked.

maribu added the Type: bug, Area: core, and CI: ready for build labels Sep 29, 2024

maribu requested a review from kaspar030 as a code owner September 29, 2024 16:04

kaspar030 reviewed Sep 30, 2024


maribu mentioned this pull request Oct 5, 2024

core/mutex: use thread_yield_higher() in mutex_unlock() #20890

Merged

fixup! core/sched_switch: fix crash with no active thread

03081de

maribu mentioned this pull request Jan 9, 2025

core/sync: add wait queues #21123

Closed
