Skip to content

core/sched_switch: fix crash with no active thread #20878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

maribu
Copy link
Member

@maribu maribu commented Sep 29, 2024

Contribution description

The function sched_switch() was implemented with the assumption that there will always be an active thread. This was true until threadless idle was implemented for Cortex M MCUs: If no thread is runnable and the running thread exists, there will be no active thread. If from ISR a thread is then unblocked, sched_switch() will be called without an active thread.

This handles the corner case explicitly when the module core_idle_thread is not used (in other words: for threadless idle).

Implementation choice

I first wanted to avoid adding the check for this very unlikely corner case to this hot path, and rather have in cpu_switch_context_exit() of the only offender to make sure to set the active thread to something reasonable. However, switching to a non-runnable thread without running it is something that I have no elegant, simple and maintainable solution for. So I decided to rather add a simple and maintainable check into a hot path.

Testing procedure

See #20812

Issues/PRs references

Fixes #20812

The function `sched_switch()` was implemented with the assumption that
there will always be an active thread. This was true until threadless
idle was implemented for Cortex M MCUs: If no thread is runnable and
the running thread exists, there will be no active thread. If from
ISR a thread is then unblocked, `sched_switch()` will be called without
an active thread.

This handles the corner case explicitly when the module
`core_idle_thread` is not used (in other words: for threadless idle).

Fixes RIOT-OS#20812
@maribu maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Sep 29, 2024
@maribu maribu requested a review from kaspar030 as a code owner September 29, 2024 16:04
@riot-ci
Copy link

riot-ci commented Sep 29, 2024

Murdock results

✔️ PASSED

03081de fixup! core/sched_switch: fix crash with no active thread

Success Failures Total Runtime
10197 0 10197 16m:05s

Artifacts

/* If a thread exists and no other thread is runnable, we may end up in
* a situation with no active thread. This can not occur if there is an
* idle thread, which is always runnable, though */
if (!IS_USED(MODULE_CORE_IDLE_THREAD) && unlikely(active_thread == NULL)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is correct, bit I'm not sure this is holding it right. thread_yield_higher() can always be used and is actually the preferred choice. sched_switch() was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).

If this is actually called from ISRs, that is the only way there is no active_thread. It might make more sense to encode this properly. As in, if there is no active thread, the irq_is_in() below becomes redundant, that is implicit. And in that case, sched_context_switch_request = 1 might be the correct thing to do.
Or, if irq_is_in(), always trigger scheduler, otherwise, assume not in ISR and that there is an active_thread.

(In theory, this function can be used to speed up task switching by quite a bit for some cases, b/c we could directly switch to the receiving thread, skipping register storing for the calling thread. we never implemented that, though.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sched_switch() was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).

This makes it sound like an ugly hack / premature optimization. In normal workloads contention over a mutex is pretty unlikely and the hot path is the case a mutex_unlock() will not find any waiters. And for real time behavior: Adding a sched_switch() prior to the call to thread_yield_higher() increases the worst case latency at the hope to occasionally omit the call to thread_yield_higher().

I wonder if just getting rid of sched_switch() is the better approach here. See #20890 for removing it for mutex_unlock(). There are a few other calls to sched_switch() that I think might happen from IRQ context to also address to actually fix the issue.

@benpicco
Copy link
Contributor

benpicco commented Oct 8, 2024

I suppose we can close this now?

@maribu
Copy link
Member Author

maribu commented Oct 8, 2024

If I recall correctly, the same issue could be triggered when calling sched_switch() by setting thread flags from IRQ context.

So, not yet. Maybe calling thread_yield_higher() directly is also the better approach there, but I haven't really looked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(cortex-m) unexpected kernel panic after thread exit
4 participants