Skip to content

Conversation

maribu
Copy link
Member

@maribu maribu commented Oct 23, 2023

Contribution description

While debugging #18977 (comment) it became obvious that the periph_timer in native is broken and issues early IRQs. This replaces the use of setitimer that cannot use a monotonic clock source with timer_settime().

Testing procedure

I have some non-publishable code that tests if the time an ISR fires in terms of timer_read() is no earlier than the time expected. This occasionally triggered with master, but I didn't see any of these issues anymore with this PR. I guess I should revive my PR to spice up the periph timer tests and add a polished version of this and let this run for an hour or two.

The tests ins tests/periph/timer* should still succeed on native. (They do for me in a container running riot/riotbuild).

Issues/PRs references

Found while debugging #18977 (comment)

@maribu maribu requested a review from benpicco October 23, 2023 11:19
@github-actions github-actions bot added Platform: native Platform: This PR/issue effects the native platform Area: cpu Area: CPU/MCU ports labels Oct 23, 2023
@Wer-Wolf
Copy link
Contributor

Could this be because we use ITIMER_REAL as clock source, which is affected by adjtime(2) and settimeofday(2)?
Maybe this problem could be solved by using timer_create() with CLOCK_MONOTONIC.

@MrKevinWeiss
Copy link
Contributor

Should this be backported? Probably not a bad idea.

@maribu maribu force-pushed the native_timer_bug_work_around branch from fd90d58 to 96eef23 Compare October 26, 2023 09:08
@maribu maribu changed the title cpu/native: work around bug in periph_timer cpu/native: fix bug in periph_timer Oct 26, 2023
@maribu maribu force-pushed the native_timer_bug_work_around branch 2 times, most recently from eabef72 to fd1fd2c Compare October 26, 2023 09:11
@maribu
Copy link
Member Author

maribu commented Oct 26, 2023

Could this be because we use ITIMER_REAL as clock source, which is affected by adjtime(2) and settimeofday(2)? Maybe this problem could be solved by using timer_create() with CLOCK_MONOTONIC.

Yes, indeed! This was actually also my first intuition. But rather than reading the code I did a quick grep for CLOCK_[A-Z]*, which only showed matches for CLOCK_MONOTONIC and none CLOCK_REALTIME, so I assumed this wasn't the issue. Sometime actually reading the code containing the bug does pay off 😅

@benpicco
Copy link
Contributor

Hm I still get

>>> timeout happened 9739 µs early <<<
>>> timeout happened 9623 µs early <<<
>>> timeout happened 2156 µs early <<<
>>> timeout happened 3265 µs early <<<
>>> timeout happened 9689 µs early <<<
>>> timeout happened 9795 µs early <<<
>>> timeout happened 2262 µs early <<<
>>> timeout happened 9846 µs early <<<
>>> timeout happened 9796 µs early <<<
>>> timeout happened 9647 µs early <<<
>>> timeout happened 4028 µs early <<<
>>> timeout happened 9670 µs early <<<
>>> timeout happened 4650 µs early <<<
>>> timeout happened 9687 µs early <<<
>>> timeout happened 9769 µs early <<<

with this (and the patches from #18977) applied

@Wer-Wolf
Copy link
Contributor

Maybe the fact that timer_start()/timer_stop() do not stop the value returned by timer_read() from incrementing could also be the reason for this.

@maribu
Copy link
Member Author

maribu commented Oct 26, 2023

@benpicco: That's expected. Most of the time that bug triggered it was not due to the periph_timer bug. But it should be a tad less frequent now, with one of the causes identified and fixed.

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 27, 2023
@benpicco
Copy link
Contributor

benpicco commented Oct 27, 2023

Urgh so are you suggesting there is (at least) a third bug (besides this one and #18977) at play here that makes timeouts unreliable 😩

Please squash btw

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trusting our timer unittests

@maribu
Copy link
Member Author

maribu commented Oct 27, 2023

Urgh so are you suggesting there is (at least) a third bug

Yes :) But on the other hand 2/3 of the issue is hopefully solved. (Well, the third bug is the one with the most fallout, though).

@maribu maribu force-pushed the native_timer_bug_work_around branch from 5b2d9b2 to f9f0c27 Compare October 27, 2023 12:35
@riot-ci
Copy link

riot-ci commented Oct 27, 2023

Murdock results

✔️ PASSED

50b841e cpu/native: drop unused real_setitimer

Success Failures Total Runtime
7953 0 7953 17m:12s

Artifacts

@MrKevinWeiss MrKevinWeiss added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Nov 1, 2023
@MrKevinWeiss
Copy link
Contributor

Do we think this can get in the next little while or should we make it a known issue for this release?

@maribu
Copy link
Member Author

maribu commented Nov 2, 2023

Murdock found quite the bug. I really should give more priority my rewrite of the periph_timer test app higher on my todo, given that the current one was unable to detect the completely broken timer_stop() implementation 😱

Also use `CLOCK_MONOTONIC` for the timeouts, not just for
`timer_read()`. This fixes mismatches between when a timeout
occurs and what is expected in the context of the values returned by
`timer_read()`.
@maribu maribu force-pushed the native_timer_bug_work_around branch from f9f0c27 to 50b841e Compare November 2, 2023 13:13
@maribu maribu added the CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train label Nov 2, 2023
@MrKevinWeiss
Copy link
Contributor

bors merge

Copy link
Contributor

bors bot commented Nov 3, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 4250c15 into RIOT-OS:master Nov 3, 2023
@maribu maribu deleted the native_timer_bug_work_around branch December 5, 2023 08:21
@chrysn
Copy link
Member

chrysn commented Aug 17, 2024

A brief note from a workshop, this broke building on Debian 11.

maribu added a commit to maribu/RIOT that referenced this pull request Aug 17, 2024
maribu added a commit to maribu/RIOT that referenced this pull request Aug 18, 2024
ant9000 pushed a commit to ant9000/RIOT that referenced this pull request Aug 23, 2024
dprigoshij pushed a commit to dprigoshij/RIOT that referenced this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train Platform: native Platform: This PR/issue effects the native platform Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants