-
Notifications
You must be signed in to change notification settings - Fork 2.1k
stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
- By default openocd sets DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP when examine-end is called. This means that `make reset`, `make debug`, `make flash` set these bits. When enabled HCLK and FCLK are not sopped in sleep mode, this changes the base behaviour of the MCU and increases power consumptio.
NOTE: 510ffce could be applied for all STM32 but I want to keep this PR focused on STM32L1 since its the only platform where this triggered hardfaults. I could be extended to other STM32 in subsequent PR's. |
@cladmi I don't know why the github api doesn't let me tag you as a reviewer but I would appreciate you taking a look, specially at the openocd stuff :) |
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
I'm on the fence about keeping 510ffce in this PR. Would it make more sense to open a PR that implements this for all stm32* boards? I'm trying to PR them upstream but I'm not sure how long it would take, I could be a good option to get this into RIOT and remove it if it gets upstream. @cladmi since you are deeply involved in everything regarding GDB, what do you think of this approach? Would you suggest another one? |
BTW in the wiki it says you have this board in Berlin, any chance you can give it a tests @kaspar030 @cladmi? |
I would love to look at it, especially completely understand the |
Just as a procedural note, PR titles should say "fix problem foo", not "fix issue #1234". References to issues can be added to the PR description and / or commit messages. |
I'm going to split this PR into two, I got some comment on my PR upstream regarding the openocd changes. |
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
Did the other PR resolve this already? |
Part of it, I was trying to resolve the rest upstream, but it staled. |
Contribution description
When #7385 was introduced when going to sleep (i.e. calling
__WFI()
),irq_enable
was changed toirq_restore
and that brokestm32l152re
.#8518 fixed the issue by introducing a
__NOP()
after__WFI()
. This fixed the issue until #11159 where the call tocortexm_sleep
was changed and now__NOP()
didn't fix the issue but instead somehow triggered it.After debugging in #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when
DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP
where enabled. By default openocd sets these bits after anexamine-end
event. This is done by default for all stm32 boards.https://github.com/ntfreak/openocd/blob/263deb3802a515ba8155b6c59146f0f539de4e43/tcl/target/stm32l1.cfg#L93-L100
510ffce fixes this behavior by only setting those bit when
gdb
is launched. No need to enable debugging in other cases. This fixed #11820 for normal usage. When debugging this issue was still present.Multiple changes/fixes where attempted in #11820 to fix the issue. Non of them sufficiently consistent or un-intrusive.
As for all I can tell, and as stated in #8518, the issue was the loss of the content of
r0
when jumping toirq_restore
which resulted in a corruptedpc
trying to execute instructions out of the flash region.Since the problem was the branch, with 5d96127
irq_restore
was replaced by__set_PRIMASK(state);
which inlines the function call avoiding the jump and the whole issue all together.The disassembly:
Original disassembly:
Testing procedure
In master this
failshardfaults:make -C tests/xtimer_drift BOARD=nucleo-l152re -j3 flash term
Start a debug session and
print *0xE0042004
make -C tests/xtimer_drift BOARD=nucleo-l152re debug
remove
-event gdb-attach
and-event gdb-dettach
fromboards/nucleo-l152re/dist/openocd.cfg
and see that now the address content is 0.python dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --jobs 4 . nucleo-l152re
, I got the following tests fails (expected fails)Issues/PRs references
Fixes #11820
See also #8518 & #8024