-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Description
Description
This issue wants to document a recurrent issue that has been seen on stm32l152re
platforms.
The history so far:
When #7385 was introduced when going to sleep (i.e. calling __WFI()
), irq_enable
was changed to irq_restore
and that broke stm32l152re
.
#8518 fixed the issue by introducing a __NOP()
after __WFI()
. This fixed the issue until #11159 where the call to cortexm_sleep
was changed and now __NOP()
didn't fix the issue but instead somehow triggered it.
#11159 re-introduced the issue by changing the way pm_set_lowest()
was called since pm_set()
was now implemented for STM32L1
. A single __NOP()
did not fix the issue anymore.
In #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP
where enabled. By default openocd sets these bits after an examine-end
event. This is done by default for all stm32 boards.
In #11919 Since the problem was the branch, with 5d96127 irq_restore
was replaced by __set_PRIMASK(state);
which inlines the function call avoiding the jump and the whole issue all together.
#13999 inline the implementation of irq_restore
so the fix in #11919 will be removed.
The faulty scenario
So from debugging output at some point after wake-up the pc
gets corrupted and un-reachable instructions gets executed. This only happens when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP
are set. An example for the debugging output is here:
2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO # set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO # frame 0
2019-07-09 17:38:58,317 - INFO # bt
2019-07-09 17:38:58,318 - INFO #
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.
Hints to cause
Looking around in stm32
and cortex-m3
erratas and datasheet. I found a mention of a similar issue with stm32f4
in this ERRATA section 2.1.3. In this errata there are some hints as issue that happen the WFE/WFI are placed at 4 byte alignment and problems with the pref-etch buffer. Although this is a differtent cpu (cortex-m4)
, it made me snoop around the pref-etch buffer and made me think a similar issue might be happening on cortex-m3
In the case of cortex-m3
the pref-etch buffer can fetch two 32bits instructions or 4 16bits instructions but only in sequential code execution. In our code the PRFTEN
and ACC64
are enabled so we are reading 64 bits at a time. It stm32l1xxx reference manual it is stated:
When the code is not sequential (branch), the instruction may not be present neither in the
current instruction line used nor in the prefetched instruction line. In this case, the penalty in
terms of number of cycles is at least equal to the number of Wait States.
This lead me to believe that for some reason when the branch instruction is present it is executing a corrupted pre-fetch buffer instruction, or in other terms an instruction that isn't present. For some reason this only happens when the HCLK and FCLK stays enabled in sleep mode. This might have something to do with different wake-up times since the clock is always enabled for the core?? I wasn't able to find many details of what happens on wake-up, and what could be different when HCLK stays enabled.
Steps to reproduce the issue
This issue does not show up currently in master unless a single __NOP
is added after https://github.com/fjmolinas/RIOT/blob/e7a1b40cde17dc5f407c9b3884a2603ab656ac7e/cpu/cortexm_common/include/cpu.h#L172. This may change depending on current master, since for a while a single __NOP()
fixed the issue.
Expected results
No crash ever..
Actual results
Has crashed in the past.
Possible FIXES
If the issue shows up again 3 __NOP
could fix the issue.