stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

fjmolinas · 2019-07-12T10:12:24Z

Contribution description

When #7385 was introduced when going to sleep (i.e. calling __WFI()), irq_enable was changed to irq_restore and that broke stm32l152re.

#8518 fixed the issue by introducing a __NOP() after __WFI(). This fixed the issue until #11159 where the call to cortexm_sleep was changed and now __NOP() didn't fix the issue but instead somehow triggered it.

After debugging in #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP where enabled. By default openocd sets these bits after an examine-end event. This is done by default for all stm32 boards.

https://github.com/ntfreak/openocd/blob/263deb3802a515ba8155b6c59146f0f539de4e43/tcl/target/stm32l1.cfg#L93-L100

510ffce fixes this behavior by only setting those bit when gdb is launched. No need to enable debugging in other cases. This fixed #11820 for normal usage. When debugging this issue was still present.

Multiple changes/fixes where attempted in #11820 to fix the issue. Non of them sufficiently consistent or un-intrusive.

As for all I can tell, and as stated in #8518, the issue was the loss of the content of r0 when jumping to irq_restore which resulted in a corrupted pc trying to execute instructions out of the flash region.

2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO #   set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO #   frame 0
2019-07-09 17:38:58,317 - INFO #   bt
2019-07-09 17:38:58,318 - INFO # 
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.

Since the problem was the branch, with 5d96127 irq_restore was replaced by __set_PRIMASK(state); which inlines the function call avoiding the jump and the whole issue all together.

The disassembly:

 80006e6:       f023 0304       bic.w   r3, r3, #4
 80006ea:       6113            str     r3, [r2, #16]
 80006ec:       f7ff ffac       bl      8000648 <irq_disable>
 80006f0:       f3bf 8f4f       dsb     sy
 80006f4:       bf30            wfi
 80006f6:       f380 8810       msr     PRIMASK, r0
 80006fa:       b33c            cbz     r4, 800074c <pm_set+0x74>
 80006fc:       e8bd 4010       ldmia.w sp!, {r4, lr}
 8000700:       f000 bc82       b.w     8001008 <stmclk_init_sysclk>
 8000704:       4b13            ldr     r3, [pc, #76]   ; (8000754 <pm_set+0x7c>)

Original disassembly:

 80006e4:       6913            ldr     r3, [r2, #16]
 80006e6:       f023 0304       bic.w   r3, r3, #4
 80006ea:       6113            str     r3, [r2, #16]
 80006ec:       f7ff ffac       bl      8000648 <irq_disable>
 80006f0:       f3bf 8f4f       dsb     sy
 80006f4:       bf30            wfi
 80006f6:       f7ff ffab       bl      8000650 <irq_enable>
 80006fa:       b33c            cbz     r4, 800074c <pm_set+0x74>
 80006fc:       e8bd 4010       ldmia.w sp!, {r4, lr}
 8000700:       f000 bc82       b.w     8001008 <stmclk_init_sysclk>
 8000704:       4b13            ldr     r3, [pc, #76]   ; (8000754 <pm_set+0x7c>)

Testing procedure

Check out both commits separately and see that they both fix the issue.

In master this ~~fails~~hardfaults:

make -C tests/xtimer_drift BOARD=nucleo-l152re -j3 flash term

Verify settings in debug:

Start a debug session and print *0xE0042004

make -C tests/xtimer_drift BOARD=nucleo-l152re debug

pm_set (mode=mode@entry=2) at /home/francisco/workspace/RIOT/cpu/stm32_common/periph/pm.c:94
94	    switch (mode) {
(gdb) print *0xE0042004
$1 = 7
(gdb)

remove -event gdb-attach and -event gdb-dettach from boards/nucleo-l152re/dist/openocd.cfg and see that now the address content is 0.

to be safe run python dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --jobs 4 . nucleo-l152re, I got the following tests fails (expected fails)

ERROR:nucleo-l152re:Tests failed: 7
Failures during test:
- [tests/driver_grove_ledbar](tests/driver_grove_ledbar/test.failed)
- [tests/driver_my9221](tests/driver_my9221/test.failed)
- [tests/gnrc_ipv6_ext](tests/gnrc_ipv6_ext/test.failed)
- [tests/gnrc_rpl_srh](tests/gnrc_rpl_srh/test.failed)
- [tests/gnrc_sock_dns](tests/gnrc_sock_dns/test.failed)
- [tests/pkg_fatfs_vfs](tests/pkg_fatfs_vfs/test.failed)
- [tests/pkg_qdsa](tests/pkg_qdsa/test.failed)

Issues/PRs references

Fixes #11820
See also #8518 & #8024

- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.

- By default openocd sets DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP when examine-end is called. This means that `make reset`, `make debug`, `make flash` set these bits. When enabled HCLK and FCLK are not sopped in sleep mode, this changes the base behaviour of the MCU and increases power consumptio.

fjmolinas · 2019-07-12T11:12:32Z

BTW 48a7930 works when applied to #8518

fjmolinas · 2019-07-12T11:14:43Z

NOTE: 510ffce could be applied for all STM32 but I want to keep this PR focused on STM32L1 since its the only platform where this triggered hardfaults. I could be extended to other STM32 in subsequent PR's.

fjmolinas · 2019-07-12T11:27:45Z

@cladmi I don't know why the github api doesn't let me tag you as a reviewer but I would appreciate you taking a look, specially at the openocd stuff :)

- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.

fjmolinas · 2019-07-17T07:19:26Z

I'm on the fence about keeping 510ffce in this PR. Would it make more sense to open a PR that implements this for all stm32* boards? I'm trying to PR them upstream but I'm not sure how long it would take, I could be a good option to get this into RIOT and remove it if it gets upstream. @cladmi since you are deeply involved in everything regarding GDB, what do you think of this approach? Would you suggest another one?

fjmolinas · 2019-07-17T07:20:49Z

BTW in the wiki it says you have this board in Berlin, any chance you can give it a tests @kaspar030 @cladmi?

cladmi · 2019-07-17T09:32:17Z

I would love to look at it, especially completely understand the openocd changes, but would be more after the release.

kaspar030 · 2019-07-17T09:51:35Z

Just as a procedural note, PR titles should say "fix problem foo", not "fix issue #1234". References to issues can be added to the PR description and / or commit messages.

fjmolinas · 2019-07-25T16:31:51Z

I'm going to split this PR into two, I got some comment on my PR upstream regarding the openocd changes.

- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.

benpicco · 2020-02-06T11:18:03Z

Did the other PR resolve this already?

fjmolinas · 2020-02-06T11:41:53Z

Did the other PR resolve this already?

Part of it, I was trying to resolve the rest upstream, but it staled.

fjmolinas added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: pm Area: (Low) power management labels Jul 12, 2019

fjmolinas requested review from cladmi, kaspar030 and kYc0o July 12, 2019 10:12

fjmolinas added 2 commits July 12, 2019 12:13

fjmolinas force-pushed the pr_fix_11820 branch from b339518 to 243f9e4 Compare July 12, 2019 10:14

fjmolinas mentioned this pull request Jul 12, 2019

stm32l1/vendor: update vendor files to v2.3.0 #11489

Merged

fjmolinas requested review from aabadie and removed request for cladmi July 12, 2019 11:17

fjmolinas mentioned this pull request Jul 12, 2019

cpu/stm32l1: optimize power consumption #11832

Merged

fjmolinas changed the title ~~stm32l152re: fix issue 11820~~ stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() Jul 17, 2019

fjmolinas mentioned this pull request Jul 25, 2019

cpu/cortexm_common: replace irq_restore by __set_PRIMASK for stm32l152re #11919

Merged

fjmolinas mentioned this pull request Apr 29, 2020

cpu/stm32f1: CPU hangs after wake-up from STOP power mode #13918

Closed

MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

fjmolinas closed this Jun 9, 2022

fjmolinas deleted the pr_fix_11820 branch June 9, 2022 08:48

yn386 mentioned this pull request Sep 17, 2022

ports/stm32/boards: Add NUCLEO-L152RE board support. micropython/micropython#9144

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

Uh oh!

fjmolinas commented Jul 12, 2019 •

edited

Loading

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 17, 2019

Uh oh!

fjmolinas commented Jul 17, 2019

Uh oh!

cladmi commented Jul 17, 2019

Uh oh!

kaspar030 commented Jul 17, 2019

Uh oh!

fjmolinas commented Jul 25, 2019

Uh oh!

benpicco commented Feb 6, 2020

Uh oh!

fjmolinas commented Feb 6, 2020

Uh oh!

Uh oh!

stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() #11830

Uh oh!

Conversation

fjmolinas commented Jul 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution description

Testing procedure

Issues/PRs references

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 12, 2019

Uh oh!

fjmolinas commented Jul 17, 2019

Uh oh!

fjmolinas commented Jul 17, 2019

Uh oh!

cladmi commented Jul 17, 2019

Uh oh!

kaspar030 commented Jul 17, 2019

Uh oh!

fjmolinas commented Jul 25, 2019

Uh oh!

benpicco commented Feb 6, 2020

Uh oh!

fjmolinas commented Feb 6, 2020

Uh oh!

Uh oh!

fjmolinas commented Jul 12, 2019 •

edited

Loading