Skip to content

Conversation

fjmolinas
Copy link
Contributor

@fjmolinas fjmolinas commented Jul 12, 2019

Contribution description

When #7385 was introduced when going to sleep (i.e. calling __WFI()), irq_enable was changed to irq_restore and that broke stm32l152re.

#8518 fixed the issue by introducing a __NOP() after __WFI(). This fixed the issue until #11159 where the call to cortexm_sleep was changed and now __NOP() didn't fix the issue but instead somehow triggered it.

After debugging in #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP where enabled. By default openocd sets these bits after an examine-end event. This is done by default for all stm32 boards.

https://github.com/ntfreak/openocd/blob/263deb3802a515ba8155b6c59146f0f539de4e43/tcl/target/stm32l1.cfg#L93-L100

510ffce fixes this behavior by only setting those bit when gdb is launched. No need to enable debugging in other cases. This fixed #11820 for normal usage. When debugging this issue was still present.

Multiple changes/fixes where attempted in #11820 to fix the issue. Non of them sufficiently consistent or un-intrusive.

As for all I can tell, and as stated in #8518, the issue was the loss of the content of r0 when jumping to irq_restore which resulted in a corrupted pc trying to execute instructions out of the flash region.

2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO #   set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO #   frame 0
2019-07-09 17:38:58,317 - INFO #   bt
2019-07-09 17:38:58,318 - INFO # 
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.

Since the problem was the branch, with 5d96127 irq_restore was replaced by __set_PRIMASK(state); which inlines the function call avoiding the jump and the whole issue all together.

The disassembly:

 80006e6:       f023 0304       bic.w   r3, r3, #4
 80006ea:       6113            str     r3, [r2, #16]
 80006ec:       f7ff ffac       bl      8000648 <irq_disable>
 80006f0:       f3bf 8f4f       dsb     sy
 80006f4:       bf30            wfi
 80006f6:       f380 8810       msr     PRIMASK, r0
 80006fa:       b33c            cbz     r4, 800074c <pm_set+0x74>
 80006fc:       e8bd 4010       ldmia.w sp!, {r4, lr}
 8000700:       f000 bc82       b.w     8001008 <stmclk_init_sysclk>
 8000704:       4b13            ldr     r3, [pc, #76]   ; (8000754 <pm_set+0x7c>)

Original disassembly:

 80006e4:       6913            ldr     r3, [r2, #16]
 80006e6:       f023 0304       bic.w   r3, r3, #4
 80006ea:       6113            str     r3, [r2, #16]
 80006ec:       f7ff ffac       bl      8000648 <irq_disable>
 80006f0:       f3bf 8f4f       dsb     sy
 80006f4:       bf30            wfi
 80006f6:       f7ff ffab       bl      8000650 <irq_enable>
 80006fa:       b33c            cbz     r4, 800074c <pm_set+0x74>
 80006fc:       e8bd 4010       ldmia.w sp!, {r4, lr}
 8000700:       f000 bc82       b.w     8001008 <stmclk_init_sysclk>
 8000704:       4b13            ldr     r3, [pc, #76]   ; (8000754 <pm_set+0x7c>)

Testing procedure

  • Check out both commits separately and see that they both fix the issue.

In master this failshardfaults:

make -C tests/xtimer_drift BOARD=nucleo-l152re -j3 flash term

  • Verify settings in debug:

Start a debug session and print *0xE0042004

make -C tests/xtimer_drift BOARD=nucleo-l152re debug

pm_set (mode=mode@entry=2) at /home/francisco/workspace/RIOT/cpu/stm32_common/periph/pm.c:94
94	    switch (mode) {
(gdb) print *0xE0042004
$1 = 7
(gdb) 

remove -event gdb-attach and -event gdb-dettach from boards/nucleo-l152re/dist/openocd.cfg and see that now the address content is 0.

  • to be safe run python dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --jobs 4 . nucleo-l152re, I got the following tests fails (expected fails)
ERROR:nucleo-l152re:Tests failed: 7
Failures during test:
- [tests/driver_grove_ledbar](tests/driver_grove_ledbar/test.failed)
- [tests/driver_my9221](tests/driver_my9221/test.failed)
- [tests/gnrc_ipv6_ext](tests/gnrc_ipv6_ext/test.failed)
- [tests/gnrc_rpl_srh](tests/gnrc_rpl_srh/test.failed)
- [tests/gnrc_sock_dns](tests/gnrc_sock_dns/test.failed)
- [tests/pkg_fatfs_vfs](tests/pkg_fatfs_vfs/test.failed)
- [tests/pkg_qdsa](tests/pkg_qdsa/test.failed)

Issues/PRs references

Fixes #11820
See also #8518 & #8024

@fjmolinas fjmolinas added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: pm Area: (Low) power management labels Jul 12, 2019
@fjmolinas fjmolinas requested review from cladmi, kaspar030 and kYc0o July 12, 2019 10:12
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
- By default openocd sets DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
  when examine-end is called. This means that `make reset`, `make debug`,
  `make flash` set these bits. When enabled HCLK and FCLK are not sopped
  in sleep mode, this changes the base behaviour of the MCU and
  increases power consumptio.
@fjmolinas
Copy link
Contributor Author

BTW 48a7930 works when applied to #8518

@fjmolinas
Copy link
Contributor Author

NOTE: 510ffce could be applied for all STM32 but I want to keep this PR focused on STM32L1 since its the only platform where this triggered hardfaults. I could be extended to other STM32 in subsequent PR's.

@fjmolinas fjmolinas requested review from aabadie and removed request for cladmi July 12, 2019 11:17
@fjmolinas
Copy link
Contributor Author

@cladmi I don't know why the github api doesn't let me tag you as a reviewer but I would appreciate you taking a look, specially at the openocd stuff :)

fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Jul 12, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
@fjmolinas
Copy link
Contributor Author

I'm on the fence about keeping 510ffce in this PR. Would it make more sense to open a PR that implements this for all stm32* boards? I'm trying to PR them upstream but I'm not sure how long it would take, I could be a good option to get this into RIOT and remove it if it gets upstream. @cladmi since you are deeply involved in everything regarding GDB, what do you think of this approach? Would you suggest another one?

@fjmolinas
Copy link
Contributor Author

BTW in the wiki it says you have this board in Berlin, any chance you can give it a tests @kaspar030 @cladmi?

@cladmi
Copy link
Contributor

cladmi commented Jul 17, 2019

I would love to look at it, especially completely understand the openocd changes, but would be more after the release.

@kaspar030
Copy link
Contributor

Just as a procedural note, PR titles should say "fix problem foo", not "fix issue #1234". References to issues can be added to the PR description and / or commit messages.

@fjmolinas fjmolinas changed the title stm32l152re: fix issue 11820 stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() Jul 17, 2019
@fjmolinas
Copy link
Contributor Author

I'm going to split this PR into two, I got some comment on my PR upstream regarding the openocd changes.

fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Aug 5, 2019
- The __NOP() that was added in RIOT-OS#8518 is now remooved.
- When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault
  occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When
  enabled, a hardfault occured when returning from a branch to irq_restore()
  we avoid the call by inlining the function call. See RIOT-OS#11830 for more
  details.
@benpicco
Copy link
Contributor

benpicco commented Feb 6, 2020

Did the other PR resolve this already?

@fjmolinas
Copy link
Contributor Author

Did the other PR resolve this already?

Part of it, I was trying to resolve the rest upstream, but it staled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: pm Area: (Low) power management Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stm32l152re: hard-fault unless power-cycled after flash, or depending on optimization
5 participants