cortexm_common: check for possible stack overflow in hardfault handler #4015

daniel-k · 2015-10-01T16:03:52Z

When the hardfault handler executes, printf may cause its stack to overflow into the preceding section in memory (bss). The existing check only makes sure that the stack pointer does not leave valid RAM, not that it stays within the handler's own stack.

This PR will output a warning if the hardfault handler may have corrupted memory and display the corrupt range. If you happen to see this on your board it should be an indication that the handler stack is too small and raise the awareness that debugging the current state may be pointless.

jnohlgard · 2015-10-01T19:12:11Z

cpu/cortexm_common/vectors_cortexm.c

+    uint32_t* sp;
+    asm volatile ("mov %[sp], sp" : [sp] "=r" (sp) : : );
+    /* If printf may overflow the handler stack */
+    if( (sp - &_sstack) < required) {


if you change this to

if (((uint32_t)sp) < (((uint32_t)&_sstack) + required)) { }

you will also catch if sp has already passed _sstack
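
To illustrate why the address comparison is more robust, here is a minimal host-side sketch. It assumes `required` is an unsigned 32-bit value and that `_sstack` is declared as `uint32_t`, so that the pointer difference counts 4-byte words. The function names and the plain integer addresses are made up for the illustration and are not part of the PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `(sp - &_sstack) < required` as a 32-bit target would evaluate it:
 * the pointer difference is a signed count of 4-byte words, and it is
 * converted to unsigned for the comparison because `required` is unsigned
 * (assumption, see above). */
static bool low_by_difference(uint32_t sp, uint32_t sstack, uint32_t required)
{
    int64_t bytes = (int64_t)sp - (int64_t)sstack;
    int32_t words = (int32_t)(bytes / 4);
    return (uint32_t)words < required;
}

/* Models the suggested check: compare raw byte addresses. */
static bool low_by_address(uint32_t sp, uint32_t sstack, uint32_t required)
{
    return sp < (sstack + required);
}
```

With a stack start of 0x20000000 and `required` set to 350, an sp that is 16 bytes below the stack start is flagged only by `low_by_address()`; in `low_by_difference()` the negative difference turns into a very large unsigned value and the check passes silently. Under the same assumptions, the difference variant also compares words against a byte count, so it reports too little space for an sp that is 1024 bytes above the stack start.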

daniel-k · 2015-10-02T08:53:05Z

@gebart
I just measured the stack usage of the hard fault handler. It's about 350 bytes for Cortex M0 (not sure if M3/M4 would differ).

Another possibility would be to check for the stack canary after the handler has finished, got that idea just now. What do you think?

jnohlgard · 2015-10-03T08:17:58Z

@daniel-k How much does this currently add to the binary? Did you run a make info-buildsizes-diff yet for the cortex platforms?

It is a useful debugging tool to be able to check if the stack has been corrupted, but I think the hardfault handler is already becoming quite large and there are very few cases where it would make any difference rather than just comparing the stack pointer with the limits of the stack. You can also use the ps command to check the stacks.

daniel-k · 2015-10-03T12:36:10Z

@gebart

> How much does this currently add to the binary? Did you run a make info-buildsizes-diff yet for the cortex platforms?

No, I haven't yet.

> [...] the hardfault handler is already becoming quite large [...]

That's true though.

> [...] there are very few cases where it would make any difference rather than just comparing the stack pointer with the limits of the stack. You can also use the ps command to check the stacks.

You mean "checking for a canary" by that? The problem at the moment, at least on samr21-xpro, is that an overflowing ISR stack corrupts scheduler/threading-related system variables, and (maybe because of that) calling ps() in GDB crashes the system. If the hardfault happened somewhere in an ISR (especially inside the SVC interrupt / sched_run()), the ISR stack is already used up quite a lot. Running printf then will corrupt the state; the same goes for ps.

daniel-k · 2015-10-09T15:17:23Z

@gebart
Updated this PR. If you don't check the stack canary on entry, there's no way to tell whether the ISR stack has overflowed previously. IMO that's important information, because this may be the cause of the hard fault in the first place.
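
A minimal sketch of such a canary check, for illustration only: a known marker is written to the lowest word of the stack at startup and compared again on handler entry. The marker value and both function names are hypothetical and are not taken from the PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary marker value, chosen only for this sketch. */
#define CANARY_WORD 0xC0FFEE42u

/* Write the marker into the lowest word of the stack, e.g. at startup. */
static void stack_canary_init(uint32_t *stack_start)
{
    *stack_start = CANARY_WORD;
}

/* True if the lowest word no longer holds the marker, i.e. something has
 * written to the bottom of the stack since stack_canary_init() ran. */
static bool stack_canary_corrupted(const uint32_t *stack_start)
{
    return *stack_start != CANARY_WORD;
}
```

Unlike a comparison against the current stack pointer, the marker stays overwritten after the stack has shrunk again, so an earlier overflow remains visible on handler entry. The check can miss an overflow that happens to skip over the marker word or to write the same value.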

Diff against master introduces an overhead of 136 bytes in .text.

jnohlgard · 2015-10-12T20:09:04Z

ACK from me, but I would like another opinion on the increased complexity of the hardfault handler. @OlegHahm, @kaspar030, @haukepetersen maybe?

Note: This PR only affects the -DDEVELHELP hardfault handler

haukepetersen · 2015-10-21T14:37:53Z

I am ok with this change as it is not affecting anything with disabled DEVELHELP flag...

jnohlgard · 2015-10-21T15:27:15Z

@daniel-k what happened to the contents of daniel-k@c31929d ?

daniel-k · 2015-10-27T12:18:47Z

> @daniel-k what happened to the contents of daniel-k@c31929d ?

I refactored this into int _stack_size_left(uint32_t required) that I use twice now. Looks cleaner now IMO.

jnohlgard · 2015-10-27T13:20:45Z

Agree it looks cleaner, though you only use it once as far as I can see in the files changed tab?

daniel-k · 2015-10-27T13:23:24Z

@gebart
You're right. c16e430 removed the second occurrence again 😄

jnohlgard · 2015-10-27T13:26:13Z

cpu/cortexm_common/vectors_cortexm.c

@@ -246,6 +261,10 @@ __attribute__((used)) void hard_fault_handler(uint32_t* sp, uint32_t corrupted,
        printf("EXC_RET: 0x%08" PRIx32 "\n", exc_return);
        puts("Attempting to reconstruct state for debugging...");
        printf("In GDB:\n  set $pc=0x%lx\n  frame 0\n  bt\n", pc);
+        int stack_left = _stack_size_left(HARDFAULT_HANDLER_REQUIRED_STACK_SPACE);
+        if(stack_left < 0) {
+            printf("\nISR stack overflowed by %d bytes max.\n", (-1 * stack_left));


jnohlgard · 2015-10-27

I'd argue that it has overflowed by at least %d bytes

daniel-k · 2015-10-27

That's correct. Now that I measured the stack usage, this should be changed to at least.

daniel-k · 2015-10-27

Wait, isn't it the other way around? Now that the stack usage relative to the entry pointer is known, and given that the stack pointer has probably advanced (at least the stack usage didn't decline), I can tell that max. n bytes overflowed.

jnohlgard · 2015-10-27

If the stack has overflowed in the past, we cannot know for sure how far it overflowed before shrinking to the current sp.

daniel-k · 2015-10-27

I see your point. I was implicitly assuming that a stack overflow would cause a hardfault and that it would happen at the point with the highest stack usage. But that obviously doesn't hold. In this case _stack_size_left() should be called at the entry of hard_fault_handler(), I guess.
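
For reference, a host-side sketch of what a helper along the lines of `_stack_size_left(uint32_t required)` might compute. This is not the PR's actual code: the real helper would presumably read sp itself and use the linker symbol `_sstack`, whereas this version takes both as plain byte addresses (hypothetical parameters) so that it can be run anywhere.

```c
#include <stdint.h>

/* Bytes that remain between `sp` and `stack_start` after reserving
 * `required` bytes for the handler. A negative result means the reserve
 * would extend below stack_start by that many bytes. */
static int stack_size_left(uint32_t sp, uint32_t stack_start, uint32_t required)
{
    int64_t left = (int64_t)sp - (int64_t)stack_start - (int64_t)required;
    return (int)left;
}
```

The result only reflects the stack pointer passed in. If the stack grew further at some earlier point and then shrank again, the value computed from the entry sp is a lower bound on how far the overflow went, which is the "at least" reading discussed above.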

jnohlgard · 2015-10-27T14:57:00Z

ACK, please squash

PeterKietzmann · 2015-10-27T18:19:29Z

I'm gonna hit the button for Joakim. And go

daniel-k assigned jnohlgard Oct 1, 2015

daniel-k added the Type: enhancement and Platform: ARM labels Oct 1, 2015

jnohlgard reviewed Oct 1, 2015

daniel-k force-pushed the pr/cortexm_hardfault_overflow branch from 9156668 to c16e430 October 9, 2015 15:10

daniel-k mentioned this pull request Oct 12, 2015

Simple duty cycling 802.15.4 MAC protocol #3730

Closed

jnohlgard added the CI: needs squashing and CI: ready for build labels Oct 12, 2015

jnohlgard reviewed Oct 27, 2015

cortexm_common: check for possible stack overflow in hardfault handler

c5e220c

daniel-k force-pushed the pr/cortexm_hardfault_overflow branch from 8c52a7a to c5e220c October 27, 2015 14:58

daniel-k removed the CI: needs squashing label Oct 27, 2015

PeterKietzmann added a commit that referenced this pull request Oct 27, 2015

Merge pull request #4015 from daniel-k/pr/cortexm_hardfault_overflow

f5b2c80

cortexm_common: check for possible stack overflow in hardfault handler

PeterKietzmann merged commit f5b2c80 into RIOT-OS:master Oct 27, 2015
