Skip to content

Conversation

daniel-k
Copy link
Member

@daniel-k daniel-k commented Oct 1, 2015

When the hardfault handler executes it may case an overflow into the preceeding section in memory (bss) caused by printf. The already existent measure only make sure not leave valid ram, not leaving the own stack.

This PR will output a warning if the hardfault handler may have corrupted memory and display the corrupt range. If you happen to see this on your board it should be an indication that the handler stack is too small and raise the awareness that debugging the current state may be pointless.

@daniel-k daniel-k added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Oct 1, 2015
uint32_t* sp;
asm volatile ("mov %[sp], sp" : [sp] "=r" (sp) : : );
/* If printf may overflow the handler stack */
if( (sp - &_sstack) < required) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you change this to

if (((uint32_t)sp) < (((uint32_t)&_sstack) + required)) {
}

you will also catch if sp has already passed _sstack

@daniel-k
Copy link
Member Author

daniel-k commented Oct 2, 2015

@gebart
I just measured the stack usage of the hard fault handler. It's about 350 bytes for Cortex M0 (not sure if M3/M4 would differ).

Another possibility would be to check for the stack canary after the handler has finished, got that idea just now. What do you think?

@jnohlgard
Copy link
Member

@daniel-k How much does this currently add to the binary? Did you run a make info-buildsizes-diff yet for the cortex platforms?

It is a useful debugging tool to be able to check if the stack has been corrupted, but I think the hardfault handler is already becoming quite large and there are very few cases where it would make any difference rather than just comparing the stack pointer with the limits of the stack. You can also use the ps command to check the stacks.

@daniel-k
Copy link
Member Author

daniel-k commented Oct 3, 2015

@gebart

How much does this currently add to the binary? Did you run a make info-buildsizes-diff yet for the cortex platforms?

No, I haven't yet.

[...] the hardfault handler is already becoming quite large [...]

That's true though.

[...] there are very few cases where it would make any difference rather than just comparing the stack pointer with the limits of the stack. You can also use the ps command to check the stacks.

You mean "checking for a canary" by that? The problem, at least on samr21-xpro, at the moment is, that an overflowing ISR stack corrupts scheduler/threading related system variables and (maybe therefore) call ps() in GDB crashes the system. If the hardfault happened somewhere in an ISR (especially inside SVC interrupt / sched_run()) the ISR stack is already used up quite a lot. Running printf then will corrupt the state, that also counts for ps.

@daniel-k daniel-k force-pushed the pr/cortexm_hardfault_overflow branch from 9156668 to c16e430 Compare October 9, 2015 15:10
@daniel-k
Copy link
Member Author

daniel-k commented Oct 9, 2015

@gebart
Updated this PR. If you don't check the stack canary on entry there's no possibility to tell if the ISR stack has overflowed previously. IMO that's an important information because this may be the cause for the hard fault in the first place.

Diff against master introduces an overhead of 136 bytes in .text.

@jnohlgard jnohlgard added CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 12, 2015
@jnohlgard
Copy link
Member

ACK from me, but I would like another opinion on the increased complexity of the hardfault handler. @OlegHahm, @kaspar030, @haukepetersen maybe?

Note: This PR only affects the -DDEVELHELP hardfault handler

@haukepetersen
Copy link
Contributor

I am ok with this change as it is not affecting anything with disabled DEVELHELP flag...

@jnohlgard
Copy link
Member

@daniel-k what happened to the contents of daniel-k@c31929d ?

@daniel-k
Copy link
Member Author

@daniel-k what happened to the contents of daniel-k@c31929d ?

I refactored this into int _stack_size_left(uint32_t required) that I use twice now. Looks cleaner now IMO.

@jnohlgard
Copy link
Member

Agree it looks cleaner, though you only use it once as far as I can see in the files changed tab?

@daniel-k
Copy link
Member Author

@gebart
You're right. c16e430 removed the second occurence again 😄

@@ -246,6 +261,10 @@ __attribute__((used)) void hard_fault_handler(uint32_t* sp, uint32_t corrupted,
printf("EXC_RET: 0x%08" PRIx32 "\n", exc_return);
puts("Attempting to reconstruct state for debugging...");
printf("In GDB:\n set $pc=0x%lx\n frame 0\n bt\n", pc);
int stack_left = _stack_size_left(HARDFAULT_HANDLER_REQUIRED_STACK_SPACE);
if(stack_left < 0) {
printf("\nISR stack overflowed by %d bytes max.\n", (-1 * stack_left));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that it has overflowed by at least %d bytes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Now that I measured the stack usage, this should be changed to at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, isn't it the other way around? Now that the stack usage refering to the entry pointer is known and given that the stack pointer has probably advanced (at least the stack usage didn't decline), I can tell that max. n bytes overflowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the stack has overflowed in the past we can not know for sure how far it has overflowed before shrinking to the current sp.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. I was implicitly assuming that a stack overflow would cause hardfault and that it would happen at the point with the highest stack usage. But that obviously doesn't hold. I this case _stack_size_left() should be called at the entry of hard_fault_handler() I guess.

@jnohlgard
Copy link
Member

ACK, please squash

@daniel-k daniel-k force-pushed the pr/cortexm_hardfault_overflow branch from 8c52a7a to c5e220c Compare October 27, 2015 14:58
@daniel-k daniel-k removed the CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable label Oct 27, 2015
@PeterKietzmann
Copy link
Member

I'm gonna hit the button for Joakim. And go

PeterKietzmann added a commit that referenced this pull request Oct 27, 2015
cortexm_common: check for possible stack overflow in hardfault handler
@PeterKietzmann PeterKietzmann merged commit f5b2c80 into RIOT-OS:master Oct 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants