Skip to content

Conversation

benpicco
Copy link
Contributor

@benpicco benpicco commented May 31, 2022

Contribution description

The _get_free_entry() accesses a static buffer, so we must protect it from concurrent access to avoid corruption.

Testing procedure

Run the examples/gnrc_border_router on e.g. a nrf52840dk with another node in the network, in my case an avr-rss2 which got the address 2001:db8::fec2:3d00:0:bb1.

Now start two pings in parallel on your host machine:

ping -A 2001:db8::fec2:3d00:0:bb1 -s 512
sudo ping -f 2001:db8::fec2:3d00:0:bb1

On master this will crash & burn:

gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
Stack pointer corrupted, reset to top of stack
FSR/FAR:
 CFSR: 0x00000092
 HFSR: 0x40000000
 DFSR: 0x00000000
 AFSR: 0x00000000
MMFAR: 0x20000018
Misc
EXC_RET: 0xfffffff1
----> ethos: hello received

With this patch we also run out of queue space, but we manage to recover when we stop the flood ping.

gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending
gnrc_netif: can't queue packet for sending

> ping -s 512 ff02::1
520 bytes from fe80::7837:fcff:fe7d:1aae%5: icmp_seq=0 ttl=64 time=107.856 ms
520 bytes from fe80::7837:fcff:fe7d:1aae%5: icmp_seq=1 ttl=64 time=114.955 ms
520 bytes from fe80::7837:fcff:fe7d:1aae%5: icmp_seq=2 ttl=64 time=107.774 ms

--- ff02::1 PING statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 107.774/110.195/114.955 ms

Issues/PRs references

partially fixes #17924, when doing a sudo ping -f 2001:db8::fec2:3d00:0:bb1 -s 512 the upstream interface remains 'dead' for several seconds after the flood ping was stopped.

@benpicco benpicco requested a review from jia200x May 31, 2022 21:48
@github-actions github-actions bot added Area: network Area: Networking Area: sys Area: System labels May 31, 2022
@benpicco benpicco added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label May 31, 2022
@benpicco benpicco force-pushed the gnrc_netif_pktq-fix branch from 22b4b53 to 51cb364 Compare May 31, 2022 22:33
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label May 31, 2022
Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex should be initialized, otherwise, the change looks good.

@benpicco benpicco force-pushed the gnrc_netif_pktq-fix branch from 51cb364 to 61cc168 Compare June 1, 2022 10:13
@benpicco
Copy link
Contributor Author

benpicco commented Jun 1, 2022

btw: What's the reason this pool is not shared with the packet queue for address resolution (CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)?

@miri64
Copy link
Member

miri64 commented Jun 1, 2022

I trust your testing and from the code this should also work fine.

@miri64 miri64 enabled auto-merge June 1, 2022 10:17
@miri64
Copy link
Member

miri64 commented Jun 1, 2022

btw: What's the reason this pool is not shared with the packet queue for address resolution (CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)?

Changing your mind? #17680 (comment) ;-)

@benpicco
Copy link
Contributor Author

benpicco commented Jun 1, 2022

I think it's different though. Unlike cache entries, those queue entries are short lived, when they are full we will lose packets either way.

And if we queue the packet because we have to do address resolution first or because the interface is busy doesn't seem like much of a difference conceptually. If we have many packets queued for address resolution we would also have a busy interface as a result, and if our interface is too busy it would also drop the address resolution packets.

@miri64 miri64 merged commit 18edd89 into RIOT-OS:master Jun 1, 2022
@benpicco benpicco deleted the gnrc_netif_pktq-fix branch June 1, 2022 21:19
@chrysn chrysn added this to the Release 2022.07 milestone Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gnrc_netif_pktq leaks memory
3 participants