zebra may lose an interface IP when kernel interfaces are deleted, recreated and assigned their IPs in a short time #13630

@ExplorerNo9

Description

Describe the bug

If we delete a kernel interface, recreate it and set its IP within a short time, in rare cases the interface IP is lost in zebra. This can be confirmed with the vtysh command show interface brief. It leads to abnormal behavior in other protocol daemons. For example, bgpd does not announce the route corresponding to the interface IP even though it is specified by a network command.

  • Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?

Versions

  • OS Version: Debian 11
  • Kernel: Linux 5.10
  • FRR Version: 8.2

To Reproduce

  1. Prepare the following script for the test:
run_test_intf_ip.sh

```bash
#!/bin/bash
# The problem happens only rarely, so a large number of interfaces is used
# to increase the chance of reproducing it.
num=180

# Remove test interfaces left over from a previous run
# (errors for interfaces that do not exist yet are silenced).
for ((i = 1; i <= num; i++)); do
    ip link del dev "test$i" 2>/dev/null
done

for ((i = 1; i <= num; i++)); do
    # Observed on dummy interfaces; other interface types have not been tested.
    ip link add dev "test$i" type dummy && ip link set dev "test$i" up
    ip addr add "133.0.$i.1/24" dev "test$i"
done
```
  2. Enable zebra kernel debug logging to stdout with debug zebra kernel and log stdout debugging.
  3. Run sudo ./run_test_intf_ip.sh
  4. Watch the log and wait until zebra has finished processing. Then use show interface brief to check whether zebra lost the IP of any test interface. If none was lost, repeat step 3.
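
The final check, comparing the kernel's addresses with zebra's, can be scripted instead of read by eye. This is a minimal sketch that is not part of the original report: it assumes the naming used by run_test_intf_ip.sh (interface testN carrying 133.0.N.1/24), that vtysh is on PATH, and that zebra's show interface output contains each address as plain text. The script name check_lost_ips.sh and its helper functions are hypothetical.

```bash
#!/bin/bash
# check_lost_ips.sh (hypothetical name): report test interfaces whose IPv4
# address is configured in the kernel but missing from zebra's view.
# Assumes interface test$i carries 133.0.$i.1/24, as in run_test_intf_ip.sh.

# Succeeds if the kernel has address $2 configured on interface $1.
kernel_has_addr() {
    ip -4 addr show dev "$1" 2>/dev/null | grep -qF "inet $2"
}

# Succeeds if zebra (queried through vtysh) lists address $2 on interface $1.
# Matching the address string avoids depending on the exact output layout.
zebra_has_addr() {
    vtysh -c "show interface $1" 2>/dev/null | grep -qF "$2"
}

# Checks test1..test$1; prints each lost address and a summary line,
# and returns non-zero if any address was lost.
check_lost_ips() {
    local num=$1 lost=0 i ifname addr
    for ((i = 1; i <= num; i++)); do
        ifname="test$i"
        addr="133.0.$i.1/24"
        # Only an address the kernel really has can be "lost" by zebra.
        if kernel_has_addr "$ifname" "$addr" && ! zebra_has_addr "$ifname" "$addr"; then
            echo "$ifname: $addr present in kernel, missing in zebra"
            lost=$((lost + 1))
        fi
    done
    echo "lost: $lost"
    [ "$lost" -eq 0 ]
}

check_lost_ips "${1:-180}"
```

For example, run sudo ./check_lost_ips.sh 180 after each run of run_test_intf_ip.sh and repeat the test until the script reports a lost address; its non-zero exit status makes it easy to use in a loop.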

**Analysis**
Here is part of the zebra log from when the problem occurred on interface Loopback0.

ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [ME3M2-X6YT9]   IFA_ADDRESS   fe80::bc89:d6ff:fe37:77af/64
ZEBRA: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=34938553
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [XMC8C-4ZFJ9]   IFA_LOCAL     10.1.0.228/32
ZEBRA: [ME3M2-X6YT9]   IFA_ADDRESS   10.1.0.228/32
ZEBRA: [Y9HR3-XD5TG]   IFA_LABEL     Loopback0
ZEBRA: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=60, seq=0, pid=0
ZEBRA: [Q9CEC-J9KWY] zebra_if_addr_update_ctx: can't find ifp at nsid 0 index 256
---
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [NAV05-EY6FH] RTM_NEWLINK ADD for Loopback0(256) vrf_id 0 type 0 sl_type 0 master 0 flags 0x82
ZEBRA: [ZAG0W-VSNSD] interface Loopback0 vrf default(0) index 256 becomes active.
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x100c3
ZEBRA: [N7FN2-J93A7] Intf Loopback0(256) has come UP
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 multicast proto kernel NS 0
ZEBRA: [Q3MY3-G3YNJ] MCAST VRF: default(0) RTM_NEWROUTE (0.0.0.0,255.0.0.0) IIF: Unknown(0) OIF:  jiffies: 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 unicast proto kernel NS 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 local proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 anycast proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: anycast(4) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x102c3
ZEBRA: [P48K1-574RY] Intf Loopback0(256) PTM up, notifying clients
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=3493855318
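We can see that this happens because zebra processes the RTM_NEWADDR received on netlink-dp-in **earlier than** the RTM_NEWLINK received on netlink-listen. At that point zebra has not yet created interface Loopback0 (ifindex 256), so zebra_if_addr_update_ctx reports "can't find ifp at nsid 0 index 256" and the address update is discarded.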
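
Interfaces hit by this race can be picked out of a long debug log by searching for that message. The following is a minimal sketch (the script and function names are hypothetical) that relies only on the "can't find ifp at nsid N index M" text from zebra_if_addr_update_ctx as it appears in this log; it prints each affected namespace/ifindex pair once.

```bash
#!/bin/bash
# Hypothetical helper: list the (nsid, ifindex) pairs for which zebra dropped
# an RTM_NEWADDR because the interface did not exist in zebra yet.
# $1 is the file that captured zebra's debug output.
dropped_addr_ifindexes() {
    # Matched text looks like: can't find ifp at nsid 0 index 256
    # Fields: can't find ifp at nsid <N> index <M>, so N is $(NF-2), M is $NF.
    grep -oE "can't find ifp at nsid [0-9]+ index [0-9]+" "$1" |
        awk '{ print "nsid " $(NF-2) " ifindex " $NF }' |
        sort -u
}

if [ $# -gt 0 ]; then
    dropped_addr_ifindexes "$1"
fi
```

For the log above, ./dropped_ifindexes.sh zebra.log would print the single pair nsid 0 ifindex 256.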

However, I don't know why this happens only occasionally and not in the normal case. Most importantly, is there any way to prevent it?
This problem originates from PR #9052, so may I ask for your help, @mjstapp?

Metadata

Assignees

No one assigned

Labels

triage (Needs further investigation)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests