-
Notifications
You must be signed in to change notification settings - Fork 1.4k
lib: delay the free of vty pointer upon vty_close() #19040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: delay the free of vty pointer upon vty_close() #19040
Conversation
3ec1edc
to
3b64ef3
Compare
ci:rerun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this problem isn't specific to bgp in any way, is it? that is, any frr code that makes more than one output call could encounter this error, couldn't it?
this approach seems to require every possible code path to look for an error condition - and any path that doesn't do that is exposed to a problem. would it be possible to prevent vty_out() and friends from acting badly if the connection is closed?
Hi Mark. In a pinch, BGP patch is not really needed, since now, vty_out() is protected: erroneous I/O lead to create an event to handle the vty_close() call. Today, all the show can not be interrupted. This BGP patch is here to avoid spending 10s of seconds writing a full route into an erroneous socket.
This needs to be checked on all calls to vty_close(), I think. |
@@ -12471,6 +12471,12 @@ static int bgp_show_table(struct vty *vty, struct bgp *bgp, afi_t afi, safi_t sa | |||
else | |||
json_paths = NULL; | |||
|
|||
if (vty->status == VTY_CLOSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This memleaks json_paths.
Mark is correct, this modification requires that every single code path that uses vty become conversant with this error condition happening and as I point out it creates a whole another level of worry about leaked memory that needs to be taken care of. I agree with Mark's proposal. This is a problem with the vty subsystem and exposing this code change outside of vty.c just opens up a whole can of worms associated with changing every place that uses vty_out(). |
3b64ef3
to
8c442a6
Compare
8c442a6
to
3094f07
Compare
9efd996
to
49b9d44
Compare
If the 'show bgp ipv4 vpn' command is interrupted during the dump of the whole entries, then a heap use after free may appear. > May 15 16:08:14 vRR-DUT bgpd[1647]: [N9HHH-F8H1M] %ADJCHANGE: neighbor 3.3.1.232(taastpap-scale-1) in vrf default Up > May 15 16:08:14 vRR-DUT bgpd[1647]: [GJPH1-W8PZV] Resetting peer 3.3.1.232 due to change in addpath config > May 15 16:08:14 vRR-DUT bgpd[1647]: [YAF85-253AP][EC 100663299] buffer_flush_available: write error on fd 70: Broken pipe > May 15 16:08:14 vRR-DUT bgpd[1647]: [THHDB-YPEY6][EC 100663299] vtysh_flush: write error to fd 70, closing > May 15 16:08:14 vRR-DUT bgpd[1647]: ================================================================= > May 15 16:08:14 vRR-DUT bgpd[1647]: ==1647==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b001d65238 at pc 0x7efd46b83fa0 bp 0x7ffd4d598> > May 15 16:08:14 vRR-DUT bgpd[1647]: READ of size 1 at 0x62b001d65238 thread T0 > May 15 16:08:14 vRR-DUT bgpd[1647]: #0 0x7efd46b83f9f in vty_out (/lib/x86_64-linux-gnu/libfrr.so.0+0x383f9f) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#1 0x5570c0a4e7d2 in aspath_print_vty /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_aspath.c:2308 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#2 0x5570c08a8ce6 in route_vty_out /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:9565 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#3 0x5570c08b5b65 in bgp_show_table /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11751 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#4 0x5570c08b677f in bgp_show_table_rd /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11915 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#5 0x5570c08b6d1c in bgp_show /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11964 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#6 0x5570c08bcb03 in show_ip_bgp_magic /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:13134 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#7 0x5570c086f616 in show_ip_bgp bgpd/bgp_route_clippy.c:514 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#8 0x7efd469b38b6 (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b38b6) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#9 0x7efd469b3c43 in cmd_execute_command (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b3c43) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#10 0x7efd469b4915 in cmd_execute (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b4915) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#11 0x7efd46b8581a (/lib/x86_64-linux-gnu/libfrr.so.0+0x38581a) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#12 0x7efd46b8a6ff (/lib/x86_64-linux-gnu/libfrr.so.0+0x38a6ff) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#13 0x7efd46b90f90 (/lib/x86_64-linux-gnu/libfrr.so.0+0x390f90) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#14 0x7efd46b75257 in event_call (/lib/x86_64-linux-gnu/libfrr.so.0+0x375257) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#15 0x7efd46a42ecf in frr_run (/lib/x86_64-linux-gnu/libfrr.so.0+0x242ecf) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#16 0x5570c06bf804 in main /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_main.c:545 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#17 0x7efd46429d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#18 0x7efd46429e3f in __libc_start_main_impl ../csu/libc-start.c:392 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#19 0x5570c06bc384 in _start (/usr/bin/bgpd+0x272384) > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x62b001d65238 is located 56 bytes inside of 26536-byte region [0x62b001d65200,0x62b001d6b9a8) > May 15 16:08:14 vRR-DUT bgpd[1647]: freed by thread T0 here: > May 15 16:08:14 vRR-DUT bgpd[1647]: #0 0x7efd470b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#1 0x7efd46a6f930 in qfree (/lib/x86_64-linux-gnu/libfrr.so.0+0x26f930) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#2 0x7efd46b92467 in vty_close (/lib/x86_64-linux-gnu/libfrr.so.0+0x392467) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#3 0x7efd46b8fd4c (/lib/x86_64-linux-gnu/libfrr.so.0+0x38fd4c) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#4 0x7efd46b83f60 in vty_out (/lib/x86_64-linux-gnu/libfrr.so.0+0x383f60) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#5 0x5570c0a4e7d2 in aspath_print_vty /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_aspath.c:2308 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#6 0x5570c08a8ce6 in route_vty_out /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:9565 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#7 0x5570c08b5b65 in bgp_show_table /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11751 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#8 0x5570c08b677f in bgp_show_table_rd /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11915 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#9 0x5570c08b6d1c in bgp_show /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:11964 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#10 0x5570c08bcb03 in show_ip_bgp_magic /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_route.c:13134 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#11 0x5570c086f616 in show_ip_bgp bgpd/bgp_route_clippy.c:514 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#12 0x7efd469b38b6 (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b38b6) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#13 0x7efd469b3c43 in cmd_execute_command (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b3c43) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#14 0x7efd469b4915 in cmd_execute (/lib/x86_64-linux-gnu/libfrr.so.0+0x1b4915) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#15 0x7efd46b8581a (/lib/x86_64-linux-gnu/libfrr.so.0+0x38581a) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#16 0x7efd46b8a6ff (/lib/x86_64-linux-gnu/libfrr.so.0+0x38a6ff) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#17 0x7efd46b90f90 (/lib/x86_64-linux-gnu/libfrr.so.0+0x390f90) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#18 0x7efd46b75257 in event_call (/lib/x86_64-linux-gnu/libfrr.so.0+0x375257) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#19 0x7efd46a42ecf in frr_run (/lib/x86_64-linux-gnu/libfrr.so.0+0x242ecf) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#20 0x5570c06bf804 in main /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_main.c:545 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#21 0x7efd46429d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > May 15 16:08:14 vRR-DUT bgpd[1647]: previously allocated by thread T0 here: > May 15 16:08:14 vRR-DUT bgpd[1647]: #0 0x7efd470b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#1 0x7efd46a6f7de in qcalloc (/lib/x86_64-linux-gnu/libfrr.so.0+0x26f7de) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#2 0x7efd46b8c5ba in vty_new (/lib/x86_64-linux-gnu/libfrr.so.0+0x38c5ba) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#3 0x7efd46b8f0fe (/lib/x86_64-linux-gnu/libfrr.so.0+0x38f0fe) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#4 0x7efd46b75257 in event_call (/lib/x86_64-linux-gnu/libfrr.so.0+0x375257) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#5 0x7efd46a42ecf in frr_run (/lib/x86_64-linux-gnu/libfrr.so.0+0x242ecf) > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#6 0x5570c06bf804 in main /build/make-pkg/output/_packages/cp-routing/src/bgpd/bgp_main.c:545 > May 15 16:08:14 vRR-DUT bgpd[1647]: FRRouting#7 0x7efd46429d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > May 15 16:08:14 vRR-DUT bgpd[1647]: SUMMARY: AddressSanitizer: heap-use-after-free (/lib/x86_64-linux-gnu/libfrr.so.0+0x383f9f) in vty_out > May 15 16:08:14 vRR-DUT bgpd[1647]: Shadow bytes around the buggy address: > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a49f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa > May 15 16:08:14 vRR-DUT bgpd[1647]: =>0x0c56803a4a40: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: 0x0c56803a4a90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd > May 15 16:08:14 vRR-DUT bgpd[1647]: Shadow byte legend (one shadow byte represents 8 application bytes): > May 15 16:08:14 vRR-DUT bgpd[1647]: Addressable: 00 > May 15 16:08:14 vRR-DUT bgpd[1647]: Partially addressable: 01 02 03 04 05 06 07 > May 15 16:08:14 vRR-DUT bgpd[1647]: Heap left redzone: fa > May 15 16:08:14 vRR-DUT bgpd[1647]: Freed heap region: fd > May 15 16:08:14 vRR-DUT bgpd[1647]: Stack left redzone: f1 > May 15 16:08:14 vRR-DUT bgpd[1647]: Stack mid redzone: f2 > May 15 16:08:14 vRR-DUT bgpd[1647]: Stack right redzone: f3 > May 15 16:08:14 vRR-DUT bgpd[1647]: Stack after return: f5 > May 15 16:08:14 vRR-DUT bgpd[1647]: Stack use after scope: f8 > May 15 16:08:14 vRR-DUT bgpd[1647]: Global redzone: f9 > May 15 16:08:14 vRR-DUT bgpd[1647]: Global init order: f6 > May 15 16:08:14 vRR-DUT bgpd[1647]: Poisoned by user: f7 > May 15 16:08:14 vRR-DUT bgpd[1647]: Container overflow: fc > May 15 16:08:14 vRR-DUT bgpd[1647]: Array cookie: ac > May 15 16:08:14 vRR-DUT bgpd[1647]: Intra object redzone: bb > May 15 16:08:14 vRR-DUT bgpd[1647]: ASan internal: fe > May 15 16:08:14 vRR-DUT bgpd[1647]: Left alloca redzone: ca > May 15 16:08:14 vRR-DUT bgpd[1647]: Right alloca redzone: cb > May 15 16:08:14 vRR-DUT bgpd[1647]: Shadow gap: cc > May 15 16:08:14 vRR-DUT bgpd[1647]: ==1647==ABORTING The vty_out() function detects a BROKEN PIPE error during the call of vtysh_flush() call, and closes the vty pointer. Consequently, the next vty_out() calls may use an invalid vty pointer. The call to vtysh_flush() has been added in a previous commit: this call does a syncronous write for a large buffer, which triggers the error. Propose to delay the removal of the vty pointer at the next scheduling. Until that event, keep the vty structure at CLOSE state. The show function will eventually detect the invalid vty status, and will anticipate the end of the show command. Fixes: 9112fb3 ("lib: Memory spike reduction for sh cmds at scale") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The 'show bgp <address-family>' command may take a long time when dumping a full route equivalent of entries. It is recommended to interrupt if the underlying vty session is closed. Propose to leave the cli handler by detecting it. The user experience will be better. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Declare vty_status enumerate, outside of the struct vty structure, to define the status. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Because vty_close() is called, it is not necessary to set the status before the call. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The VTY_CLOSE status is only set by vty_close(), vty_closed() functions. Avoid setting this value in other places of code. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The vty_close() call is performed, because in the end, the expectation is to close the vty structure. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Simplify code by avoiding calling vty_close, because it is already scheduled. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This API will be used when it will be possible to call the event library to flush the vty structure. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Simplify code so that one single call of vty_close() is done. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When leaving a process, the vty structure must absolutely be freed. Otherwise, memory leaks will happen. Identify all the cases by a comment marked above the call. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Delay the removal of the vty socket in vty library, except at the termination hooks, because the threads may be stopped too. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The closure of the vty session will be handled by an event thread that is scheduled at the end of the current thread. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The buffer_flush_available() may detect a socket error, and this may be a problem for next calls. Protect this call by sending a close event. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
49b9d44
to
971c1d8
Compare
CI:rerun |
Hmm. After looking at this for a few minutes, my initial thought is that we should add an But I haven't fully looked at this yet, this is only my preliminary thoughts after scrolling through this PR. |
Actually this looks like a regression from 9112fb3. That change is likely not correct, as I see it right now it is in general just straight up not permissible/possible to call |
#19109 hopefully reverts the underlying root cause so this shouldn't be needed anymore |
If the 'show bgp ipv4 vpn' command is interrupted during the dump of the whole entries, then a heap use after free may appear.
The vty_out() function detects a BROKEN PIPE error durint the call of vty_flush(), and closes the vty pointer. Consequently, the next vty_out() calls may use an invalid vty pointer.
Propose to delay the removal of the vty pointer at the next scheduling. Until that event, keep the vty structure at CLOSE state. The show function will eventually detect the invalid vty status, and will anticipate the end of the show command.