Conversation

michalQb

During a system reboot, the pci_device_shutdown() function is invoked before the network device is unregistered. However, the XDP program is unloaded as part of the netdev unregistration process—by which point most of the device's features have already been stopped.

Previously, the IDPF_REMOVE_IN_PROG flag was used to detect device removal and handle XDP unloading early. However, this flag does not cover the netdev unregistration path during shutdown.

This patch improves detection of shutdown or module removal scenarios by checking both IDPF_REMOVE_IN_PROG and IDPF_VPORT_REG_NETDEV flags. This ensures the XDP program is unloaded early and cleanly, avoiding unnecessary system reconfiguration.

alobakin and others added 30 commits June 11, 2025 16:34
Change EXPORT_SYMBOL_NS_GPL(x, "LIBETH") to EXPORT_SYMBOL_GPL(x) +
DEFAULT_SYMBOL_NAMESPACE "LIBETH" to make the code more compact.
Also, explicitly include <linux/export.h> to satisfy new
requirements from scripts/misc-check.
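
A minimal sketch of the pattern, with a hypothetical symbol name:

/* before: per-symbol namespace */
EXPORT_SYMBOL_NS_GPL(libeth_foo, "LIBETH");

/* after: one default namespace for the whole object file */
#define DEFAULT_SYMBOL_NAMESPACE	"LIBETH"

#include <linux/export.h>

EXPORT_SYMBOL_GPL(libeth_foo);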

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Back when the libeth Rx core was initially written, devmem was a draft
and netmem_ref didn't exist in the mainline. Now that it's here, make
libeth MP-agnostic before introducing any new code or any new library
users.
When it's known that the created PP/FQ is for header buffers, use faster
"unsafe" underscored netmem <--> virt accessors as netmem_is_net_iov()
is always false in that case, but consumes some cycles (bit test +
true branch).
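
As a sketch of the fast path this enables (assuming a buffer known to
come from a header FQ; the accessor name follows the underscored
convention described above):

/* header buffers are never net_iov, skip the netmem_is_net_iov() test */
void *hdr = __netmem_address(fqe->netmem);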

Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Expand libeth's Page Pool functionality by adding native XDP support.
This means picking the appropriate headroom and DMA direction.
Also, register all the created &page_pools as XDP memory models.
A driver then can call xdp_rxq_info_attach_page_pool() when registering
its RxQ info.
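
A hypothetical driver sketch of that flow at RxQ setup:

err = xdp_rxq_info_reg(&rxq->xdp_rxq, netdev, rxq->idx, napi_id);
if (err)
	return err;

/* the PP is already registered as an XDP memory model by libeth */
xdp_rxq_info_attach_page_pool(&rxq->xdp_rxq, rxq->pp);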

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Start adding XDP-specific code to libeth, namely handling XDP_TX buffers
(only sending).
The idea is that we accumulate up to 16 buffers on the stack, then,
if either the limit is reached or the polling is finished, flush them
at once with only one XDPSQ cleaning (if needed). The main sending
function will be aware of the sending budget and already have all the
info to send the buffers, so it can't fail.
Drivers need to provide 2 inline callbacks to the main sending function:
for cleaning an XDPSQ and for filling descriptors; the library code
takes care of the rest.
Note that unlike the generic code, multi-buffer support is not wrapped
here with unlikely() to not hurt header split setups.

&libeth_xdp_buff is a simple extension over &xdp_buff which has a direct
pointer to the corresponding Rx descriptor (and, luckily, is precisely
1 cacheline in size with 16-byte alignment on x86_64).
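
The shape of the idea, as a hedged sketch (names hypothetical):

static void my_tx_flush_bulk(struct my_tx_bulk *bq);

struct my_tx_bulk {
	u32 count;
	struct libeth_xdp_tx_frame bulk[16];	/* accumulated on the stack */
};

static void my_tx_queue_buff(struct my_tx_bulk *bq,
			     const struct libeth_xdp_tx_frame *frm)
{
	bq->bulk[bq->count++] = *frm;

	/* flush on the limit; also done when the polling is finished */
	if (bq->count == ARRAY_SIZE(bq->bulk))
		my_tx_flush_bulk(bq);	/* one XDPSQ cleaning at most */
}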

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # xmit logic
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add helpers for implementing .ndo_xdp_xmit().
Same as for XDP_TX, accumulate up to 16 DMA-mapped frames on the stack,
then flush. If DMA mapping fails for some reason, don't try mapping
further frames, but still flush what was already prepared.
The DMA address of a head frame is stored in its headroom, assuming it
has enough room for an 8- (or 4-)byte value.
In addition to @prep and @xmit driver callbacks in XDP_TX, xmit also
needs @finalize to kick the XDPSQ after filling.
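
A hypothetical sketch of the headroom trick (the exact slot is an
assumption for illustration):

/* the xdp_frame sits at the start of its headroom; stash the mapped
 * address right behind it so that completion can unmap the frame later
 */
static void my_xmit_stash_dma(struct xdp_frame *xdpf, dma_addr_t dma)
{
	memcpy(xdpf + 1, &dma, sizeof(dma));
}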

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Similarly to libeth_tx_complete(), add libeth_xdp_complete_tx() to
handle XDP_TX and xmit buffers. Both use bulk return under the hood.

Also add out of line libeth_tx_complete_any() which handles both
regular and XDP frames (if libeth_xdp is loaded), for example,
to call on queue destroy, where we don't need inlining but
convenience.
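
For example, on queue destroy (sketch, driver names hypothetical):

/* frame type (skb vs XDP) doesn't matter here, nor does inlining */
for (u32 i = 0; i < sq->desc_count; i++)
	libeth_tx_complete_any(&sq->tx_buf[i], &cp);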

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Unfortunately, it's not always possible to allocate
max(num_rxqs, nr_cpu_ids) XDP Tx queues even on high-end NICs.
To mitigate this, add simple locking helpers to libeth_xdp.
As long as XDPSQs are not shared, the whole functionality is gated
behind a static branch. Otherwise, each bulk flush locks the queue for
the time of cleaning and filling the descriptors.
As long as this particular queue is not used by more than 1 CPU,
the impact is minimal (a boolean runtime check twice per 16+
descriptors).
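
A sketch of a locking helper of that shape (names hypothetical):

DEFINE_STATIC_KEY_FALSE(my_xdpsq_share);

/* a no-op unless some XDPSQ in the system is shared, and even then
 * only for queues actually marked as shared
 */
static inline void my_xdpsq_lock(struct my_xdpsq_lock *lock)
{
	if (static_branch_unlikely(&my_xdpsq_share) && lock->share)
		spin_lock(&lock->lock);
}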

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # static key
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
When XDP Tx queues are not interrupt-driven but use lazy cleaning,
i.e. only when there are less than `threshold` free descriptors left,
we also need cleanup timers to avoid &xdp_buff and &xdp_frame stalls
for too long, especially with Page Pool (it warns about inflight
pages every 60 seconds).
Let's say we sent 256 frames and don't need to send more, but we clean
only when the number of pending items >= 384. In that case, those 256
will stall until 128 more are sent. For this, add simple helpers to
run a timer which will clean the queue regardless, 1 second after
the last send.
The timer is triggered when finalizing the queue. As long as there is
regular active traffic, the timer doesn't fire.
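
A minimal sketch of the rearm-on-finalize idea (assuming a timer
embedded in the SQ structure; names hypothetical):

static void my_xdpsq_timer_fn(struct timer_list *t)
{
	struct my_xdpsq *sq = from_timer(sq, t, timer);

	my_clean_xdpsq(sq);	/* hypothetical cleaning helper */
}

/* on finalize: pushed forward by each burst, so it fires only when
 * traffic actually stops
 */
mod_timer(&sq->timer, jiffies + HZ);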

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add convenience helpers to build an &xdp_buff. This means: general
initialization before the NAPI loop, adding head, adding frags etc.
libeth_xdp_process_buff() is the same as what everybody has in their
drivers:

dma_sync_for_cpu();

if (!frag) {
	add_head();
	prefetch();
} else {
	add_frag();
}

Note that I don't use net_prefetch(), sticking to the original
prefetch(). In none of my tests prefetching 128 bytes yielded better
perf than 64 bytes. That might differ if the headers are huge enough,
but then additional tunneling etc. overhead takes place, so either
way you won't win a lot.

&libeth_xdp_stash is for cases when you exit the polling loop without
finishing building the buff. If that happens, you need to store the
buffer in the queue structure until the next loop and then restore it.
It makes no sense to place a whole full &xdp_buff there. Define a
minimal structure, which would store only the fields essential to
restore it.
I was able to pack it into 16 bytes, which is only 8 bytes bigger
than `struct sk_buff *skb` on x86_64.
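
The 16-byte layout could look as follows (a sketch of which fields are
needed to restore an &xdp_buff; the exact packing is an assumption):

struct my_xdp_stash {
	void	*data;		/* xdp_buff::data */
	u16	headroom;	/* data - data_hard_start */
	u16	len;		/* data_end - data */
	u32	frame_sz:24;	/* xdp_buff::frame_sz */
	u32	flags:8;	/* xdp_buff::flags */
};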

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Running a prog and handling the verdicts, up to napi_gro_receive(),
is also pretty generic code that doesn't really differ between vendors
(except for Tx descriptor filling and Rx descriptor parsing).

Define a couple of inlines to do that. The inline callbacks a driver
needs to pass are mentioned above: Tx descriptor filling for XDP_TX,
populating skb with the descriptor data for XDP_PASS, finalizing
XDPSQs after the polling loop for XDP_TX (kicking the HW to start
sending).
The populate callback is passed only &libeth_xdp_buff, assuming the
buff::desc pointer is enough; plus, you can always get the corresponding
Rx queue structure via container_of(buff::rxq). If not, a driver can extend
the buff with more fields directly on the stack without touching
libeth_xdp definitions.
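
For instance, a populate callback could recover its queue like this
(sketch; driver struct names hypothetical, assuming &xdp_rxq_info is
embedded in the driver's Rx queue):

static bool my_xdp_populate_skb(struct sk_buff *skb,
				const struct libeth_xdp_buff *xdp)
{
	const struct my_rxq *rxq;

	rxq = container_of(xdp->base.rxq, typeof(*rxq), xdp_rxq);

	/* parse hash, csum etc. from xdp->desc here */
	skb_record_rx_queue(skb, rxq->idx);

	return true;
}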

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Defining driver-specific functions to pass to libeth_xdp functions can
induce boilerplate and/or look a bit cryptic with all those layers of
indirection. On the other hand, this indirection is needed to allow
compilers to uninline big functions even when passed to __always_inline
helpers (too much inlining also hurts performance in some cases), plus
to reuse some XDP helpers in XSk code.
Add macros to quickly build them, with the detailed kdoc. They take
names of the actual callbacks for filling a Tx descriptor and other
purely HW-specific things and wrap them appropriately.

LIBETH_XDP_DEFINE_{BEGIN,END}() is needed for GCC 8+ unfortunately to
let the drivers control which functions will be static and which global
without hitting `-Wold-style-declaration`.
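
Usage would look roughly like this (a sketch following the commit text;
callback names hypothetical):

LIBETH_XDP_DEFINE_BEGIN();
LIBETH_XDP_DEFINE_FLUSH_TX(static my_xdp_flush_tx, my_xdp_tx_prep,
			   my_xdp_tx_xmit);
LIBETH_XDP_DEFINE_END();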

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
End the XDP section by adding helpers to set up XDP features, flipping
.ndo_xdp_xmit() support at runtime (in case it's not always on),
and calculating the queue clean/refill threshold.
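
The runtime flip can use the core netdev helpers, e.g. (sketch):

/* advertise/withdraw .ndo_xdp_xmit() as a redirect target at runtime */
if (prog)
	xdp_features_set_redirect_target(dev, true);
else
	xdp_features_clear_redirect_target(dev);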

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add XSk counterparts for XDP_TX buffer sending and completion.
The same base structures and functions are used from the libeth_xdp
core, with adjustments for the fact that XSk Rx always operates on
&xdp_buff_xsk for both head and frags. Unlike regular Rx, unlikely()
is used here for frags, as header split gives no benefits for XSk Rx,
at least for now.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reuse core sending functions to send XSk xmit frames.
Both metadata and non-metadata pools/drivers are supported. libeth_xdp
also provides generic XSk metadata ops, currently with the checksum
offload only and for cases when HW doesn't require supplying L3/L4
checksum offsets. Drivers are free to pass their own ops.
&libeth_xdp_tx_bulk is not used here as it would be redundant;
pool->tx_descs are accessed directly.
Fake "libeth_xsktmo" is needed to hide implementation details from the
drivers when they want to use the generic ops: the original struct is
defined in the same file where dev->xsk_tx_metadata_ops gets set to
avoid duplication of slowpath; at the same time; XSk xmit functions
use local "fast" copy to inline XMO callbacks.
Tx descriptor filling loop is unrolled by 8.
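
The unrolling can lean on the kernel's unroll helper, e.g. (sketch;
the fill callback is hypothetical):

#include <linux/unroll.h>

/* ask the compiler to unroll the descriptor filling loop by 8 */
unrolled_count(8)
for (u32 i = 0; i < batch; i++)
	my_fill_xsk_desc(sq, &pool->tx_descs[i]);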

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # optimizations
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add XSk counterparts for preparing XSk &libeth_xdp_buff (adding head and
frags), running the program, and handling the verdict, incl. XDP_PASS.
Shortcuts in comparison with regular Rx: frags and all verdicts except
XDP_REDIRECT are under unlikely() and out of line; no checks for XDP
program presence as it's always true for XSk.

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # optimizations
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
XSkFQ refill is pretty generic across the drivers minus FQ descriptor
filling and can easily be unified with one inline callback.
XSk wakeup is usually not, but here, instead of commonly used
"SW interrupts", I picked firing an IPI. In most tests, it showed better
performance; it also provides better control for userspace on which CPU
will handle the xmit, as SW interrupts honor IRQ affinity no matter
which core produces XSk xmit descs (while XDPSQs are associated 1:1
with cores having the same ID).
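
A wakeup sketch along those lines (assuming a per-queue CSD; names
hypothetical):

/* fire an IPI at the queue's CPU instead of a SW interrupt, so that
 * the twin XDPSQ of the producing core handles the xmit
 */
static int my_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
	struct my_xdpsq *sq = my_get_xsk_sq(dev, qid);

	smp_call_function_single_async(sq->cpu, &sq->csd);

	return 0;
}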

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
On 64-bit systems, writing/reading one u64 is faster than two u32s even
when they're adjacent in a struct. The compilers won't guarantee
they will combine those; I observed both successful and unsuccessful
attempts with both GCC and Clang, and it's not easy to say what it
depends on.
There's a few places in libeth_xdp winning up to several percent from
combined access (both performance and object code size, especially
when unrolling). Add __LIBETH_WORD_ACCESS and use it there on LE.
Drivers are free to optimize HW-specific callbacks under the same
definition.
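
Illustration (a hedged sketch; the descriptor fields are hypothetical
and this is the LE-only variant):

#ifdef __LIBETH_WORD_ACCESS
	/* one 64-bit store instead of two 32-bit ones */
	*(u64 *)&desc->lo = ((u64)hi << 32) | lo;
#else
	desc->lo = lo;
	desc->hi = hi;
#endif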

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
No idea what the current barrier position was meant for. At that point,
nothing is read from the descriptor, only the pointer to the actual one
is fetched.
The correct barrier usage here is after the generation check, so that
only the first qword is read if the descriptor is not yet ready and we
need to stop polling. Debatable on coherent DMA as the Rx descriptor
size is <= cacheline size, but anyway, the current barrier position
only makes the codegen worse.
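
The corrected order, sketched (field/accessor names hypothetical):

/* read only the qword carrying the GEN bit first */
if (le64_get_bits(desc->qw0, GEN_M) != rxq->gen)
	break;		/* not yet ready, stop polling */

dma_rmb();		/* only now read the rest of the descriptor */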

Fixes: 3a8845a ("idpf: add RX splitq napi poll support")
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Currently, the maximum number of queues available for one vport is 16.
This is hardcoded, but then the function calculating the optimal number
of queues takes min(16, num_online_cpus()).
In order to be able to allocate more queues, which will then be used for
XDP, stop hardcoding 16 and rely on what the device gives us. Instead of
num_online_cpus(), which is considered suboptimal since at least 2013,
use netif_get_num_default_rss_queues() to still have free queues in the
pool.
nr_cpu_ids Tx queues are needed only for lockless XDP sending; the
regular stack doesn't benefit from that anyhow.
On a 128-thread Xeon, this now gives me 32 regular Tx queues and leaves
224 free for XDP (128 of which will handle XDP_TX, .ndo_xdp_xmit(), and
XSk xmit when enabled).
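
In pseudo-terms (sketch; variable names hypothetical):

/* regular stack: default RSS spread, leave the rest of the pool free */
num_txq = min(dev_max_txq, netif_get_num_default_rss_queues());
/* XDP later takes up to nr_cpu_ids of the remaining queues */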

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add the missing linking of NAPIs to netdev queues when enabling
interrupt vectors in order to support NAPI configuration and
interfaces requiring get_rx_queue()->napi to be set (like XSk
busy polling).
The additional RTNL locking in idpf_vport_dealloc() is needed to
avoid WARN splats on rmmod, which happens outside of the RTNL context
but calls idpf_vport_stop(), which assumes RTNL protection.
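
The linking itself is done with the core helper, roughly (sketch):

/* link the queues to the vector's NAPI when enabling it */
netif_queue_set_napi(netdev, q->idx, NETDEV_QUEUE_TYPE_RX, napi);
netif_queue_set_napi(netdev, q->idx, NETDEV_QUEUE_TYPE_TX, napi);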

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # helper
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
In the queue-based scheduling mode, the Tx completion descriptor is
4 bytes, compared to 8 bytes in the flow-based mode.
Add definition for it and allocate the corresponding amount of memory
for the descriptors during the completion queue creation.
This does not include handling 4-byte completions during Tx polling, as
for now, the only user of QB will be XDP, which has its own routines.
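
The 4-byte layout could be declared along these lines (sketch; the
field split is an assumption based on the flow-based descriptor):

/* queue-based completion: no flow tag qword, 4 bytes total */
struct my_splitq_4b_tx_compl_desc {
	__le16 qid_comptype_gen;	/* qid, completion type, GEN bit */
	__le16 q_head_compl_tag;	/* queue head / completion tag */
};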

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
SW marker descriptors on completion queues are used only when a queue
is about to be destroyed. It's far from hotpath and handling it in the
hotpath NAPI poll makes no sense.
Instead, run a simple poller after a virtchnl message for destroying
the queue is sent and wait for the replies. If replies for all of the
queues are received, this means the synchronization is done correctly
and we can proceed with stopping the link.
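
Sketch of such a poller (assuming a helper that checks whether all
marker replies have arrived; names hypothetical):

/* poll every 10 ms, give up after 500 ms */
err = read_poll_timeout(my_markers_received, done, done,
			10 * USEC_PER_MSEC, 500 * USEC_PER_MSEC,
			false, vport);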

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Currently, queues are associated 1:1 with interrupt vectors as it's
assumed queues are always interrupt-driven. For XDP, we want to use
Tx queues without interrupts and only do "lazy" cleaning when the number
of free elements is <= threshold (closest pow-2 to 1/4 of the ring).
In order to use a queue without an interrupt, idpf still needs to have
a vector assigned to it to flush descriptors. This vector can be global
and only one for the whole vport to handle all its noirq queues.
Always request one extra vector and configure it in non-interrupt
mode right away when creating the vport, so that it can be used later
by queues when needed (not only XDP ones).

Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Extend basic structures of the driver (e.g. 'idpf_vport', 'idpf_*_queue',
'idpf_vport_user_config_data') by adding members necessary to support XDP.
Add extra XDP Tx queues needed to support XDP_TX and XDP_REDIRECT actions
without interfering with regular Tx traffic.
Also add functions dedicated to supporting XDP initialization for Rx
and Tx queues, and call those functions from the existing queue
configuration algorithms.
Note that "other count" in Ethtool will now also include XDP Tx queues.

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Co-developed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Implement loading/removing an XDP program using the .ndo_bpf callback
in the split queue mode. Reconfigure and restart the queues if needed
(!!old_prog != !!new_prog); otherwise, just update the pointers.
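
Sketched control flow (hypothetical function and field names):

static int my_xdp_setup_prog(struct idpf_vport *vport,
			     struct netdev_bpf *bpf)
{
	struct bpf_prog *old = vport->xdp_prog;

	/* queue layout changes: full reconfig and restart needed */
	if (!!old != !!bpf->prog)
		return my_xdp_reconfig(vport, bpf->prog);

	/* hot swap: just update the per-queue prog pointers */
	my_xdp_copy_prog_to_rxqs(vport, bpf->prog);

	return 0;
}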

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
In preparation for XDP support, move from having skb as the main frame
container during the Rx polling to &xdp_buff.
This allows using generic and libeth helpers for building an XDP
buffer and changes the logic: now we try to allocate an skb only
once we have processed all the descriptors related to the frame.
Store &libeth_xdp_stash instead of the skb pointer on the Rx queue.
It's only 8 bytes wider, but contains everything we may need.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Use libeth XDP infra to support running XDP program on Rx polling.
This includes all of the possible verdicts/actions.
XDP Tx queues are cleaned only in "lazy" mode, when fewer than 1/4 of
the ring's descriptors are free. libeth helper macros for defining
driver-specific XDP functions make sure the compiler can uninline
them when needed.
Use __LIBETH_WORD_ACCESS to parse descriptors more efficiently when
applicable. It really gives some good boosts and code size reduction
on x86_64.

Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Use libeth XDP infra to implement .ndo_xdp_xmit() in idpf.
The Tx callbacks are reused from XDP_TX code. XDP redirect target
feature is set/cleared depending on the XDP prog presence, as for now
we still don't allocate XDP Tx queues when there's no program.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Add &xdp_metadata_ops with a callback to get RSS hash hint from the
descriptor. Declare the splitq 32-byte descriptor as 4 u64s to parse
them more efficiently when possible.
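
A hint callback sketch (descriptor parsing elided; the cast relies on
&libeth_xdp_buff extending &xdp_buff as described earlier):

static int my_xdpmo_rx_hash(const struct xdp_md *ctx, u32 *hash,
			    enum xdp_rss_hash_type *type)
{
	const struct libeth_xdp_buff *xdp = (typeof(xdp))ctx;
	const u64 *qw = xdp->desc;	/* the 32-byte desc as 4 u64s */

	*hash = my_parse_rx_hash(qw);	/* hypothetical parser */
	*type = XDP_RSS_TYPE_NONE;	/* real code maps it from qw */

	return 0;
}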

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Implement VC functions dedicated to enabling, disabling and configuring
not all but only selected queues.

Also, refactor the existing implementation to make the code more
modular. Introduce new generic functions for sending VC messages
consisting of chunks, in order to isolate the sending algorithm
and its implementation for specific VC messages.

Finally, rewrite the function for mapping queues to q_vectors using the
new modular approach to avoid copying the code that implements the VC
message sending algorithm.

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Co-developed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
michalQb and others added 5 commits June 11, 2025 16:34
Add functionality to set up an XSk buffer pool, including the ability
to stop, reconfigure and start only selected queues, not the whole
device. Pool DMA mapping is managed by libeth_xdp.

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Implement the XSk transmit path using the libeth (libeth_xdp)
XSk infra.
When the NAPI poll is called, XSk Tx queues are polled first,
before regular Tx and Rx. They're generally faster to serve
and have higher priority compared to regular traffic.

Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Implement Rx packet processing specific to AF_XDP ZC using the libeth
XSk infra. Initialize queue registers before allocating buffers to
avoid redundant ifs when updating the queue tail.

Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Now that AF_XDP functionality is fully implemented, advertise the XSk
XDP feature and add the .ndo_xsk_wakeup() callback so that it can be
used with this driver.

Co-developed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
During a system reboot, the `pci_device_shutdown()` function is invoked before
the network device is unregistered. However, the XDP program is unloaded as part
of the netdev unregistration process—by which point most of the device's features
have already been stopped.

Previously, the `IDPF_REMOVE_IN_PROG` flag was used to detect device removal
and handle XDP unloading early. However, this flag does not cover the netdev
unregistration path during shutdown.

This patch improves detection of shutdown or module removal scenarios by checking
both `IDPF_REMOVE_IN_PROG` and `IDPF_VPORT_REG_NETDEV` flags. This ensures the XDP
program is unloaded early and cleanly, avoiding unnecessary system reconfiguration.

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
@alobakin
Owner

Picked, nice job, thanks!

@alobakin alobakin closed this Jun 16, 2025
alobakin pushed a commit that referenced this pull request Aug 4, 2025
Without the change, `perf` hangs up on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hangup happens on a read from the character
device `/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
    --Type <RET> for more, q to quit, c to continue without paging--
    quit
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in `perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250505174419.2814857-1-slyich@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>