Improve support for DirectNIC (CX8)
* Add support for XDR speed detection.
* When DirectNIC is enabled, report only the RDMA interfaces.
Extend the P2C (PXN over C2C) support to send/receive operations.
Support compilation with GCC 14 (Issues #1743, #1751).
Fix the unloading of network plugins that also provide tuner capability.
Fix the current device being changed across calls to ncclCommDestroy()
and ncclCommAbort().
A note for users on MNNVL systems: please ensure an adequate stack size for
NCCL threads. While the default Linux stack size limit of 8192 KB is known
to be sufficient, we've seen crashes if the limit is changed to
"unlimited", as it causes the glibc library to unexpectedly *decrease* the
stack size of NCCL's background threads to just 2048 KB. Use "ulimit -s"
in bash to print the current limit; if needed, reset it to 8192 KB using
"ulimit -s 8192". When launching a multi-node NCCL job, also ensure that
the new setting is propagated to the other nodes.
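
The stack-size check described above can also be done programmatically,
for example at application startup on each node. The following is a
minimal sketch, not part of NCCL: the helper names
stack_limit_is_unlimited() and report_stack_settings() are hypothetical,
and it assumes that threads are created with default pthread attributes,
so that the value returned by pthread_attr_getstacksize() on a freshly
initialized attribute object reflects the stack size new threads would
get under glibc.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not an NCCL API): returns 1 if the soft stack
 * limit is "unlimited", the setting the note associates with crashes. */
static int stack_limit_is_unlimited(rlim_t soft_limit) {
  return soft_limit == RLIM_INFINITY;
}

/* Hypothetical helper: prints the soft RLIMIT_STACK value (what
 * "ulimit -s" reports) and the default stack size for new threads.
 * Returns 0 on success, -1 if any of the queries fails. */
static int report_stack_settings(FILE *out) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) return -1;

  if (stack_limit_is_unlimited(rl.rlim_cur)) {
    fprintf(out, "stack limit: unlimited (consider \"ulimit -s 8192\")\n");
  } else {
    /* rlim_cur is in bytes; "ulimit -s" reports kilobytes. */
    fprintf(out, "stack limit: %llu KB\n",
            (unsigned long long)rl.rlim_cur / 1024);
  }

  /* Query the stack size that default-attribute threads would receive. */
  pthread_attr_t attr;
  size_t thread_stack = 0;
  if (pthread_attr_init(&attr) != 0) return -1;
  int rc = pthread_attr_getstacksize(&attr, &thread_stack);
  pthread_attr_destroy(&attr);
  if (rc != 0) return -1;

  fprintf(out, "default thread stack size: %zu KB\n", thread_stack / 1024);
  return 0;
}
```

Calling report_stack_settings(stderr) from main() before initializing
NCCL prints both values. Running it on every rank of a multi-node job
is one way to confirm that the limit was actually propagated to the
other nodes.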