Mellanox ConnectX-3 fails to work with XDP Acceleration #29065

@jshr-w

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The test cluster had one pod and one NodePort service with an endpoint for that pod. An HTTP request to NodeIP:NodePort consistently fails whenever the pod runs on a different node than the one that owns the NodeIP, which is incorrect behavior for a NodePort service.
HTTP requests work as expected when bpf-lb-acceleration is set to disabled.
They also work correctly on an identical setup that uses Mellanox ConnectX-5 NICs instead.

Both of these pages suggest that the ConnectX-3 (which has the mlx4 driver) should work with XDP acceleration.

This issue also seems to show up in 1.13.8, but more sporadically: I was able to send many HTTP requests before seeing any failure.
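
For anyone trying to reproduce this, the failure rate can be measured by sending repeated requests to NodeIP:NodePort and counting how many fail. The following is a minimal sketch, not the script used for the original tests; the function name `probe_nodeport` and its parameters are hypothetical, and the URL must be replaced with a real node IP and NodePort.

```python
import http.client
import urllib.error
import urllib.request


def probe_nodeport(url, attempts=20, timeout=2.0):
    """Send `attempts` HTTP GETs to `url` and return (successes, failures)."""
    successes = 0
    failures = 0
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                # Any HTTP response means the request reached a backend.
                resp.read()
            successes += 1
        except urllib.error.HTTPError:
            # The backend answered, only with an error status code.
            successes += 1
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or a truncated response.
            failures += 1
    return successes, failures
```

Calling `probe_nodeport` with the URL of a node that does not host the pod, once with bpf-lb-acceleration enabled and once with it disabled, gives two failure counts that can be compared directly. A larger `attempts` value is more likely to catch the sporadic failures seen on 1.13.8.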

The NVIDIA MLNX_OFED documentation also seems to suggest that this may be a known issue (Ref number: 1550266).

Cilium Version

1.14.3

Kernel Version

5.15.0

Kubernetes Version

1.26.6

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/documentation: Impacts the documentation, including textual changes, sphinx, or other doc generation code.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
