[Feature]: Support A3 Mega GCP clusters with GPUDirect-TCPXO #2447

@r4victor

Description

Problem

Currently, dstack uses the default, slow inter-node networking on GCP, e.g. 10 GB/s on A3 instances. Networking speed can be increased significantly for GPU instances, but this requires setting up multiple NICs per VM and configuring RDMA.
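
To make the multi-NIC requirement concrete, here is a minimal sketch that builds one network interface entry per NIC for a single VM. It is not dstack's implementation: `NicSpec`, `build_nic_specs`, the naming scheme, and the default of one host NIC plus eight GPU NICs are all assumptions for illustration and should be checked against the GCP documentation for the target machine type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicSpec:
    """One network interface attached to a VM (hypothetical structure)."""

    name: str
    network: str  # GCP requires each NIC of a VM to be in a different VPC network
    subnetwork: str


def build_nic_specs(prefix: str, gpu_nics: int = 8) -> list[NicSpec]:
    """Return one host NIC followed by `gpu_nics` GPU data NICs.

    The default count and the naming scheme are assumptions, not values
    taken from dstack.
    """
    if gpu_nics < 0:
        raise ValueError("gpu_nics must be non-negative")
    # nic0 carries regular host traffic.
    specs = [NicSpec("nic0", f"{prefix}-host-net", f"{prefix}-host-sub")]
    # The remaining NICs are dedicated to GPU-to-GPU traffic.
    for i in range(1, gpu_nics + 1):
        specs.append(
            NicSpec(f"nic{i}", f"{prefix}-gpu-net-{i}", f"{prefix}-gpu-sub-{i}")
        )
    return specs


nics = build_nic_specs("demo")
print(len(nics), nics[1].network)
```

Each entry would then be translated into the corresponding network interface of the instance creation request. Creating the VPC networks themselves and configuring the GPUDirect-TCPXO software on the host are outside the scope of this sketch.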

Solution

No response

Workaround

No response

Would you like to help us implement this feature by sending a PR?

Yes
