Ray ports still random (harden port selection) #261

@terrykong

We are still running into port collisions because not all Ray ports are fixed (this is more apparent on runs with more nodes). Here are some outputs from the current Slurm script:

# ValueError: Ray component worker_ports is trying to use a port number 53058 that is used by other components.
# Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 64678, 'metrics_export': 54151, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}

# ValueError: Ray component worker_ports is trying to use a port number 53156 that is used by other components.
# Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 64894, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 35083, 'metrics_export': 53156, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}

# ValueError: Ray component worker_ports is trying to use a port number 53225 that is used by other components.
# Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 60967, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 55382, 'metrics_export': 53225, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
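
In each of these outputs, one of the non-fixed component ports appears to have landed inside the fixed worker port range (53001 to 53257). A minimal sketch of a check for that, assuming the port information is available as a Python dict shaped like the output above; the helper names `parse_worker_range` and `find_worker_port_collisions` are hypothetical and not part of Ray:

```python
import re


def parse_worker_range(spec):
    """Parse a string like '257 ports from 53001 to 53257' into (low, high)."""
    match = re.search(r"from (\d+) to (\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognized worker_ports value: {spec!r}")
    return int(match.group(1)), int(match.group(2))


def find_worker_port_collisions(port_info):
    """Return the components whose numeric port lies inside the worker port range.

    Hypothetical helper: entries set to 'random' are skipped because their
    port is not known from the dict alone.
    """
    low, high = parse_worker_range(port_info["worker_ports"])
    return {
        name: port
        for name, port in port_info.items()
        if isinstance(port, int) and low <= port <= high
    }


# Port information copied from the first error above.
port_info = {
    'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random',
    'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265,
    'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365,
    'dashboard_grpc': 'random', 'runtime_env_agent': 64678,
    'metrics_export': 54151, 'redis_shards': 'random',
    'worker_ports': '257 ports from 53001 to 53257',
}

collisions = find_worker_port_collisions(port_info)
print(collisions)  # {'dashboard_agent_grpc': 53058}
```

Run against the second and third outputs, the same check flags `metrics_export` (53156 and 53225). One possible hardening, stated as an assumption rather than a confirmed fix, would be to pin these components outside the worker range via the corresponding `ray start` options (for example `--dashboard-agent-grpc-port`, `--metrics-export-port`, and `--runtime-env-agent-port`).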
