Closed
Description
We are still running into port collisions because not all Ray ports are fixed; the problem is more apparent on runs with more nodes. Here are some errors produced with the current Slurm script:
#ValueError: Ray component worker_ports is trying to use a port number 53058 that is used by other components.
#Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 64678, 'metrics_export': 54151, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
#ValueError: Ray component worker_ports is trying to use a port number 53156 that is used by other components.
#Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 64894, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 35083, 'metrics_export': 53156, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
#ValueError: Ray component worker_ports is trying to use a port number 53225 that is used by other components.
#Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 60967, 'dashboard_agent_http': 52365, 'dashboard_grpc': 'random', 'runtime_env_agent': 55382, 'metrics_export': 53225, 'redis_shards': 'random', 'worker_ports': '257 ports from 53001 to 53257'}
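In all three errors above, the colliding port belongs to a component whose port changes from run to run (`dashboard_agent_grpc` in the first, `metrics_export` in the other two) and happens to land inside the worker range 53001 to 53257. A minimal sketch of a check that reproduces this diagnosis from the logged port information follows. The helper name `find_collisions` is hypothetical and not part of Ray; the dictionaries are copied from the first two errors with the `worker_ports` entry left out, and the range is assumed to be inclusive since 53001 through 53257 is exactly 257 ports.

```python
def find_collisions(port_info, worker_min, worker_max):
    """Return {component: port} for every integer port inside the worker range.

    Hypothetical helper for illustration; entries logged as 'random' are
    skipped because their value is not known from the log.
    """
    return {
        name: port
        for name, port in port_info.items()
        if isinstance(port, int) and worker_min <= port <= worker_max
    }


# Port information copied from the first error (worker_ports entry omitted).
first_error = {
    'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random',
    'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265,
    'dashboard_agent_grpc': 53058, 'dashboard_agent_http': 52365,
    'dashboard_grpc': 'random', 'runtime_env_agent': 64678,
    'metrics_export': 54151, 'redis_shards': 'random',
}

# Port information copied from the second error (worker_ports entry omitted).
second_error = {
    'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random',
    'gcs_server': 'random', 'client_server': 10001, 'dashboard': 8265,
    'dashboard_agent_grpc': 64894, 'dashboard_agent_http': 52365,
    'dashboard_grpc': 'random', 'runtime_env_agent': 35083,
    'metrics_export': 53156, 'redis_shards': 'random',
}

print(find_collisions(first_error, 53001, 53257))   # {'dashboard_agent_grpc': 53058}
print(find_collisions(second_error, 53001, 53257))  # {'metrics_export': 53156}
```

Any component this check flags is a candidate for pinning to a fixed port outside the worker range. Components logged as `'random'` cannot be checked this way and may also need fixing.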