[core/dashboard] Separate actor scheduling vs slow initialization ('pending_creation is not clear') #55212

@richardliaw

Description

When a user puts a time-consuming operation (like downloading a model) inside an actor's __init__, the actor shows up as "PENDING CREATION" until __init__ returns.
The user naturally suspects the scheduler, even though the actor has already been scheduled and the delay is inside initialization.

Here is a reproduction on an 8-CPU node.

Scenario 1: Unschedulable actor shows "PENDING CREATION".

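import ray

# Each actor requests all 8 CPUs, so only one fits on an 8-CPU node.
@ray.remote(num_cpus=8)
class Test:
    def __init__(self):
        pass

    def ping(self):
        return 1


actor = Test.remote()
actor2 = Test.remote()  # cannot be scheduled while `actor` holds all 8 CPUs
# Blocks, because actor2 never starts.
ray.get([a.ping.remote() for a in [actor, actor2]])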
Image

Scenario 2: Slow-initializing actor also shows "PENDING CREATION".

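import time

import ray

# Each actor requests 1 CPU, so both fit on the node and are scheduled right away.
@ray.remote(num_cpus=1)
class Test:
    def __init__(self):
        # Stand-in for slow initialization, e.g. downloading a model.
        time.sleep(1000)

    def ping(self):
        return 1


actor = Test.remote()
actor2 = Test.remote()
# Blocks until both __init__ calls finish.
ray.get([a.ping.remote() for a in [actor, actor2]])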
Image
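
Until the dashboard distinguishes the two cases, one possible workaround (not the fix requested here) is to keep __init__ cheap and move the slow step into a method, so the actor leaves "PENDING CREATION" quickly and the slow step shows up as an ordinary method call. The sketch below is only an illustration: the names SlowInitActor, load, and ready are hypothetical, and the class is kept as plain Python so it runs without a cluster. The Ray wrapping is shown in comments.

```python
import time


class SlowInitActor:
    """Hypothetical actor class with a cheap __init__ and an explicit load step."""

    def __init__(self):
        # Nothing slow here, so actor creation completes quickly.
        self.ready = False

    def load(self, seconds: float = 0.01) -> bool:
        # Stand-in for the slow work (e.g. downloading a model).
        time.sleep(seconds)
        self.ready = True
        return self.ready

    def ping(self) -> int:
        return 1


# With Ray, the same class could be wrapped and driven like this:
#   Actor = ray.remote(num_cpus=1)(SlowInitActor)
#   actor = Actor.remote()
#   ray.get(actor.load.remote())   # the slow part is now a visible method call
#   ray.get(actor.ping.remote())
```

This changes the actor's API (callers must invoke load before use), so it does not replace a clearer status in the dashboard.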

Preferred Behavior

Ray should show a different status for Scenario 1 and Scenario 2. Ideally, Scenario 2 would show that the actors have already been scheduled (including the specific node each one is on) but are still awaiting initialization.
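
As a rough illustration of the requested behavior, the sketch below derives a display label from an actor record. It assumes that a node ID is known once the actor has been placed and before __init__ returns, which is not confirmed here. ActorRecord, display_status, and the labels PENDING_SCHEDULING and PENDING_INITIALIZATION are hypothetical names, not existing Ray states or APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActorRecord:
    """Hypothetical, simplified view of what the dashboard knows about an actor."""
    state: str                      # e.g. "PENDING_CREATION", "ALIVE", "DEAD"
    node_id: Optional[str] = None   # assumed to be set once the actor is placed


def display_status(actor: ActorRecord) -> str:
    """Split PENDING_CREATION into two hypothetical, more specific labels."""
    if actor.state != "PENDING_CREATION":
        return actor.state
    if actor.node_id is None:
        # Scenario 1: no node has been chosen yet (waiting on resources).
        return "PENDING_SCHEDULING"
    # Scenario 2: placed on a node, __init__ still running.
    return f"PENDING_INITIALIZATION (node {actor.node_id})"


print(display_status(ActorRecord("PENDING_CREATION")))
print(display_status(ActorRecord("PENDING_CREATION", node_id="node-1")))
```

With this split, Scenario 1 would report the first label and Scenario 2 the second, along with the node the actor is on.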

Use case

This is particularly useful for post-training workloads and other workloads with long initialization times.

Metadata

Assignees

No one assigned

Labels

P2: Important issue, but not time-critical
core: Issues that should be addressed in Ray Core
enhancement: Request for new feature and/or capability
observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
