Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Pool tasks for sp_tasks::spawn and add execution limits #11227

@koute

Description

@koute

Currently the way we implement the sp_tasks::spawn runtime API (which is roughly equivalent to a std::thread::spawn, just called from within the runtime and without any shared memory between the caller and the callee) has a few problems from what I can see:

  • it is unbound**: you could exhaust system resources by spawning enough tasks
  • it uses spawn instead of spawn_blocking yet it is blocking (it uses a blocking channel, and calling into the runtime is also effectively a blocking operation)
  • you can call it recursively without any limit

It can deadlock the whole async loop when called enough times in parallel, or called enough times recursively.

Its docs also say that it's dangerous when used incorrectly.

(** - although due to it using spawn you'll probably exhaust all of the tokio worker threads and deadlock the whole process first)

What we should do to fix this:

  • Soft-limit the number of tasks which can be concurrently executed; add the rest into a queue. There's no point in supporting an unbound number of tasks executing at the same time since calling into the runtime is an inherently blocking operation, and you don't have an infinite number of CPU cores to run them all anyway. We should pick a preset number of supported parallel tasks and execute only at most that many simultaneously. (This also reduces the risk of triggering network partitions if this is used in a place which can affect consensus.)
  • Hard-limit the number of levels the tasks can be spawned recursively, or (preferably) even just disallow it completely; otherwise you'll still risk a deadlock

It would also be nice to add a hard-limit for the number of tasks which can be queued at the same time, but unfortunately that could be a source of non-determinism depending on how fast the already queued tasks are executed; we could however add a potential hard-limit to the total number of tasks.

Fortunately from what I can see no one currently really uses this API? (At least it isn't used in polkadot, if my grepping is correct; not sure about other chains though.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    I8-footprintAn enhancement to provide a smaller (system load, memory, network or disk) footprint.J0-enhancementAn additional feature request.

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions