
return_generator: add async flag to yield results in order completed #1449

@scottgigante-immunai

When return_generator=True, results are only yielded in the order in which the tasks were dispatched. In some cases, for example when some jobs run for a very long time while others finish quickly and the order does not matter, it would be useful to yield results in the order they complete instead.
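A minimal sketch of completion-order yielding, using only the standard library rather than joblib. The helper name yield_as_completed is hypothetical and is not a joblib API. Each future pushes itself onto a queue when it finishes, so the consumer reads results in the order they complete. Threads and a scaled-down sleep stand in for joblib's workers and the 10-second job.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def work(i):
    # Every fifth task is "slow"; the sleep is scaled down for the demo.
    if i % 5 == 0:
        time.sleep(0.3)
        return f"slow: {i}"
    return f"quick: {i}"


def yield_as_completed(executor, func, inputs):
    """Hypothetical helper: submit func(x) for each input, yield results as they finish."""
    done = queue.Queue()
    futures = [executor.submit(func, x) for x in inputs]
    for future in futures:
        # The callback runs when the future finishes (or immediately if it
        # already has), so the queue fills up in completion order.
        future.add_done_callback(done.put)
    for _ in futures:
        yield done.get().result()


with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(yield_as_completed(executor, work, range(10)))

print(results)
```

With ten workers, the eight quick results should come out first and the two slow ones last.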

Example:

import joblib
import time

def my_quick_func(i):
    return f"quick: {i}"

def my_slow_func(i):
    time.sleep(10)
    return f"slow: {i}"

def producer():
    for i in range(10):
        if i % 5 == 0:
            yield joblib.delayed(my_slow_func)(i)
        else:
            yield joblib.delayed(my_quick_func)(i)

for result in joblib.Parallel(n_jobs=-1, return_generator=True)(producer()):
    print(result)

Current output:

slow: 0
quick: 1
quick: 2
quick: 3
quick: 4
slow: 5
quick: 6
quick: 7
quick: 8
quick: 9

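Desired output (with a flag such as async=True; note that async has been a reserved keyword since Python 3.7, so the actual parameter would need a different name):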

quick: 1
quick: 2
quick: 3
quick: 4
quick: 6
quick: 7
quick: 8
quick: 9
slow: 0
slow: 5
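For comparison with the current and desired outputs above, here is a hedged sketch of what such a flag could look like, built on concurrent.futures rather than joblib internals. The function run_tasks and its ordered parameter are hypothetical names (ordered avoids the reserved word async). Tasks are (function, args, kwargs) tuples, the same shape that joblib.delayed(func)(i) returns. Threads stand in for joblib's worker processes, and every task is submitted up front instead of being dispatched lazily.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, ordered=True, max_workers=None):
    """Hypothetical helper: run (func, args, kwargs) tuples and yield results.

    ordered=True  -> yield in dispatch order (like return_generator=True today)
    ordered=False -> yield each result as soon as its task finishes
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Simplification: the input iterable is consumed eagerly here.
        futures = [executor.submit(func, *args, **kwargs) for func, args, kwargs in tasks]
        iterator = futures if ordered else as_completed(futures)
        for future in iterator:
            yield future.result()


def my_quick_func(i):
    return f"quick: {i}"


def my_slow_func(i):
    time.sleep(0.3)  # scaled down from the 10 s used in the issue
    return f"slow: {i}"


def producer():
    for i in range(10):
        func = my_slow_func if i % 5 == 0 else my_quick_func
        # Same (function, args, kwargs) shape as joblib.delayed(func)(i).
        yield func, (i,), {}


for result in run_tasks(producer(), ordered=False, max_workers=10):
    print(result)
```

With ordered=True the loop should reproduce the dispatch-order listing under "Current output"; with ordered=False the eight quick results come first, followed by the two slow ones (whose relative order is not guaranteed).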
