
Conversation

@fcharras (Contributor) commented Jan 16, 2018

Ok, I think I did it! The Parallel API now includes two new keywords:

  • as_generator: if True, the output of __call__ will be a generator. The user is free to iterate over this generator as they wish (this does not affect the speed of the computations).
  • async_output: if True, the output of __call__ will not be ordered according to the input iterable; instead, the first elements of the output list (or of the output generator if as_generator=True) are the results that are returned first by the child processes (or threads, depending on the backend).

These two parameters are completely independent. Also, all the nice features of joblib are left untouched (logging, ...). Here is a little snippet to test the result (requires Python 3):

import time
import random
from joblib import Parallel, delayed


def some_function(i):
    time.sleep(0.001 * random.randint(1, 100))
    return i + 1


pool = Parallel(n_jobs=2, async_output=True, backend='multiprocessing')
pool2 = Parallel(n_jobs=2, async_output=False, backend='multiprocessing')

res = pool(delayed(some_function)(i) for i in range(100))
res2 = pool2(delayed(some_function)(i) for i in range(100))
print(res)   # order differs at each call
print(res2)  # order is as expected
assert sorted(res) == res2

/!\ for now, async_output only works in Python 3 with backend='multiprocessing'. I've also laid some foundations for supporting an error_callback keyword in _parallel_backends.

I've also added a helper function, get_last_async_result, that is useful if the user wishes to add callbacks inside the input iterable.

Implements #79 and #217, and follows the previous (closed) PRs #582, #585 and #586 (which partially implemented the present changes).

codecov bot commented Jan 16, 2018

Codecov Report

Merging #587 into master will increase coverage by 0.07%.
The diff coverage is 96.59%.


@@            Coverage Diff             @@
##           master     #587      +/-   ##
==========================================
+ Coverage   95.17%   95.24%   +0.07%     
==========================================
  Files          39       39              
  Lines        5427     5534     +107     
==========================================
+ Hits         5165     5271     +106     
- Misses        262      263       +1
| Impacted Files | Coverage Δ |
|----------------|------------|
| joblib/pool.py | 93.27% <100%> (+1.89%) ⬆️ |
| joblib/test/test_parallel.py | 96.66% <100%> (+0.52%) ⬆️ |
| joblib/test/common.py | 86.79% <100%> (+0.25%) ⬆️ |
| joblib/_parallel_backends.py | 95.45% <89.47%> (-0.24%) ⬇️ |
| joblib/parallel.py | 97.97% <95.16%> (-0.75%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fcharras (Contributor, Author) commented Jan 16, 2018

There was a race condition that I just fixed (dumb error: the definitions of job and job_idx had been swapped in _dispatch).

@fcharras (Contributor, Author) commented:

Ok, I've reverted some changes (notably using deque instead of lists) and now the tests pass on older Python versions, hurrah.

@fcharras (Contributor, Author) commented Jan 16, 2018

All the tests passed except the pep8 check; it seems this was caused by an intermediate faulty commit. I rebased and squashed all commits to fix it. Hope everything finally turns green...
edit: all green!

@fcharras fcharras force-pushed the async_option branch 4 times, most recently from b77db6a to e95fea5 Compare January 17, 2018 09:58
@lesteve (Member) commented Jan 17, 2018

Thanks a lot for your PR! This is quite a significant change and it's not likely that I will have time to review this in the near future.

Off the top of my head: the as_generator feature is what I was trying to aim for and could not get to work. This may be the first aspect I look at when I get the time.

In general it's best not to do too many things in one PR. Small, focused PRs increase the likelihood of getting merged one day.

BTW I cancelled the Travis builds on this PR because they were taking 10+ hours to complete, not sure why. Update: Travis seems to be having issues: https://www.traviscistatus.com/

@fcharras (Contributor, Author) commented Jan 17, 2018

The tests froze because Travis was down last night; it's up again now. Before that, the tests were green, but unfortunately the history of the previous runs was lost in the rebase. Since then I've only added some refactoring plus a unittest for the async parameter; those haven't been run on CI yet but pass on my local machine, so I think you can enable it again.

I'd be happy to help the review process :) It could be broken down into two PRs: one for the generator output, and another for the async part (they are 100% independent, except for some shared refactoring).

Hints for the review: overall there is no radical change to parallel.py; it follows the same strategy and methods as before, but the queue I/O is managed slightly differently. The default behavior is still the same as current master (I haven't removed or altered any of the previous unittests). The additions to the other files are minor (two unittests, and adding the error_callback keyword to apply_async in _parallel_backends).

@lesteve (Member) commented Jan 17, 2018

I quickly checked, and your PR has the same problem as my old attempt in https://github.com/lesteve/joblib/tree/parallel-return-generator: the program hangs if you don't consume the generator completely. Here is a snippet that highlights the problem in your PR:

import numpy as np

from joblib import Parallel, delayed


def func(arg):
    print('func({})'.format(arg))
    result = np.zeros(int(1e7))
    result[0] = arg
    return result


result = Parallel(
    n_jobs=2, as_generator=True,
    verbose=2, backend='multiprocessing')(
        delayed(func)(i) for i in range(10))

next(result)
next(result)

# Uncomment the following line and the program does not hang
# list(result)

del result

Output:

func(0)
func(1)
func(2)
func(3)

Troubleshooting this problem is a great first challenging step.

@fcharras (Contributor, Author) commented:

Interestingly, this does not hang with backend='loky' or backend='threading'. The generator seems to work fine with those backends, so there's at least that.

When the generator is deleted, it raises an error that is caught and processed by _handle_error as expected. What seems to happen with backend='multiprocessing' is that backend.abort_everything(ensure_ready=ensure_ready), which triggers multiprocessing.Pool.terminate(), ends in a deadlock somewhere in the CustomizablePicklingQueue in pool.py. There seems to be an unexpected interaction between multiprocessing.Pool.terminate() and CustomizablePicklingQueue...

@fcharras (Contributor, Author) commented Jan 17, 2018

It seems that the following is what happens in CustomizablePicklingQueue in pool.py. Indeed, this line hangs: self._writer.send_bytes(buffer.getvalue()).

Warning

As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue. See Programming guidelines.

Source: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe

@fcharras (Contributor, Author) commented:

Replacing super(MemmappingPool, self).terminate() with super(MemmappingPool, self).join() removes the deadlock! The issue is that terminate is what we really want, but we're getting closer...

@fcharras (Contributor, Author) commented Jan 17, 2018

This patch solves the deadlock on my local machine. What seemed to happen is that a message was sent to a killed process, resulting in a deadlock because the process would never answer. The solution is to close the faulty pipe before terminating the process. I don't exactly understand the role of each of the four pipes in PicklingPool though; closing the other pipes either does nothing or creates BrokenPipeError errors.

@lesteve (Member) commented Jan 17, 2018

Nice, well done! IMO it is better to open a PR with only the as_generator functionality. Personally I think it is more likely to get merged than the async_output part, though I could be wrong about this. Please add a test based on my snippet for all the backends. Have a look at my old branch to see whether you can reuse some of the tests I wrote at the time: master...lesteve:parallel-return-generator

Not sure about the PicklingPool and its four pipes. Maybe @ogrisel can comment on this and on the fix that you found (essentially doing self._inqueue._reader.close() before .terminate()).

I have some old benchmarks that I could rerun to see how much we reduce memory consumption when returning an iterator rather than a list.

franck added 5 commits April 25, 2018 22:22
@fcharras fcharras closed this Aug 18, 2018
@letalvoj commented May 2, 2021

@lesteve this would be so cool to have ... any chance this could get revisited?

@fcharras (Contributor, Author) commented:

The return_generator part is being developed here #588
