2 new parameters: return a generator / return results asynchronously / add helper function for callbacks #587
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #587      +/-   ##
==========================================
+ Coverage   95.17%   95.24%   +0.07%
==========================================
  Files          39       39
  Lines        5427     5534     +107
==========================================
+ Hits         5165     5271     +106
- Misses        262      263       +1
```

Continue to review the full report at Codecov.
There was a race condition that I just fixed (dumb error: the definition of …)
Ok, I've reverted some changes (especially using deque instead of lists) and now the tests pass on older Python versions, hurrah.
Force-pushed 2d1ac58 to ca15c50
All the tests passed except the pep8 test; it seems it was because of an intermediate faulty commit. I rebased and squashed all commits to fix this test. Hope everything finally becomes green...
Force-pushed b77db6a to e95fea5
Thanks a lot for your PR! This is quite a significant change and it's not likely that I will have time to review it in the near future. Off the top of my head: the …

In general it's best not to do too many things in one PR. Small, focused PRs increase the likelihood of getting merged one day.

BTW I cancelled the Travis builds on this PR because they were taking 10+ hours to complete, not sure why. Update: Travis seems to be having issues: https://www.traviscistatus.com/
The reason the tests froze is that Travis was down last night; it's up again now. Before this the tests were green; unfortunately the history of the previous tests has been lost in the rebase. I've only added some refactoring plus a unittest for the async parameter; those haven't been tested on CI yet, but they pass on my local machine, so I think you can enable it again.

I'd be happy to help the review process :) It could be broken down into two PRs: one for the generator output, and another for the async part (those are 100% independent, except for some refactoring).

Hints for the review: overall there is no radical change to parallel.py; it's the same strategy and methods as before, but the queue I/O is managed slightly differently. The default behavior is still the same as current master (I haven't removed or altered any of the previous unittests). The additions to the other files are minor (two unittests and adding …)
I quickly checked and your PR has the same problem as my old attempt in https://github.com/lesteve/joblib/tree/parallel-return-generator: the program hangs if you don't consume the generator completely. Here is a snippet that highlights the problem in your PR:

```python
import numpy as np
from joblib import Parallel, delayed

def func(arg):
    print('func({})'.format(arg))
    result = np.zeros(int(1e7))
    result[0] = arg
    return result

result = Parallel(
    n_jobs=2, as_generator=True,
    verbose=2, backend='multiprocessing')(
        delayed(func)(i) for i in range(10))
next(result)
next(result)
# Comment this out and the program does not hang
# list(result)
del result
```

Output:
Troubleshooting this problem is a great, challenging first step.
Interestingly, this does not hang with … When the generator is deleted, it raises an error that is caught and processed by …
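As background for this deletion path: in CPython, deleting (or garbage-collecting) a generator calls its `close()`, which raises `GeneratorExit` at the suspended `yield`; cleanup code inside the generator can intercept it, which is presumably the hook joblib uses here. A minimal, stdlib-only sketch of that mechanism (note it relies on CPython's refcounting to run `close()` immediately on `del`):

```python
# Record that cleanup ran when the generator is abandoned.
cleanup_log = []

def producer():
    try:
        for i in range(10):
            yield i
    except GeneratorExit:
        # A parallel backend could terminate its workers here.
        cleanup_log.append('closed')
        raise  # must re-raise (or return); yielding here is an error

gen = producer()
next(gen)  # advance to the first yield
next(gen)  # advance to the second yield
del gen    # calls gen.close() -> GeneratorExit at the suspended yield
print(cleanup_log)  # ['closed']
```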
It seems that the following is happening in …

Source: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe
Replacing …
This patch solves the deadlock on my local machine. What seemed to happen is that a message was sent to a killed process, resulting in a deadlock because the process would never answer. The solution is to close the faulty pipe before terminating the process. I don't exactly understand the role of each of the four pipes in …
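The close-before-terminate fix is consistent with how `multiprocessing` connection objects behave: once the peer end of a `Pipe` is closed, `recv()` raises `EOFError` instead of blocking forever waiting for data. A small single-process sketch of that behavior (both ends live in the same process here, purely for illustration):

```python
import multiprocessing as mp

parent_end, child_end = mp.Pipe()

# With both ends open, recv() on an empty pipe would block forever.
# Closing the peer end first makes recv() fail fast with EOFError
# instead of deadlocking -- which is why closing the faulty pipe
# before terminating the worker process unblocks things.
child_end.close()

try:
    parent_end.recv()
    outcome = 'received'
except EOFError:
    outcome = 'eof'

print(outcome)  # eof
```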
Nice, well done!

IMO it is better to open a PR with only the as_generator functionality. Personally I think it is likely to be merged more easily than the async_output one, though I could be wrong about this.

Please add a test based on my snippet for all the backends. Have a look at my old branch to see whether you can reuse some of the tests I wrote at the time: master...lesteve:parallel-return-generator

Not sure about the PicklingPool and its four pipes. Maybe @ogrisel can comment on this and on the fix that you found.

I have some old benchmarks that I could rerun to see how much we reduce memory consumption when returning an iterator rather than a list.
- add option to return results asynchronously
- add helper function for callbacks
- set foundation for error_callback support in _parallel_backend
- unittest async buffer and async api
@lesteve this would be so cool to have... any chance this could get revisited?

The return_generator part is being developed here: #588
Ok, I think I did it! The `Parallel` API now includes two new keywords:

- `as_generator`: if `True`, the output of `__call__` will be a generator. The user is free to iterate on said generator as they wish (this does not affect the speed of computations).
- `async_output`: if `True`, the output of `__call__` will not be sorted according to the input `iterable`; instead, the first elements in the output list (or the output generator if `as_generator=True`) are the results that are returned first by the child processes (or threads, depending on the backend).

Those two parameters are totally independent. Also, all the nice features of joblib are left untouched (logging, ...). Here is a little code to test the result (requires Python 3):
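The test code mentioned above did not survive the page scrape. As a stand-in, here is a sketch of the *intended* semantics using only the stdlib's `concurrent.futures`: `map` mirrors the default ordered output, and `as_completed` mirrors `async_output=True` (completion order). The joblib keywords themselves come from this PR and are not part of released joblib.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def func(arg):
    # Later submissions sleep less, so completion order != input order.
    time.sleep(0.2 - 0.05 * arg)
    return arg

with ThreadPoolExecutor(max_workers=3) as pool:
    # Ordered, like default Parallel behavior: results follow the input.
    ordered = list(pool.map(func, range(3)))

    # Completion order, like async_output=True: first finished, first out.
    futures = [pool.submit(func, i) for i in range(3)]
    unordered = [f.result() for f in as_completed(futures)]

print(ordered)            # [0, 1, 2]
print(sorted(unordered))  # same result set, possibly a different order
```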
/!\ For now, `async_output` only works in Python 3, with `backend='multiprocessing'`. I've added some foundations for supporting the `error_callback` keyword in `_parallel_backend`.

I've also added a helper function `get_last_async_result` that is useful if the user wishes to add callbacks inside the input `iterable`.

Implements #79 and #217, and follows the previous (closed) PRs #582, #585, #586 (which partially implemented the present PR).