Conversation

fcharras
Contributor

@fcharras fcharras commented Jan 13, 2018

Those few changes to parallel.py are intended to enable reading intermediate outputs during the call to Parallel (related to #79 and #217).

The callback passed to apply_async now updates a last_async_output attribute on the running Parallel instance, storing the result that triggered the callback. This value can then be read and processed by the input generator, e.g. for logging / printing purposes or to generate new tasks (see the example in test_parallel.py).

It seems to work without unexpected behavior for the same reason that line 217, self.parallel.n_completed_tasks += self.batch_size, is safe: multiprocessing (and loky?) uses only one thread to execute the callbacks sequentially. Hence, if the input generator reads Parallel.last_async_output, the value is guaranteed to be the result that triggered the callback, which itself triggered one more read from the generator.

(edit: I also tried to have the generator read parallel._output directly, but it can happen that the result that triggered the callback has not been added to parallel._output yet)
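A minimal sketch of the mechanism described above, simulated in pure Python without joblib (FakeParallel and its sequential execution loop are stand-ins; only the last_async_output name comes from the patch): a callback records the latest completed result, and the input generator reads it to decide what to yield next.

```python
class FakeParallel:
    """Stand-in for the patched Parallel; the execution loop is hypothetical."""

    def __init__(self):
        self.last_async_output = None

    def _callback(self, result):
        # In the patch, the callback passed to apply_async stores the
        # result that triggered it on the running Parallel instance.
        self.last_async_output = result

    def __call__(self, tasks):
        out = []
        for func, args in tasks:
            result = func(*args)
            self._callback(result)  # sequential stand-in for the async callback
            out.append(result)
        return out

parallel = FakeParallel()

def task_generator():
    # The first task is fixed; each subsequent task is defined from the
    # latest intermediate result, read through last_async_output.
    yield (lambda x: x + 1, (0,))
    for _ in range(3):
        prev = parallel.last_async_output
        yield (lambda x: x * 2, (prev,))

results = parallel(task_generator())
print(results)  # [1, 2, 4, 8]
```

The safety argument from the comment above shows up here as the ordering guarantee: the generator is only advanced after the callback for the previous task has run, so last_async_output is always the result that triggered that advance.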

@lesteve
Member

lesteve commented Jan 15, 2018

Can you explain what the use case for this change is? In particular, how is it related to #217?

@lesteve
Member

lesteve commented Jan 15, 2018

Side comment: the AppVeyor failure is not related to this PR (numpy str/repr formatting change in numpy 1.14).

@fcharras
Contributor Author

fcharras commented Jan 15, 2018

My particular use case: I want to parallelize tasks over several jobs, but some of those tasks can only be fully defined once the results of previous tasks are known. So it needs a queue: each time a result is returned, it can trigger the addition of a new task to the queue.
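The queue mechanism can be sketched with the standard library instead of the patched joblib (evaluate, on_done and the candidate-selection rule are all hypothetical placeholders, not anything from this PR): each finished task pushes its result onto a queue, and the main loop reads the queue and submits a new task whose definition depends on the results seen so far.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

def evaluate(x):
    # Stand-in for fitting/scoring one hyper-parameter candidate.
    return -(x - 3) ** 2

results_q = queue.Queue()

def on_done(future):
    # Completion callback: publish the intermediate result.
    results_q.put(future.result())

n_total, n_submitted = 8, 0
best = None
with ThreadPoolExecutor(max_workers=2) as pool:
    # Seed the pool with two initial candidates.
    for x in (0, 6):
        pool.submit(evaluate, x).add_done_callback(on_done)
        n_submitted += 1
    for _ in range(n_total):
        score = results_q.get()  # block on the next intermediate result
        best = score if best is None else max(best, score)
        if n_submitted < n_total:
            # The next candidate depends on what has been seen so far
            # (a real Bayesian optimizer would update its model here).
            next_x = 3 + (n_total - n_submitted)
            pool.submit(evaluate, next_x).add_done_callback(on_done)
            n_submitted += 1

print(best)  # -1
```

The point of the PR is to get the same result-driven submission without leaving a single Parallel call, so that pre_dispatch and batching keep working across the whole run.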

I want to apply it to Bayesian hyper-parameter search, where the state of the Bayesian optimizer is updated after each batch of results before it yields a new batch of points to test. By nature this looks like an asynchronous problem.

I have in mind a POC that relies on this patch for BayesSearchCV from scikit-optimize. The current implementation uses Parallel to process the batches of points sequentially: Parallel is called anew for each batch, so there can hardly be any benefit from the pre_dispatch and batch_size parameters.

I'm not sure there would be any significant performance improvement (usually the bottleneck is largely the time each task takes to be processed), but in my opinion it would be more elegant and could give a more sensible API.

There could be other use cases for the queue mechanism (RL?). More generally, this enables implementing callbacks on intermediate results (logging, maybe early stopping...). It might feel hacky for now, but I find it fairly flexible: it does not change joblib much and keeps all the nice features.

About #217: the only relation is that a generator also lets the user access intermediate results before all the tasks are completed. (On a side note, that implementation looks more difficult, and it would not offer the possibility of callbacks.)

Some more free thoughts about why I like this:

The current BayesSearchCV inherits from BaseSearchCV, but the inheritance is awkward since it has to redefine the fit method. Looking at the broader picture, I imagine that BaseSearchCV could be implemented with this scheme (the result of each task triggers the addition of a new task to a queue), so that each *SearchCV tool would only need to redefine the callback. This could allow GridSearchCV, RandomizedSearchCV and BayesSearchCV to inherit consistently from BaseSearchCV. (The callback would be trivial for Grid and Randomized: return the next point from the pre-defined list.)

@codecov

codecov bot commented Jan 15, 2018

Codecov Report

Merging #582 into master will decrease coverage by 16.04%.
The diff coverage is 28.12%.


@@             Coverage Diff             @@
##           master     #582       +/-   ##
===========================================
- Coverage   94.23%   78.18%   -16.05%     
===========================================
  Files          39       38        -1     
  Lines        5011     5001       -10     
===========================================
- Hits         4722     3910      -812     
- Misses        289     1091      +802
Impacted Files                   Coverage Δ
joblib/parallel.py               82.74% <100%>    (-15.47%) ⬇️
joblib/test/test_parallel.py     56.94% <17.85%>  (-39.01%) ⬇️
joblib/test/test_memmapping.py   27.81% <0%>      (-72.19%) ⬇️
joblib/executor.py               37.93% <0%>      (-62.07%) ⬇️
joblib/_memory_helpers.py        15.38% <0%>      (-58.47%) ⬇️
joblib/backports.py              39.58% <0%>      (-54.17%) ⬇️
joblib/pool.py                   46.55% <0%>      (-44.83%) ⬇️
joblib/_parallel_backends.py     53.73% <0%>      (-39.72%) ⬇️
joblib/_memmapping_reducer.py    56.55% <0%>      (-38.63%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e229609...4ee083e. Read the comment docs.

@fcharras
Contributor Author

fcharras commented Jan 15, 2018

... the codecov report seems broken? (the last commit added the test/example for all parallel backends)
