Skip to content

Conversation

mrocklin
Copy link
Contributor

Fixes #653

Currently when using external backends with joblib you need to know to
import distributed.joblib, which is not particularly intuitive.

from joblib import parallel_backend
with parallel_backend("dask"):
    pass

KeyError: 'dask'

Now implementers of downstream parallel backends can register small
functions within the Joblib codebase that register their backends
appropriately when triggered by users.

FWIW I suspect that not everyone will agree with this change. I will not be surprised if it does not get merged. It was simple enough to implement though that I thought I'd throw it up here just to make conversation about it easier.

@codecov
Copy link

codecov bot commented Mar 25, 2018

Codecov Report

Merging #655 into master will increase coverage by 0.03%.
The diff coverage is 70.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #655      +/-   ##
==========================================
+ Coverage   95.09%   95.12%   +0.03%     
==========================================
  Files          39       39              
  Lines        5462     5478      +16     
==========================================
+ Hits         5194     5211      +17     
+ Misses        268      267       -1
Impacted Files Coverage Δ
joblib/test/test_parallel.py 96.43% <100%> (+0.55%) ⬆️
joblib/parallel.py 97.38% <50%> (-1.42%) ⬇️
joblib/_parallel_backends.py 95.25% <0%> (+1.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a31976...f42b7b5. Read the comment docs.

@mrocklin
Copy link
Contributor Author

Any feedback on this idea?

@ogrisel
Copy link
Contributor

ogrisel commented Mar 30, 2018

I was thinking that maybe we could even move the code of the dask backend into the joblib code base because the backend semi-private API is more subject to change than the dask distributed public API.

I would like to find some time in the coming month to automatically scatter heavy data to avoid redundant data transfers without having the use to explicitly use the scatter feature of the backend. This refactoring would be more natural to do as part of the joblib code base I think.

@mrocklin
Copy link
Contributor Author

That would work for me

@GaelVaroquaux
Copy link
Member

This code is fairly lightweight, and it is gives benefit to the user, so I have no major issues with the idea.

Two points:

mrocklin added 2 commits May 28, 2018 17:50
Fixes joblib#653

Currently when using external backends with joblib you need to know to
`import distributed.joblib`, which is not particularly intuitive.

```python
from joblib import parallel_backend
with parallel_backend("dask"):
    pass

KeyError: 'dask'
```

Now implementers of downstream parallel backends can register small
functions within the Joblib codebase that register their backends
appropriately when triggered by users.
@mrocklin mrocklin force-pushed the external-backends branch from 6e4a451 to cc562fb Compare May 28, 2018 22:01
@mrocklin
Copy link
Contributor Author

Those both seem reasonable to fix.

Do we want to fail gracefully when distributed is no available? Maybe that's for future work, to be able to specify fallbacks while choosing the backend.

I agree that this would be useful.

Short-term I've added a more informative error.

We should somewhat briefly document this, so that other backend providers can use it. The following page would be a good one: https://joblib.readthedocs.io/en/latest/parallel.html#custom-backend-api-experimental

Done. Feedback appreciated to check for tone and and detail.

I've also changed the dask example.

@GaelVaroquaux
Copy link
Member

LGTM. Merging!

@GaelVaroquaux
Copy link
Member

Actually, let me wait for travis

@mrocklin mrocklin force-pushed the external-backends branch 2 times, most recently from 88ff6af to 65bd019 Compare May 28, 2018 23:20
@mrocklin mrocklin force-pushed the external-backends branch from 65bd019 to f42b7b5 Compare May 29, 2018 00:11
@mrocklin
Copy link
Contributor Author

Tests pass except for codecov. I'm not sure how to interpret the results there.

@GaelVaroquaux
Copy link
Member

OK, merging. Thank you.

@GaelVaroquaux GaelVaroquaux merged commit 47650f6 into joblib:master May 29, 2018
@mrocklin mrocklin deleted the external-backends branch May 29, 2018 15:29
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Jul 28, 2018
* tag '0.12': (116 commits)
  Release 0.12
  typo
  typo
  typo
  ENH add initializer limiting n_threads for C-libs (joblib#701)
  DOC better parallel docstring (joblib#704)
  [MRG] Nested parallel call thread bomb mitigation (joblib#700)
  MTN vendor loky2.1.3 (joblib#699)
  Make it possible to configure the reusable executor workers timeout (joblib#698)
  MAINT increase timeouts to make test more robust on travis
  DOC: use the .joblib extension instead of .pkl (joblib#697)
  [MRG] Fix exception handling in nested parallel calls (joblib#696)
  Fix skip test lz4 not installed (joblib#695)
  [MRG] numpy_pickle:  several enhancements (joblib#626)
  Introduce Parallel.__call__ backend callbacks (joblib#689)
  Add distributed on readthedocs (joblib#686)
  Support registration of external backends (joblib#655)
  [MRG] Add a dask.distributed example (joblib#613)
  ENH use cloudpickle to pickle interactively defined callable (joblib#677)
  CI freeze the version of sklearn0.19.1 and scipy1.0.1 (joblib#685)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants