ENH add initializer limiting n_threads for C-libs #701
Conversation
tomMoral commented on Jun 20, 2018
- This functionality supports OpenMP, OpenBLAS, MKL and Accelerate
- Set n_threads=1 by default in all workers to avoid oversubscription (see the sketch below)
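A rough sketch of what such an initializer could look like (the function name matches the ``limit_clib_threads`` initializer referenced in the diff below, but the body here is only illustrative and may differ from the PR's actual code; the environment variable names are the standard ones documented by each runtime):

import os

def limit_clib_threads(n_threads=1):
    # Cap the thread-pools of common native runtimes by exporting their
    # environment variables before the C libraries start any threads.
    for var in ("OMP_NUM_THREADS",          # OpenMP
                "OPENBLAS_NUM_THREADS",     # OpenBLAS
                "MKL_NUM_THREADS",          # Intel MKL
                "VECLIB_MAXIMUM_THREADS"):  # Apple Accelerate / vecLib
        os.environ[var] = str(n_threads)

# Usage sketch: register the function as a process-pool initializer so every
# worker process runs it once at start-up, before numpy & co. are imported.
if __name__ == "__main__":
    from multiprocessing import get_context
    with get_context("spawn").Pool(2, initializer=limit_clib_threads) as pool:
        print(pool.map(str, range(4)))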
Codecov Report

@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
- Coverage   95.14%   95.02%   -0.13%
==========================================
  Files          40       40
  Lines        5725     5746      +21
==========================================
+ Hits         5447     5460      +13
- Misses        278      286       +8
joblib/_parallel_backends.py
Outdated
@@ -330,7 +348,8 @@ def _get_pool(self):
         call to apply_async.
         """
         if self._pool is None:
-            self._pool = ThreadPool(self._n_jobs)
+            self._pool = ThreadPool(
+                self._n_jobs, initializer=self.limit_clib_threads)
I don't think threads have isolated environment variables. Dynamic / ctypes based calls would be required to limit clib threads while joblib is doing thread-based parallelism.
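For illustration, a ctypes-based call could look roughly like this (a minimal sketch assuming a GNU OpenMP runtime; the library lookup and symbol name are assumptions about the platform, not code from this PR):

import ctypes
import ctypes.util

def set_openmp_num_threads(n_threads):
    # Limit OpenMP at runtime through ctypes, since environment variables are
    # shared by every thread of the process and cannot be set per-thread.
    libname = ctypes.util.find_library("gomp")  # GNU OpenMP runtime (libgomp)
    if libname is None:
        return False  # no OpenMP runtime found, nothing to limit
    libgomp = ctypes.CDLL(libname)
    libgomp.omp_set_num_threads(ctypes.c_int(n_threads))
    return True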
I would just not register any initializer at the ThreadPool level for now.
I confirm that there is a single environment namespace per-process: all threads of the same process share the same environment variables:
>>> from time import sleep
>>> import os
>>> from threading import get_ident
>>> from multiprocessing.pool import ThreadPool
>>> def env_tweaks(*args):
... tid = str(get_ident())
... os.environ['THREAD_ID'] = tid
... sleep(0.01)
... return (tid, os.environ['THREAD_ID'])
...
...
>>> p = ThreadPool(4)
>>> results = p.map(env_tweaks, range(10))
>>> results
[('140286315460352', '140286290282240'), ('140286307067648', '140286315460352'), ('140286298674944', '140286290282240'), ('140286290282240', '140286307067648'), ('140286315460352', '140286298674944'), ('140286307067648', '140286315460352'), ('140286290282240', '140286307067648'), ('140286298674944', '140286307067648'), ('140286315460352', '140286307067648'), ('140286307067648', '140286307067648')]
>>>
Indeed, thanks for pointing that out!
Please add an entry to the changelog. Please also add a new section at the end of the
Yes, I am planning on adding these in this PR:
tomMoral force-pushed the branch from e24abb5 to e7c933b:
- this functionality supports OpenMP, OpenBLAS, MKL and Accelerate
- Set n_threads=1 by default in all workers to avoid oversubscription
LGTM besides the following comments:
CHANGES.rst
Outdated
Thomas Moreau

Limit the number of threads used in worker processes (for ``loky`` and
``multiprocessing``) by C-libraries that relies on threadpools. This
I would not mention ``multiprocessing`` as it's not reliable when using the default ``fork`` context under POSIX.
CHANGES.rst
Outdated
Implement the ``loky`` backend with @ogrisel. This backend relies on
a robust implementation of ``concurrent.futures.ProcessPoolExecutor``
with spawned processes that can be reused accross the ``Parallel``
calls.
Please move this to the top and explain that it fixes bad interactions (crash) with the internal thread-pool of third-party libraries with a link to.
doc/parallel.rst
Outdated
computational power available for each process. Moreover, when many processes
are running, the time taken by the OS scheduler to switch between them can
further hinder the performance of the computation. It is generally better to
avoid using more ``Process/Thread`` than the number of CPU on a machine.
avoid using significantly more processes or threads than the number of CPUs on a machine.
doc/parallel.rst
Outdated
further hinder the performance of the computation. It is generally better to
avoid using more ``Process/Thread`` than the number of CPU on a machine.

Some third-partiy libraries -- *e.g.* ``numpy`` -- manage internally a
e.g. the BLAS runtime used by numpy
doc/parallel.rst
Outdated
avoid using more ``Process/Thread`` than the number of CPU on a machine.

Some third-partiy libraries -- *e.g.* ``numpy`` -- manage internally a
thread-pool to perform their computations. The default behavior is generaly to
generally
* tag '0.12': (116 commits)
  Release 0.12
  typo
  typo
  typo
  ENH add initializer limiting n_threads for C-libs (joblib#701)
  DOC better parallel docstring (joblib#704)
  [MRG] Nested parallel call thread bomb mitigation (joblib#700)
  MTN vendor loky2.1.3 (joblib#699)
  Make it possible to configure the reusable executor workers timeout (joblib#698)
  MAINT increase timeouts to make test more robust on travis
  DOC: use the .joblib extension instead of .pkl (joblib#697)
  [MRG] Fix exception handling in nested parallel calls (joblib#696)
  Fix skip test lz4 not installed (joblib#695)
  [MRG] numpy_pickle: several enhancements (joblib#626)
  Introduce Parallel.__call__ backend callbacks (joblib#689)
  Add distributed on readthedocs (joblib#686)
  Support registration of external backends (joblib#655)
  [MRG] Add a dask.distributed example (joblib#613)
  ENH use cloudpickle to pickle interactively defined callable (joblib#677)
  CI freeze the version of sklearn0.19.1 and scipy1.0.1 (joblib#685)
  ...
* releases: (121 commits)
  Release 0.12.1
  fix kwonlydefaults key error in filter_args (joblib#715)
  MNT fix some "undefined name" flake8 warnings (joblib#713).
  from importlib import reload for Python 3 (joblib#675)
  MTN vendor loky2.1.4 (joblib#708)
  Release 0.12
  typo
  typo
  typo
  ENH add initializer limiting n_threads for C-libs (joblib#701)
  DOC better parallel docstring (joblib#704)
  [MRG] Nested parallel call thread bomb mitigation (joblib#700)
  MTN vendor loky2.1.3 (joblib#699)
  Make it possible to configure the reusable executor workers timeout (joblib#698)
  MAINT increase timeouts to make test more robust on travis
  DOC: use the .joblib extension instead of .pkl (joblib#697)
  [MRG] Fix exception handling in nested parallel calls (joblib#696)
  Fix skip test lz4 not installed (joblib#695)
  [MRG] numpy_pickle: several enhancements (joblib#626)
  Introduce Parallel.__call__ backend callbacks (joblib#689)
  ...