Skip to content

Conversation

tomMoral
Copy link
Contributor

  • This functionnality supports OpenMP, OpenBLAS, MKL and Accelerated
  • Set n_threads=1 by default in all workers to avoid oversubscrption

@codecov
Copy link

codecov bot commented Jun 20, 2018

Codecov Report

Merging #701 into master will decrease coverage by 0.12%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
- Coverage   95.14%   95.02%   -0.13%     
==========================================
  Files          40       40              
  Lines        5725     5746      +21     
==========================================
+ Hits         5447     5460      +13     
- Misses        278      286       +8
Impacted Files Coverage Δ
joblib/test/test_parallel.py 96.62% <100%> (+0.07%) ⬆️
joblib/executor.py 100% <100%> (ø) ⬆️
joblib/_parallel_backends.py 94.8% <100%> (-2.32%) ⬇️
joblib/backports.py 93.75% <0%> (-2.09%) ⬇️
joblib/_store_backends.py 90.47% <0%> (-0.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c3eea7c...5321aad. Read the comment docs.

@@ -330,7 +348,8 @@ def _get_pool(self):
call to apply_async.
"""
if self._pool is None:
self._pool = ThreadPool(self._n_jobs)
self._pool = ThreadPool(
self._n_jobs, initializer=self.limit_clib_threads)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think threads have isolated environment variables. Dynamic / ctypes based calls would be required to limit clib threads while joblib is doing thread-based parallelism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just not register any initializer at the ThreadPool level for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that there is a single environment namespace per-process: all threads of the same process share the same environment variables:

>>> from time import sleep
>>> import os
>>> from threading import get_ident
>>> from multiprocessing.pool import ThreadPool
>>> def env_tweaks(*args):
...     tid = str(get_ident())
...     os.environ['THREAD_ID'] = tid
...     sleep(0.01)
...     return (tid, os.environ['THREAD_ID'])
... 
... 
>>> p = ThreadPool(4)
>>> results = p.map(env_tweaks, range(10))
>>> results
[('140286315460352', '140286290282240'), ('140286307067648', '140286315460352'), ('140286298674944', '140286290282240'), ('140286290282240', '140286307067648'), ('140286315460352', '140286298674944'), ('140286307067648', '140286315460352'), ('140286290282240', '140286307067648'), ('140286298674944', '140286307067648'), ('140286315460352', '140286307067648'), ('140286307067648', '140286307067648')]
>>> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.. thanks for pointing that out!

@ogrisel
Copy link
Contributor

ogrisel commented Jun 20, 2018

Please add an entry to the changelog.

Please also add a new section at the end of the parallel.rst documentation to explain the concept of over-subscription of CPU resources by an excess number of concurrent processes and threads in general and the expected behavior of this simplistic mitigation strategy when joblib calls into third-party libraries that manage their own internal thread pools.

@tomMoral
Copy link
Contributor Author

tomMoral commented Jun 20, 2018

Yes I am planning on adding in this PR:

  • an entry in the changelog
  • an entry in the doc
  • an entry in the travis matrix to test this with cython and openmp.

tomMoral added 2 commits June 21, 2018 14:56
- this functionnality supports OpenMP, OpenBLAS, MKL and Accelerated
- Set n_threads=1 by default in all workers to avoid oversubscrption
@tomMoral tomMoral force-pushed the PR_limit_threads_omp branch from e24abb5 to e7c933b Compare June 21, 2018 12:58
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the following the comments:

CHANGES.rst Outdated
Thomas Moreau

Limit the number of threads used in worker processes (for ``loky`` and
``multiprocessing``) by C-libraries that relies on threadpools. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not mention multiprocessing as it's not reliable when using the default fork context under posix.

CHANGES.rst Outdated
Implement the ``loky`` backend with @ogrisel. This backend relies on
a robust implementation of ``concurrent.futures.ProcessPoolExecutor``
with spawned processes that can be reused accross the ``Parallel``
calls.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to the top and explain that it fixes bad interactions (crash) with the internal thread-pool of third-party libraries with a link to.

https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries

doc/parallel.rst Outdated
computational power available for each process. Moreover, when many processes
are running, the time taken by the OS scheduler to switch between them can
further hinder the performance of the computation. It is generally better to
avoid using more ``Process/Thread`` than the number of CPU on a machine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid using significantly more processes or threads than the number of CPUs on a machine.

doc/parallel.rst Outdated
further hinder the performance of the computation. It is generally better to
avoid using more ``Process/Thread`` than the number of CPU on a machine.

Some third-partiy libraries -- *e.g.* ``numpy`` -- manage internally a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. the BLAS runtime used by numpy

doc/parallel.rst Outdated
avoid using more ``Process/Thread`` than the number of CPU on a machine.

Some third-partiy libraries -- *e.g.* ``numpy`` -- manage internally a
thread-pool to perform their computations. The default behavior is generaly to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally

@tomMoral tomMoral merged commit 3cfb821 into joblib:master Jun 22, 2018
@tomMoral tomMoral deleted the PR_limit_threads_omp branch June 22, 2018 10:01
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Jul 28, 2018
* tag '0.12': (116 commits)
  Release 0.12
  typo
  typo
  typo
  ENH add initializer limiting n_threads for C-libs (joblib#701)
  DOC better parallel docstring (joblib#704)
  [MRG] Nested parallel call thread bomb mitigation (joblib#700)
  MTN vendor loky2.1.3 (joblib#699)
  Make it possible to configure the reusable executor workers timeout (joblib#698)
  MAINT increase timeouts to make test more robust on travis
  DOC: use the .joblib extension instead of .pkl (joblib#697)
  [MRG] Fix exception handling in nested parallel calls (joblib#696)
  Fix skip test lz4 not installed (joblib#695)
  [MRG] numpy_pickle:  several enhancements (joblib#626)
  Introduce Parallel.__call__ backend callbacks (joblib#689)
  Add distributed on readthedocs (joblib#686)
  Support registration of external backends (joblib#655)
  [MRG] Add a dask.distributed example (joblib#613)
  ENH use cloudpickle to pickle interactively defined callable (joblib#677)
  CI freeze the version of sklearn0.19.1 and scipy1.0.1 (joblib#685)
  ...
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Jul 28, 2018
* releases: (121 commits)
  Release 0.12.1
  fix kwonlydefaults key error in filter_args (joblib#715)
  MNT fix some "undefined name" flake8 warnings (joblib#713).
  from importlib import reload for Python 3 (joblib#675)
  MTN vendor loky2.1.4 (joblib#708)
  Release 0.12
  typo
  typo
  typo
  ENH add initializer limiting n_threads for C-libs (joblib#701)
  DOC better parallel docstring (joblib#704)
  [MRG] Nested parallel call thread bomb mitigation (joblib#700)
  MTN vendor loky2.1.3 (joblib#699)
  Make it possible to configure the reusable executor workers timeout (joblib#698)
  MAINT increase timeouts to make test more robust on travis
  DOC: use the .joblib extension instead of .pkl (joblib#697)
  [MRG] Fix exception handling in nested parallel calls (joblib#696)
  Fix skip test lz4 not installed (joblib#695)
  [MRG] numpy_pickle:  several enhancements (joblib#626)
  Introduce Parallel.__call__ backend callbacks (joblib#689)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants