Conversation

@HorlogeSkynet (Owner) commented May 22, 2020

This patch is mainly inspired by the work of @ingrinder (see 2bbc2dae).
You may notice execution up to twice as fast.

This behavior can be disabled with the new `parallel_loading` configuration option.

See #68.
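For context, the idea can be sketched roughly as follows. This is a minimal, hypothetical example: `load_entries`, `entry_factories`, and the `parallel_loading` flag are illustrative names, not Archey's actual internals.

```python
# Hypothetical sketch of parallel entry loading (illustrative names,
# not Archey's actual API).
from concurrent.futures import ThreadPoolExecutor


def load_entries(entry_factories, parallel_loading=True):
    """Instantiate each entry, optionally in parallel.

    Entry construction is mostly I/O-bound (reading /proc files,
    running external commands), so threads overlap the waiting time
    instead of paying for it sequentially.
    """
    if not parallel_loading:
        # Sequential fallback, controlled by the configuration option.
        return [factory() for factory in entry_factories]
    with ThreadPoolExecutor() as executor:
        # `map` preserves input order, so output stays deterministic.
        return list(executor.map(lambda factory: factory(), entry_factories))
```

With many entries each blocking on I/O for tens of milliseconds, overlapping them in a thread pool is what yields the "up to twice as fast" figure mentioned above.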

How has this been tested?

Locally, and via the test cases.

Types of changes:

  • New feature (non-breaking change which adds functionality)
  • Breaking change [?] (fix or feature that would cause existing functionality to change)

Checklist:

  • [ ] [IF NEEDED] I have updated the README.md file accordingly;
  • [ ] [IF NEEDED] I have updated the test cases (which pass) accordingly;
  • My changes look good;
  • I agree that my code may be modified in the future;
  • My code follows the code style of this project (PEP 8).


Co-authored-by: Michael Bromilow <12384431+ingrinder@users.noreply.github.com>
@HorlogeSkynet HorlogeSkynet added the enhancement ⬆️ Implements a new feature, fixes or improves existing ones label May 22, 2020
@HorlogeSkynet HorlogeSkynet added this to the v4.8.0 milestone May 22, 2020
@HorlogeSkynet HorlogeSkynet requested a review from ingrinder May 22, 2020 10:26
@HorlogeSkynet HorlogeSkynet self-assigned this May 22, 2020
@HorlogeSkynet HorlogeSkynet linked an issue May 22, 2020 that may be closed by this pull request
@ingrinder (Collaborator) commented:

Since we're executing I/O-heavy work, should we use `len(Entries)` (or maybe a static value, so 18 currently) as our `ThreadPoolExecutor`'s `max_workers`? This would mean that in a case where nearly all entries were blocking on I/O, we'd still get at least another thread for another entry to begin its processing while waiting on the others, which should result in the quickest possible execution time. We don't really do anything CPU-heavy that would be avoided by not spawning threads, and the improvements in Python 3.8 and above also mean we never spawn more threads than necessary to complete all of the work in parallel.

@HorlogeSkynet (Owner, Author) commented May 24, 2020

You're actually right 👍
So what about mixing our needs with the "new" Python 3.8+ behavior?

```diff
             executor = cm_stack.enter_context(
-                ThreadPoolExecutor(max_workers=((os.cpu_count() or 1) * 5))
+                ThreadPoolExecutor(max_workers=min(len(enabled_entries), (os.cpu_count() or 1) + 4))
             )
```

Reference: python/cpython#13618
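For reference, Python 3.8 changed `ThreadPoolExecutor`'s default `max_workers` from `(os.cpu_count() or 1) * 5` to `min(32, (os.cpu_count() or 1) + 4)`, a heuristic tuned for I/O-bound work. The sizing above combines that cap with the number of enabled entries; a hypothetical stand-alone sketch (`pool_size` is an illustrative helper, not code from this patch):

```python
import os


def pool_size(enabled_entry_count):
    """Hypothetical helper mirroring the worker sizing discussed above.

    Never spawns more workers than there are entries (no idle threads),
    while bounding the count by CPU count + 4, the heuristic Python 3.8
    adopted for ThreadPoolExecutor's default max_workers.
    """
    return min(enabled_entry_count, (os.cpu_count() or 1) + 4)
```

So with only two entries enabled, two threads are started; with dozens of entries, the pool stays near the CPU count rather than growing five-fold.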

@ingrinder (Collaborator) previously approved these changes May 26, 2020 and left a comment:

I was worried that limiting threads based on CPU count could slow down execution for the longer-running entries; however, after some testing (and fun playing with matplotlib! 😆) on an older Core 2 Duo machine, it looks like it's just fine 👍.

@ingrinder ingrinder dismissed their stale review May 26, 2020 02:00

Just a moment - just noticed an (admittedly stupid) edge case we should deal with.

...maybe *someone* just wants the logo?
@HorlogeSkynet (Owner, Author) commented:
Nice catch! Thanks 🙂

@HorlogeSkynet HorlogeSkynet merged commit f74dda9 into master May 26, 2020
@HorlogeSkynet HorlogeSkynet deleted the feature/multithreading branch May 26, 2020 07:02
Successfully merging this pull request may close these issues:

[QUESTION] [FEATURE] Multithreading?