Description
When using process_map on a generator, it crashes if len can't be called. For example:
↪ cd /var/www/cl/docker/nginx/ && sudo docker-compose exec cl-python /opt/courtlistener/manage.py dump_anon_docket_html --offset 74850 --processes 4 ; bush "Done making html"
0%| | 0/2703724 [0
Traceback (most recent call last):
  File "/opt/courtlistener/manage.py", line 15, in <module>
    main()
  File "/opt/courtlistener/manage.py", line 11, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 68, in handle
    make_html(options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 40, in make_html
    process_map(
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 61, in _executor_map
    kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()
Sentry is attempting to send 1 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
That's a lot; here's the important part:
kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()
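The error comes from Python itself rather than from tqdm: a generator produces its items lazily and defines no `__len__`, so `len()` fails on it while it works on a list. The standard-library `operator.length_hint` shows the difference without raising. The sample data here is purely illustrative.

```python
from operator import length_hint

items = [1, 2, 3]
gen = (x * 2 for x in items)  # a generator: lazy, defines no __len__

list_len = len(items)  # a list knows its size up front

try:
    len(gen)
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # same message as in the traceback

# length_hint does not raise when __len__ is missing; it returns the default
hint = length_hint(gen, -1)

print(list_len, error_message, hint)
```
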
What's funny about that error is that I passed the total parameter to my tqdm_class parameter:
from tqdm import tqdm
from tqdm.contrib.concurrent import process_map

progress_bar = tqdm(
    total=total,
)
process_map(
    _write_item_to_disk,
    objects,
    max_workers=options["processes"],
    tqdm_class=progress_bar,
    total=total,
)
It makes sense when you look at the code, though. There, you find that process_map calls _executor_map with whatever tqdm_kwargs it gets:
return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
And _executor_map does:
if "total" not in kwargs:
    kwargs["total"] = len(iterables[0])
Which crashes.
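The guard only runs when "total" is absent from the kwargs, so passing `total=` to `process_map`, as in the snippet above, skips the `len()` call. Another caller-side option, sketched here under the assumption that `process_map` in 4.59.0 only needs `len()` and plain iteration from its first iterable, is to wrap the generator in an object that reports a known length. `SizedIterable` and `objects_generator` are hypothetical names, and the wrapper trusts whatever length it is given.

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class SizedIterable(Generic[T]):
    """Hypothetical wrapper: an iterable that reports a caller-supplied length.

    The length is trusted, not verified; iterating still consumes the
    wrapped iterable exactly once.
    """

    def __init__(self, iterable: Iterable[T], length: int) -> None:
        self._iterable = iterable
        self._length = length

    def __len__(self) -> int:
        # What a len(iterables[0]) call would see.
        return self._length

    def __iter__(self) -> Iterator[T]:
        # Delegate iteration to the wrapped (possibly lazy) source.
        return iter(self._iterable)


def objects_generator():
    # Hypothetical stand-in for a lazy source such as a database cursor.
    for i in range(5):
        yield i


wrapped = SizedIterable(objects_generator(), 5)
print(len(wrapped))   # -> 5
print(list(wrapped))  # -> [0, 1, 2, 3, 4]
```

If the reported length is wrong, the bar's total will simply be wrong too; the wrapper does not count items.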
The fix is probably either:
1. If there's a tqdm_class parameter, _executor_map should use the total parameter that it provides, instead of using the kwarg provided to process_map; or
2. The documentation should be updated to note that the total parameter is a supported argument in the process_map function.
I think number 2 is easier (no code change), but number 1 makes more sense because it provides a simpler API for setting the total parameter. You could also do a combination of the two:
- Update the docs to note that total can be provided to process_map, but prefer the total parameter that is passed to the tqdm_class class, if there is one.
Version info:
4.59.0 3.8.8 (default, Mar 12 2021, 19:44:18)
[GCC 8.3.0] linux