Skip to content

Using generator with process_map crashes, despite passing total parameter to tqdm_class #1231

@mlissner

Description

@mlissner

When using process_map on a generator, it crashes if len can't be called. For example:

cd /var/www/cl/docker/nginx/ && sudo docker-compose exec cl-python /opt/courtlistener/manage.py  dump_anon_docket_html --offset 74850 --processes 4 ; bush "Done making html"
  0%| | 0/2703724 [0Traceback (most recent call last):
  File "/opt/courtlistener/manage.py", line 15, in <module>
    main()
  File "/opt/courtlistener/manage.py", line 11, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 68, in handle
    make_html(options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 40, in make_html
    process_map(
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 61, in _executor_map
    kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()
Sentry is attempting to send 1 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit

That's a lot, here's the important part:

    kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()

What's funny about that is that I passed the total parameter to my tqdm_class parameter:

    progress_bar = tqdm(
        total=total,
    )
    process_map(
        _write_item_to_disk,
        objects,
        max_workers=options["processes"],
        tqdm_class=progress_bar,
        total=total,
    )

It makes sense when you look in the code though. There, you find that the process_map calls _executor_map with whatever tqdm_kwargs it gets:

    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)

And _executor_map does:

    if "total" not in kwargs:
        kwargs["total"] = len(iterables[0])

Which crashes.

The fix is probably either:

  1. If there's a tqdm_class parameter, _executor_map should use the total parameter it provides, instead of using the kwarg provided to process_map; or

  2. The documentation should be updated to note that the total parameter is a supported argument in the process_map function.

I think number 2 is easier (no code change), but number 1 makes more sense because it provides a simpler API for how to set the total parameter. You could do a combination of the two also:

  1. Update the docs to note that total can be provided to the process_map, but prefer the total parameter that is passed to the tqdm_class class, if there is one.

Version info:
4.59.0 3.8.8 (default, Mar 12 2021, 19:44:18)
[GCC 8.3.0] linux

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions