
LimitExceededException when calling the StartDocumentTextDetection operation #21

@ethanscorey

Thank you for building this incredibly useful tool! I've found a lot of use for it recently, but I think I may have pushed it a bit beyond the scale it's built for.

I ran the command from the demo (s3-ocr start s3-ocr-demo --all -a ocr.json) on an S3 bucket that contains ~2,500 PDFs. It started Textract jobs for the first 102 PDFs in the bucket and then raised the following exception:

Traceback (most recent call last):
  File "/home/ethan/miniconda3/envs/nj_deaths/bin/s3-ocr", line 8, in <module>
    sys.exit(cli())
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/s3_ocr/cli.py", line 137, in start
    response = textract.start_document_text_detection(
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ethan/miniconda3/envs/nj_deaths/lib/python3.10/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.LimitExceededException: An error occurred (LimitExceededException) when calling the StartDocumentTextDetection operation: Open jobs exceed maximum concurrent job limit

While it's fairly clear what caused the exception (running too many jobs at once), there's no obvious way to avoid it—aside from, of course, OCRing fewer PDFs at once, but who wants to do that?!

Is there a way to tell s3-ocr to chunk the work, so that jobs beyond the concurrent limit are queued until the running jobs finish?
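For what it's worth, here is a minimal sketch of one possible workaround, written directly against boto3 rather than s3-ocr's internals: catch LimitExceededException and retry with exponential backoff, so that jobs beyond the concurrent limit wait for running jobs to drain. The function name start_with_backoff and the pdf_keys list are hypothetical; a real fix would presumably live around the start_document_text_detection call in s3_ocr/cli.py.

```python
import time

import boto3
from botocore.exceptions import ClientError

textract = boto3.client("textract")


def start_with_backoff(bucket, key, max_retries=8):
    """Start a Textract job for one PDF, backing off when the concurrent job limit is hit."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = textract.start_document_text_detection(
                DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
            )
            return response["JobId"]
        except ClientError as e:
            # Re-raise anything that isn't the concurrent-job-limit error
            if e.response["Error"]["Code"] != "LimitExceededException":
                raise
            time.sleep(delay)  # wait for some running jobs to finish, then retry
            delay = min(delay * 2, 60)
    raise RuntimeError(f"gave up on {key} after {max_retries} attempts")


# Hypothetical usage over the keys that still need OCR:
# for key in pdf_keys:
#     job_id = start_with_backoff("s3-ocr-demo", key)
```

Polling get_document_text_detection to track how many jobs are still running would be a more precise throttle, but simple backoff may be enough to keep the queue under the limit.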


Labels: bug (Something isn't working), enhancement (New feature or request)
