Saving large JSON HAR files causes crashes #170

@alysbrooks

When saving around 70 pages with shot-scraper multi --har-file, the command fails with the error "Invalid string length".

I realize my use case is quite a bit outside the norm, volume-wise, but I thought I'd report it.

Using compressed HAR files avoids the issue, so I would suggest documenting this as a limitation of saving uncompressed HARs.
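
For anyone driving Playwright from Python directly, the same workaround might look like the sketch below. It rests on two assumptions that this report does not confirm: that the failure comes from the uncompressed HAR being built as one very large string when the context closes, and that shot-scraper records HARs through Playwright's record_har_path context option. The helper names zipped_har_path and new_har_context are hypothetical; only browser.new_context(record_har_path=...) is Playwright's own API.

```python
from pathlib import Path


def zipped_har_path(path):
    """Return `path` with a .zip suffix (hypothetical helper).

    Playwright archives the HAR when the output file name ends in .zip.
    """
    p = Path(path)
    return p if p.suffix == ".zip" else p.with_suffix(".zip")


def new_har_context(browser, har_path):
    """Create a context that records a zipped HAR (hypothetical helper).

    `browser` is assumed to be a Playwright Browser from the sync API.
    """
    return browser.new_context(record_har_path=str(zipped_har_path(har_path)))
```

With a real browser, the HAR is only written when context.close() is called, which matches where the traceback below fails.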

Below is the traceback:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.16/x64/bin/shot-scraper", line 8, in <module>
    sys.exit(cli())
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/shot_scraper/cli.py", line 628, in multi
    context.close()
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/playwright/sync_api/_generated.py", line 13474, in close
    return mapping.from_maybe_impl(self._sync(self._impl_obj.close(reason=reason)))
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/playwright/_impl/_browser_context.py", line 598, in close
    await self._channel._connection.wrap_api_call(_inner_close, True)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.Error: : Invalid string length
Error: Process completed with exit code 1.

You can view this in the context of the job on GitHub Actions.

Lastly, thank you for shot-scraper and shot-scraper-template!

Labels: documentation (Improvements or additions to documentation)
