Optionally disable content security policies #114

@jamesking

Description

Problem

I have been following this TIL to run Readability.js on a page with Shot Scraper:

https://til.simonwillison.net/shot-scraper/readability

This worked fine for pages with liberal content security policies. However, when I tried to scrape a page with a stricter CSP, I ran into this error:

Refused to load the script 'https://cdn.skypack.dev/@mozilla/readability' because it violates the following Content Security Policy directive: …

A strict CSP like this limits Shot Scraper's ability to run JavaScript on a page before processing it.

Suggestion

The Playwright Python tools have an optional bypass_csp argument that can be passed to the new_context method.
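For reference, here is a minimal standalone sketch of that option outside of Shot Scraper. The helper names (`build_context_args`, `title_after_injecting_script`) are made up for illustration and are not part of Shot Scraper or Playwright; only `new_context(bypass_csp=...)` and `page.add_script_tag(url=...)` are Playwright calls.

```python
def build_context_args(bypass_csp=False):
    """Build keyword arguments for Playwright's browser.new_context().

    Hypothetical helper: the key is only added when requested, so the
    default behaviour (CSP enforced) is unchanged.
    """
    context_args = {}
    if bypass_csp:
        context_args["bypass_csp"] = True
    return context_args


def title_after_injecting_script(url, script_url, bypass_csp=False):
    """Load url, inject an external script, and return the page title.

    Hypothetical helper. On a page with a strict CSP the injection is
    expected to be refused unless bypass_csp is True.
    """
    # Imported lazily so build_context_args() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**build_context_args(bypass_csp))
        page = context.new_page()
        page.goto(url)
        # Adds a <script src="..."> tag pointing at script_url to the page.
        page.add_script_tag(url=script_url)
        title = page.title()
        browser.close()
        return title
```

Calling `title_after_injecting_script(some_url, "https://cdn.skypack.dev/@mozilla/readability", bypass_csp=True)` would be the rough equivalent of the failing case above, assuming Playwright and its Chromium build are installed.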

As a test I monkey-patched shot_scraper/cli.py with the following:

# cli.py, line 353
...
context_args["bypass_csp"] = True # <-- Line added
context = browser_obj.new_context(**context_args)
...

And now the Readability.js script executes without a problem. :)

It would be really useful to give Shot Scraper a CLI option like --bypass-csp that would pass this argument through to Playwright when set, allowing more flexibility to run JavaScript on pages like this.
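A rough sketch of how such a flag could map onto the context arguments. This is not Shot Scraper's code: Shot Scraper's CLI is built with Click, and `argparse` is used here only to keep the example dependency-free. The function names and the help text are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical command line with an opt-in --bypass-csp flag."""
    parser = argparse.ArgumentParser(description="Sketch of a --bypass-csp option")
    parser.add_argument("url", help="Page to load")
    parser.add_argument(
        "--bypass-csp",
        action="store_true",
        help="Ask Playwright to bypass the page's Content Security Policy",
    )
    return parser.parse_args(argv)


def context_args_from(args):
    """Translate parsed options into kwargs for browser.new_context()."""
    context_args = {}
    if args.bypass_csp:
        # Only set when the user asked for it, so the default stays unchanged.
        context_args["bypass_csp"] = True
    return context_args
```

Keeping the flag opt-in means existing invocations behave exactly as before, and the CSP is only bypassed when explicitly requested.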

Thank you for a great tool!

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
