Description
Problem
I have been following this TIL to run Readability.js on a page with Shot Scraper:
https://til.simonwillison.net/shot-scraper/readability
This worked fine for pages with liberal content security policies. However, when I tried to scrape a page with a stricter CSP, I ran into this error:
Refused to load the script 'https://cdn.skypack.dev/@mozilla/readability' because it violates the following Content Security Policy directive: …
A strict CSP like this limits Shot Scraper's ability to run JavaScript on a page before processing it.
Suggestion
The Playwright Python tools have an optional bypass_csp argument that can be passed to the new_context method.
As a test I monkey-patched shot_scraper/cli.py with the following:
# cli.py, line 353
...
context_args["bypass_csp"] = True # <-- Line added
context = browser_obj.new_context(**context_args)
...
And now the Readability.js script executes without a problem. :)
It would be really useful to give Shot Scraper a CLI option like --bypass-csp that, when set, passes this argument to Playwright. That would make it possible to run JavaScript on pages like this.
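To make the suggestion concrete, here is a minimal sketch of how such a flag could feed into the context arguments. It is not Shot Scraper's actual code: the helper names build_context_args and parse_args are hypothetical, and argparse is used only to keep the example self-contained. The only thing taken from Playwright is the bypass_csp keyword for new_context.

```python
import argparse


def build_context_args(bypass_csp: bool = False) -> dict:
    """Build keyword arguments intended for Playwright's new_context().

    Hypothetical helper for illustration; not part of Shot Scraper.
    """
    context_args = {}
    if bypass_csp:
        # Only set the key when requested, so the default behaviour is unchanged.
        context_args["bypass_csp"] = True
    return context_args


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a hypothetical command line with an opt-in --bypass-csp flag."""
    parser = argparse.ArgumentParser(prog="bypass-csp-sketch")
    parser.add_argument("url", help="Page to load")
    parser.add_argument(
        "--bypass-csp",
        action="store_true",
        help="Ask Playwright to bypass the page's Content Security Policy",
    )
    return parser.parse_args(argv)


# Example with an explicit argument list instead of sys.argv.
args = parse_args(["https://example.com", "--bypass-csp"])
context_args = build_context_args(args.bypass_csp)
print(context_args)

# In the patched cli.py the dictionary would then be used as in the snippet above:
# context = browser_obj.new_context(**context_args)
```

Keeping the flag off by default means existing invocations would behave exactly as before, and the CSP is only bypassed when explicitly requested.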
Thank you for a great tool!