Closed
Labels: improvement (An improvement / new feature idea)
Description
I noticed a parsing difference in iframe child content between jsoup 1.19.1 and 1.20.1, and I don't know which behavior is correct.
Given this HTML fragment:
```html
<blockquote>
<p>
Some content
<script>alert('script outside iframe, oh noes!');</script></p>
<iframe>Content inside iframe<script>alert('script between iframe, oh noes!');</script></iframe>
</blockquote>
```
When it is cleaned with `String safeHtml = Jsoup.clean(unsafeHtml, Safelist.relaxed());`, the result is:
1.19.1:
```html
<blockquote>
<p>Some content</p>Content inside iframe <script>alert('script between iframe, oh noes!');</script>
</blockquote>
```
1.20.1:
```html
<blockquote>
<p>Some content</p>
</blockquote>
```
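A minimal, self-contained repro of the comparison above, assuming only jsoup on the classpath (the class name `IframeCleanRepro` and method `cleanFragment` are illustrative, not part of jsoup). Running it once against each jsoup version should show whether the "Content inside iframe" text survives cleaning:

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class IframeCleanRepro {
    // The fragment from this report, verbatim (newlines preserved).
    static final String UNSAFE_HTML =
        "<blockquote>\n"
        + "<p>\n"
        + "Some content\n"
        + "<script>alert('script outside iframe, oh noes!');</script></p>\n"
        + "<iframe>Content inside iframe<script>alert('script between iframe, oh noes!');</script></iframe>\n"
        + "</blockquote>";

    // Same call as in the report. Safelist.relaxed() allows neither <iframe> nor <script>,
    // so both elements are dropped; the question is what happens to the iframe's content.
    static String cleanFragment(String unsafeHtml) {
        return Jsoup.clean(unsafeHtml, Safelist.relaxed());
    }

    public static void main(String[] args) {
        // Compare this output under jsoup 1.19.1 and 1.20.1.
        System.out.println(cleanFragment(UNSAFE_HTML));
    }
}
```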
Using `git bisect`, I found that the behavior change was introduced in commit 3704342, where the pretty printer was rewritten.
Is the iframe removal the right behavior?
Is there a way to replicate the old behavior, with the child content getting escaped?
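One possible workaround, sketched under stated assumptions rather than as a jsoup-provided option: before cleaning, replace each `<iframe>` with a plain text node holding its raw content, then run `Cleaner` on the result (roughly what `Jsoup.clean` does internally). Because jsoup escapes text nodes on output, any markup that was inside the iframe should come out as inert text such as `&lt;script&gt;`. The helper name `cleanKeepingIframeText` is hypothetical, and since the node type used for iframe content may differ between versions, the sketch accepts both `TextNode` and `DataNode` children:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class IframeTextCleaner {
    // Hypothetical helper: keeps iframe content as escaped text instead of dropping it.
    static String cleanKeepingIframeText(String unsafeHtml, Safelist safelist) {
        Document dirty = Jsoup.parseBodyFragment(unsafeHtml);
        // select() returns a copy of the matches, so replacing nodes while iterating is safe.
        for (Element iframe : dirty.select("iframe")) {
            StringBuilder raw = new StringBuilder();
            for (Node child : iframe.childNodes()) {
                // Assumption: iframe content is parsed as raw text, which jsoup may store
                // as either a TextNode or a DataNode depending on the version.
                if (child instanceof TextNode) {
                    raw.append(((TextNode) child).getWholeText());
                } else if (child instanceof DataNode) {
                    raw.append(((DataNode) child).getWholeData());
                }
            }
            // A TextNode is always entity-escaped when serialized, so "<script>" becomes "&lt;script&gt;".
            iframe.replaceWith(new TextNode(raw.toString()));
        }
        // Cleaner keeps text nodes whose parent is allowed by the safelist.
        Document clean = new Cleaner(safelist).clean(dirty);
        return clean.body().html();
    }

    public static void main(String[] args) {
        String html = "<blockquote><iframe>Content inside iframe<script>alert('x');</script></iframe></blockquote>";
        System.out.println(cleanKeepingIframeText(html, Safelist.relaxed()));
    }
}
```

Converting the iframe to text before cleaning, rather than widening the safelist, means the `<iframe>` element itself never reaches the output; only its content does, and only as text.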
Thank you for any help or insight.