Difference from 1.19.1 to 1.20.1 in iframe and escaping behavior when cleaning html #2326

@justone

I noticed a difference in how iframe child content is parsed between 1.19.1 and 1.20.1. I don't know which behavior is correct.

Given this html fragment:

<blockquote>
      <p>
        Some content
      <script>alert('script outside iframe, oh noes!');</script></p>
      <iframe>Content inside iframe<script>alert('script between iframe, oh noes!');</script></iframe>
    </blockquote>

Cleaning it with String safeHtml = Jsoup.clean(unsafeHtml, Safelist.relaxed()); gives the following results:

1.19.1:

<blockquote>
 <p>Some content</p>Content inside iframe &lt;script&gt;alert('script between iframe, oh noes!');&lt;/script&gt;
</blockquote>

1.20.1:

<blockquote>
 <p>Some content</p>
</blockquote>
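
A minimal reproduction might look like the sketch below. The class name IframeCleanRepro and the clean helper are hypothetical names chosen for illustration; the only jsoup calls used are Jsoup.clean and Safelist.relaxed, as in the snippet above. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical reproduction class; not part of jsoup.
class IframeCleanRepro {
    // The HTML fragment from this report, unchanged.
    static final String UNSAFE_HTML =
        "<blockquote>\n"
        + "      <p>\n"
        + "        Some content\n"
        + "      <script>alert('script outside iframe, oh noes!');</script></p>\n"
        + "      <iframe>Content inside iframe<script>alert('script between iframe, oh noes!');</script></iframe>\n"
        + "    </blockquote>";

    // Cleans the input with the relaxed safelist, exactly as described in this report.
    static String clean(String unsafeHtml) {
        return Jsoup.clean(unsafeHtml, Safelist.relaxed());
    }

    public static void main(String[] args) {
        // Print the cleaned fragment so the output can be compared across jsoup versions.
        System.out.println(clean(UNSAFE_HTML));
    }
}
```

Running it once with jsoup 1.19.1 and once with 1.20.1 on the classpath should show the two outputs listed above.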

Using git bisect, I traced the change in behavior to commit 3704342, which rewrote the pretty printer.

Is the iframe removal the right behavior?
Is there a way to replicate the old behavior, with the child content getting escaped?

Thank you for any help or insight.

Labels: improvement (An improvement / new feature idea)