Skip to content

Conversation

jhl221123
Copy link
Contributor

@jhl221123 jhl221123 commented Jun 9, 2025

What's changed?

This change updates the logic in visitDoctypedecl to correctly create Xml.Comment objects for comments found within a doctype's internal subset. All other markup, such as entity declarations, is parsed as Xml.CharData to preserve its raw text.

What's your motivation?

Checklist

  • I've added unit tests to cover both positive and negative cases
  • I've read and applied the recipe conventions and best practices
  • I've used the IntelliJ IDEA auto-formatter on affected files

@timtebeek timtebeek requested a review from greg-at-moderne June 9, 2025 17:55
@@ -740,7 +740,7 @@ public String getPrefix() {
}

Markers markers;
List<Element> elements;
List<Content> elements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight worry here about deserialization, but I'm not on my laptop to check.. Pointing it out here for a more thorough review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Indeed, types Element and Content seem unrelated (other than descending from Xml).
  • But I agree, the external subsets shouldn't be modelled as a collection of elements. These are not elements from logical perspective.
  • Having said that, I think we should go for it. As far as I understand,tThe failure scenario is that for some XML files which use these declarations, their LSTs will no longer match until re-built.

All in all - let's try.

@timtebeek
Copy link
Member

Thanks a lot for the help here @jhl221123 ! I've tagged Greg for an additional review of the concern I posted above.

@timtebeek timtebeek added bug Something isn't working xml labels Jun 9, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Ready to Review in OpenRewrite Jun 10, 2025
@greg-at-moderne
Copy link
Contributor

Thank you so much for the contribution, @jhl221123.
I appreciate the thorough test case provided.

@greg-at-moderne greg-at-moderne merged commit ebf4d2b into openrewrite:main Jun 10, 2025
2 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Review to Done in OpenRewrite Jun 10, 2025
@knutwannheden
Copy link
Contributor

Something seems wrong here. Note that Xml.Element isn't a normal XML element. For that we have Xml.Tag. Perhaps it is meant to correspond to https://www.w3.org/TR/xml/#NT-elementdecl?

AFAICT, we now no longer create any Xml.Element instances anywhere. That would mean that any recipes which operate on these or on owned objects would be broken now. Can we check this again?

greg-at-moderne added a commit that referenced this pull request Jun 10, 2025
greg-at-moderne added a commit that referenced this pull request Jun 10, 2025
JohannisK pushed a commit that referenced this pull request Jun 10, 2025
Co-authored-by: Tim te Beek <tim@moderne.io>
JohannisK pushed a commit that referenced this pull request Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working xml
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

XML parser parses comments inside doctype subset into Xml.Ident
4 participants