The Tokeniser logic for parsing attribute names treats a '<' character as the end of the tag. This is not consistent with how the browser engines I tested on macOS (Brave/Chrome, Safari, Firefox) handle this case.
Consider the tag <a before="foo" <junk after="page.html">.
In this case, jsoup will associate the before attribute with the a tag. It will then process <junk as a new tag and associate the after attribute with it.
Handling this more consistently with browsers would mean assigning the valueless attribute "<junk" to the a tag and then continuing to process the remaining attributes, so that after also ends up on the a tag.
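
To make the expected behaviour concrete, here is a minimal, self-contained sketch of attribute parsing in which '<' does not terminate an attribute name. It is not jsoup's code: the class `AttrNameSketch` and the method `parseAttributes` are hypothetical names used only for illustration. The logic is a simplified take on the attribute states of the WHATWG HTML tokenizer, where a '<' inside an attribute name is reported as a parse error but still appended to the name. It handles one start tag only and ignores character references.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of jsoup.
class AttrNameSketch {

    // Characters that end an attribute name. Note that '<' is deliberately absent.
    static boolean isNameEnd(char c) {
        return Character.isWhitespace(c) || c == '/' || c == '>' || c == '=';
    }

    // Parses the attributes of a single start tag, e.g. <a x="1" y>.
    static Map<String, String> parseAttributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = tag.length();
        int i = 1; // skip the opening '<'

        // Skip the tag name.
        while (i < n && !Character.isWhitespace(tag.charAt(i))
                && tag.charAt(i) != '/' && tag.charAt(i) != '>') i++;

        while (i < n) {
            // Skip whitespace and stray '/' between attributes.
            while (i < n && (Character.isWhitespace(tag.charAt(i)) || tag.charAt(i) == '/')) i++;
            if (i >= n || tag.charAt(i) == '>') break;

            // Attribute name: the first character always belongs to the name,
            // so a leading '<' simply starts a new attribute.
            int start = i;
            i++;
            while (i < n && !isNameEnd(tag.charAt(i))) i++;
            String name = tag.substring(start, i).toLowerCase(Locale.ROOT);

            // Optional value after '='.
            while (i < n && Character.isWhitespace(tag.charAt(i))) i++;
            String value = "";
            if (i < n && tag.charAt(i) == '=') {
                i++;
                while (i < n && Character.isWhitespace(tag.charAt(i))) i++;
                if (i < n && (tag.charAt(i) == '"' || tag.charAt(i) == '\'')) {
                    char quote = tag.charAt(i++);
                    int valueStart = i;
                    while (i < n && tag.charAt(i) != quote) i++;
                    value = tag.substring(valueStart, i);
                    if (i < n) i++; // skip the closing quote
                } else {
                    int valueStart = i;
                    while (i < n && !Character.isWhitespace(tag.charAt(i))
                            && tag.charAt(i) != '>') i++;
                    value = tag.substring(valueStart, i);
                }
            }
            attrs.putIfAbsent(name, value); // the first occurrence of a name wins
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(parseAttributes("<a before=\"foo\" <junk after=\"page.html\">"));
        // prints {before=foo, <junk=, after=page.html}
    }
}
```

With this approach all three attributes (before, <junk, after) stay on the a tag, which matches what I observed in the browsers listed above.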