
Tokeniser: Tag attributes that follow '<' character in attribute name are lost #1483

@jmeckman

Description


The Tokeniser's logic for parsing attribute names treats a '<' character as the end of the tag. This is inconsistent with how the browser engines I tested on macOS (Brave/Chrome, Safari, Firefox) handle this case.

As demonstrated here: http://try.jsoup.org/~X8uusGL-o4nn_aiT4XVefMuXW0Q

Consider the tag <a before="foo" <junk after="page.html">.

In this case, jsoup will associate the before attribute with the a tag. It will then process <junk as a new tag and associate the after attribute with it.

Handling this case consistently with browsers would mean assigning the unvalued attribute "<junk" to the a tag and continuing to process the remaining attributes, so that after is also associated with the a tag.
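To make the difference between the two behaviours concrete, here is a minimal Java sketch of scanning the attribute part of a start tag (everything after the tag name). It is not jsoup's Tokeniser: the class AttributeScanner, the method scan, and the ltEndsTag switch are hypothetical, and the state machine is a simplified approximation of the WHATWG HTML attribute rules. With ltEndsTag set to true, a '<' stops attribute scanning, as reported above. With it set to false, '<' is kept as an ordinary attribute-name character, which is how the WHATWG tokenizer treats it (after flagging an unexpected-character-in-attribute-name parse error).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified scanner for the attribute section of a start tag.
// Not jsoup's Tokeniser; an approximation of the WHATWG attribute states.
public class AttributeScanner {

    // Scans attributes from s until '>' (or end of input).
    // ltEndsTag = true: a '<' stops the scan (the behaviour reported in the issue).
    // ltEndsTag = false: '<' is an ordinary name character (browser/WHATWG behaviour).
    public static Map<String, String> scan(String s, boolean ltEndsTag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            // Skip whitespace (approximated by Character.isWhitespace) and stray '/'.
            if (Character.isWhitespace(c) || c == '/') { i++; continue; }
            if (c == '>') break;               // end of the tag
            if (c == '<' && ltEndsTag) break;  // reported behaviour: stop here

            // Attribute name: the first char always belongs to it (even '<' or '='),
            // then it runs until whitespace, '/', '>' or '='.
            int start = i++;
            while (i < n) {
                char d = s.charAt(i);
                if (Character.isWhitespace(d) || d == '/' || d == '>' || d == '=') break;
                if (d == '<' && ltEndsTag) break;
                i++;
            }
            String name = s.substring(start, i).toLowerCase(Locale.ROOT);

            // Optional value: whitespace may surround the '='.
            String value = "";
            int j = i;
            while (j < n && Character.isWhitespace(s.charAt(j))) j++;
            if (j < n && s.charAt(j) == '=') {
                i = j + 1;
                while (i < n && Character.isWhitespace(s.charAt(i))) i++;
                if (i < n && (s.charAt(i) == '"' || s.charAt(i) == '\'')) {
                    char quote = s.charAt(i);
                    int close = s.indexOf(quote, i + 1);
                    if (close < 0) close = n;  // unterminated quote: this sketch takes the rest
                    value = s.substring(i + 1, close);
                    i = Math.min(close + 1, n);
                } else {
                    // Unquoted value ends at whitespace or '>'.
                    int vs = i;
                    while (i < n && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '>') i++;
                    value = s.substring(vs, i);
                }
            }
            // Duplicate attribute names are ignored; the first occurrence wins.
            attrs.putIfAbsent(name, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        // The text following "<a" in the example tag from this issue.
        String rest = " before=\"foo\" <junk after=\"page.html\">";
        System.out.println("'<' ends tag:  " + scan(rest, true));
        System.out.println("browser-style: " + scan(rest, false));
    }
}
```

Running main prints {before=foo} for the first mode and {before=foo, <junk=, after=page.html} for the second, which mirrors the jsoup result and the browser result described above.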

Metadata

Labels

fixed (a bug or improvement that has been fixed or implemented)
