Tokeniser: Tag attributes that follow '<' character in attribute name are lost #1483

The Tokeniser logic for parsing attribute names treats a '<' character as the end of the tag. This is inconsistent with how the browser engines I tested on macOS (Brave/Chrome, Safari, Firefox) handle this case.

As demonstrated here: http://try.jsoup.org/~X8uusGL-o4nn_aiT4XVefMuXW0Q

Consider the tag `<a before="foo" <junk after="page.html">`.

In this case, jsoup associates the `before` attribute with the `a` tag. It then processes `<junk` as the start of a new tag and associates the `after` attribute with that new tag.

Handling this case consistently with browsers would mean assigning the valueless attribute `<junk` to the `a` tag and continuing to process the remaining attributes.
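To make the browser-consistent behaviour concrete, here is a minimal sketch of a heavily simplified start-tag attribute tokeniser. It is not jsoup's code: the class and method names (`AttributeNameSketch`, `parseAttributes`) are hypothetical, and it handles only a single start tag in isolation. It follows the WHATWG HTML "attribute name state" rule, where `<` inside an attribute name is a parse error (`unexpected-character-in-attribute-name`) but is still appended to the name instead of ending the tag.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical, simplified start-tag tokeniser (NOT jsoup's implementation).
 * In the attribute name state, only whitespace, '/', '=' and '>' end the name;
 * '<' is appended to the name, as the WHATWG HTML tokenizer specifies.
 */
public class AttributeNameSketch {

    /** Parses the attributes of a single start tag such as {@code <a x="1">}. */
    static Map<String, String> parseAttributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = tag.length();
        int i = 1; // skip the leading '<'
        // Skip the tag name.
        while (i < n && !isSpace(tag.charAt(i)) && tag.charAt(i) != '>' && tag.charAt(i) != '/') i++;

        while (i < n) {
            char c = tag.charAt(i);
            if (isSpace(c) || c == '/') { i++; continue; } // before attribute name (simplified)
            if (c == '>') break;                           // end of the start tag

            // Attribute name state. '<' is deliberately NOT a terminator here.
            StringBuilder name = new StringBuilder().append(c);
            i++;
            while (i < n) {
                c = tag.charAt(i);
                if (isSpace(c) || c == '/' || c == '=' || c == '>') break;
                name.append(c);
                i++;
            }

            // After attribute name: skip whitespace and check for a value.
            while (i < n && isSpace(tag.charAt(i))) i++;
            String value = "";
            if (i < n && tag.charAt(i) == '=') {
                i++;
                while (i < n && isSpace(tag.charAt(i))) i++;
                if (i < n && (tag.charAt(i) == '"' || tag.charAt(i) == '\'')) {
                    char quote = tag.charAt(i++);
                    int end = tag.indexOf(quote, i);
                    if (end < 0) end = n;              // unterminated quote: take the rest
                    value = tag.substring(i, end);
                    i = Math.min(end + 1, n);
                } else {
                    int start = i;                     // unquoted value
                    while (i < n && !isSpace(tag.charAt(i)) && tag.charAt(i) != '>') i++;
                    value = tag.substring(start, i);
                }
            }
            // Duplicate attribute names are dropped; the first occurrence wins.
            attrs.putIfAbsent(name.toString().toLowerCase(Locale.ROOT), value);
        }
        return attrs;
    }

    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    public static void main(String[] args) {
        Map<String, String> attrs =
                parseAttributes("<a before=\"foo\" <junk after=\"page.html\">");
        System.out.println(attrs); // prints {before=foo, <junk=, after=page.html}
    }
}
```

With the example tag from this issue, all three attributes stay on the `a` tag, and `<junk` becomes an attribute with an empty value rather than the start of a new tag.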
