The trailing ? in <?xml version="1.0"?> emits an error #2298

@smileLilith

After upgrading from v1.17.1 to v1.19.1, unit tests started to fail when parsing XML files.

Valid XML file (minimal reproducible excerpt, not the entire file):

<?xml version="1.0"?>
<catalogs xmlns="http://acalog.com/catalog/1.0" xmlns:h="http://www.w3.org/1999/xhtml"
          xmlns:a="http://www.w3.org/2005/Atom" xmlns:xi="http://www.w3.org/2001/XInclude">
</catalogs>

The error is:

Unexpected character '?' in input state [AfterAttributeValue_quoted]

Note that there is no whitespace between " and ?> in the first line of the XML. Once whitespace is added there, no parsing error is reported: the jsoup XML parser accepts <?xml version="1.0" ?> as a valid start of the file.
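As a cross-check that the declaration without whitespace is well-formed, the sketch below feeds both spellings to the JDK's built-in DOM parser. The XML 1.0 grammar makes the whitespace before ?> optional, so both should be accepted. The class and method names (`XmlDeclCheck`, `isWellFormed`) are illustrative and not part of jsoup or of the original report.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: checks well-formedness with the JDK parser only.
class XmlDeclCheck {

    // Returns true if the JDK DOM parser accepts the input as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXException on malformed input; other failures are treated the same here.
            return false;
        }
    }

    public static void main(String[] args) {
        String body = "\n<catalogs xmlns=\"http://acalog.com/catalog/1.0\"></catalogs>";
        // Declaration exactly as in the report (no space before ?>).
        System.out.println(isWellFormed("<?xml version=\"1.0\"?>" + body));
        // Declaration with the extra space that avoids the jsoup error.
        System.out.println(isWellFormed("<?xml version=\"1.0\" ?>" + body));
    }
}
```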

Usage

// Track up to one parse error so that malformed input can be rejected.
Parser parser = Parser.xmlParser().setTrackErrors(1).newInstance();
Document httpDoc = Jsoup.parse(fileContent, "", parser);
// Any tracked error is treated as invalid XML.
if (!parser.getErrors().isEmpty()) {
    throw new IllegalArgumentException(String.format("Not a valid XML. Error: %s", parser.getErrors()));
}
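If changing the jsoup version is not an option, one possible stopgap follows from the observation that adding whitespace avoids the error: insert a space before ?> in the XML declaration before handing the text to Jsoup.parse. This is a minimal sketch using only the JDK; `XmlDeclWorkaround` and `padXmlDeclaration` are hypothetical names. It assumes the declaration is the very first thing in the string (no BOM or leading whitespace).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of jsoup.
class XmlDeclWorkaround {

    // Matches an XML declaration at the start of input whose last attribute
    // value is followed directly by "?>" (no whitespace in between).
    private static final Pattern TIGHT_DECL_END =
            Pattern.compile("\\A(<\\?xml\\b[^>]*?[\"'])\\?>");

    // Inserts a single space before "?>" in the declaration; other input is returned unchanged.
    static String padXmlDeclaration(String xml) {
        Matcher m = TIGHT_DECL_END.matcher(xml);
        return m.find() ? m.replaceFirst("$1 ?>") : xml;
    }

    public static void main(String[] args) {
        // prints: <?xml version="1.0" ?><catalogs/>
        System.out.println(padXmlDeclaration("<?xml version=\"1.0\"?><catalogs/>"));
    }
}
```

In the usage above, this would mean passing `padXmlDeclaration(fileContent)` to `Jsoup.parse` instead of `fileContent`.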


Labels: bug, fixed
