Closed
Labels: bug, fixed
Description
Hello, please see below a test program that tries to extract the source range positions of the text nodes from the malformed fragment foo<p/>bar. Notice the malformed tag <p/>.
import org.jsoup.nodes.*;
import org.jsoup.parser.*;
import org.jsoup.select.*;

public class Test {
    public static void main(String[] args) {
        HtmlTreeBuilder treeBuilder = new HtmlTreeBuilder();
        Parser parser = new Parser(treeBuilder);
        // Ask the parser to record source positions for the nodes it creates.
        parser.setTrackPosition(true);
        Document document = parser.parseInput("foo<p/>bar", "");
        // Visit every node and print the source range of each text node.
        NodeTraversor.traverse((Node node, int depth) -> {
            if (node instanceof TextNode textNode) {
                Range sourceRange = textNode.sourceRange();
                System.out.printf("text=%s start=%d end=%d%n",
                    textNode.text(),
                    sourceRange.start().pos(),
                    sourceRange.end().pos());
            }
        }, document);
    }
}
With release 1.16.1, all positions are reported as -1:
% java -cp ~/.m2/repository/org/jsoup/jsoup/1.16.1/jsoup-1.16.1.jar Test.java
text=foo start=-1 end=-1
text=bar start=-1 end=-1
With release 1.18.1, it's a little better, except for the -1 start position for the bar text immediately following the malformed tag.
% java -cp ~/.m2/repository/org/jsoup/jsoup/1.18.1/jsoup-1.18.1.jar Test.java
text=foo start=0 end=3
text=bar start=-1 end=10
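For comparison, here is a minimal sketch of the offsets one would presumably expect, computed directly from the input string without jsoup. It assumes positions are 0-based character offsets with an exclusive end, which is what the foo result (start=0 end=3) and the reported end=10 for bar suggest. The class ExpectedOffsets and the helper expectedRange are hypothetical names used only for this illustration.

```java
public class ExpectedOffsets {
    // Hypothetical helper: returns {start, end} of the first occurrence of
    // text in source, with an exclusive end, or {-1, -1} if it is absent.
    static int[] expectedRange(String source, String text) {
        int start = source.indexOf(text);
        if (start < 0) {
            return new int[] {-1, -1};
        }
        return new int[] {start, start + text.length()};
    }

    public static void main(String[] args) {
        String source = "foo<p/>bar";
        for (String text : new String[] {"foo", "bar"}) {
            int[] range = expectedRange(source, text);
            System.out.printf("text=%s start=%d end=%d%n", text, range[0], range[1]);
        }
    }
}
```

Under that assumption the sketch prints start=0 end=3 for foo and start=7 end=10 for bar, so the only value that differs from the 1.18.1 output is the start position of bar.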