@mislav mislav commented Aug 17, 2025

The expression /ancestor::div[1] should select only the first "div" node while traversing up the document tree, but this wasn't the case.
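To make the expected behavior concrete, here is a minimal, self-contained sketch of what a positional predicate on the ancestor axis should do. It does not use the library; `Node`, `child`, and `ancestorAt` are hypothetical names invented for this illustration. Because `ancestor` is a reverse axis, position 1 is the nearest matching ancestor.

```go
package main

import "fmt"

// Node is a hypothetical minimal element node used only for this illustration.
type Node struct {
	Name   string
	ID     string
	Parent *Node
}

// child creates a new element nested under parent (parent may be nil for a root).
func child(parent *Node, name, id string) *Node {
	return &Node{Name: name, ID: id, Parent: parent}
}

// ancestorAt walks up from n and returns the pos-th (1-based) ancestor
// named name, counting from the nearest ancestor. It returns nil if there
// is no such ancestor. This models ancestor::name[pos].
func ancestorAt(n *Node, name string, pos int) *Node {
	count := 0
	for p := n.Parent; p != nil; p = p.Parent {
		if p.Name == name {
			count++
			if count == pos {
				return p
			}
		}
	}
	return nil
}

func main() {
	// Deep nesting with a repeated element name: div > div > section > div > span.
	outer := child(nil, "div", "outer")
	middle := child(outer, "div", "middle")
	section := child(middle, "section", "s")
	inner := child(section, "div", "inner")
	span := child(inner, "span", "leaf")

	fmt.Println(ancestorAt(span, "div", 1).ID) // inner
	fmt.Println(ancestorAt(span, "div", 2).ID) // middle
	fmt.Println(ancestorAt(span, "div", 3).ID) // outer
}
```

With the bug described here, the `[1]` predicate behaved as if it were not restricting the match to the nearest `div`, so more than one ancestor was selected.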

The bug was discovered by bannmann and reported to the Readeck project. Readeck uses XPath expressions to selectively strip parts of HTML documents, but this bug causes significantly larger portions of some documents to be wiped because the ancestor axis matches too many nodes.

This fixes positional predicates by having ancestorQuery implement the Position interface.
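The general idea can be sketched as follows. This is not the library's code: `elem`, `positioner`, `ancestorIter`, and `selectAt` are hypothetical names, and the real interface and query types may look different. The sketch only shows why a predicate such as `[1]` needs the axis query to report the position of the node it just produced.

```go
package main

import "fmt"

// elem is a hypothetical stand-in for a document node.
type elem struct {
	name, id string
	parent   *elem
}

// positioner is a hypothetical interface: a query that can report the
// 1-based position of the node it most recently returned.
type positioner interface {
	Position() int
}

// ancestorIter yields the ancestors of its start node whose name matches,
// nearest ancestor first, and counts how many matches it has produced.
type ancestorIter struct {
	cur  *elem
	name string
	pos  int
}

// Next moves up the tree and returns the next matching ancestor, or nil.
func (it *ancestorIter) Next() *elem {
	for it.cur != nil {
		it.cur = it.cur.parent
		if it.cur != nil && it.cur.name == it.name {
			it.pos++
			return it.cur
		}
	}
	return nil
}

// Position reports the position of the last node returned by Next.
func (it *ancestorIter) Position() int { return it.pos }

// selectAt models the positional predicate [want]: it keeps only the
// nodes whose reported position equals want.
func selectAt(it *ancestorIter, want int) []*elem {
	var p positioner = it
	var out []*elem
	for n := it.Next(); n != nil; n = it.Next() {
		if p.Position() == want {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	outer := &elem{name: "div", id: "outer"}
	inner := &elem{name: "div", id: "inner", parent: outer}
	leaf := &elem{name: "span", id: "leaf", parent: inner}

	for _, n := range selectAt(&ancestorIter{cur: leaf, name: "div"}, 1) {
		fmt.Println(n.id) // inner
	}
}
```

If the axis query cannot report a position, the predicate has nothing reliable to compare against, which is consistent with the over-matching described above.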

I wrote the test by hand; the implementation (plus code comments) was written with help from GitHub Copilot w/ GPT-5.

I added a new test helper, createElement, to simplify building nested documents for testing. The pre-existing trees used in other tests are not suitable for verifying this bug because they lack deep nesting with repeated element names.

@zhengchun zhengchun merged commit 511abd5 into antchfx:master Aug 18, 2025