poor performance in DOMDifferenceEngine for large XMLs #236

@gerpres

Description

I'm comparing two large XML documents.
One parent element contains around 45,000 children.

The repeated List.indexOf() calls in

org.xmlunit.diff.DOMDifferenceEngine.compareNodeLists(Iterable<Node>, XPathContext, Iterable<Node>, XPathContext)

are quite expensive: every list contains 45,000 elements, and each indexOf() call is a linear scan. They should be replaced by a more efficient data structure.
Since the lists do not appear to be modified during the matches loop, the indices can be computed once up front. Building Map<Node, Integer> instances that hold each node's index cuts the required comparison time in half in my local tests.

private static <E> Map<E, Integer> index(Collection<E> collection) {
	Map<E, Integer> indizes = new HashMap<>();

	int i = 0;
	for (E e: collection) {
		indizes.put(e, i++);
	}

	return indizes;
}

It can then be used like this:

private ComparisonState compareNodeLists(Iterable<Node> controlSeq, final XPathContext controlContext, Iterable<Node> testSeq, final XPathContext testContext) {
   ...
   Map<Node, Integer> controlListIndizes = index(controlList);
   ...
   for (Map.Entry<Node, Node> pair: matches) {
      ...
      int controlIndex = controlListIndizes.get(control);
      ...
}
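For anyone who wants to try the idea outside XMLUnit, here is a minimal standalone sketch. It is not the XMLUnit code: the class name `IndexLookupSketch`, the `indexOf` helper, and the use of plain `Object` instances in place of DOM nodes are all assumptions made for illustration. It also differs from the snippet above in two small ways that matter if the map is meant as a drop-in replacement for `List.indexOf()`: it uses `putIfAbsent` so that a repeated element keeps its first position, and it returns -1 for a missing element instead of failing on unboxing `null`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexLookupSketch {

    // Builds an element -> position map in a single pass over the list.
    // putIfAbsent keeps the FIRST position of a repeated element,
    // which matches what List.indexOf() would return.
    public static <E> Map<E, Integer> index(List<E> list) {
        Map<E, Integer> indices = new HashMap<>();
        int i = 0;
        for (E e : list) {
            indices.putIfAbsent(e, i++);
        }
        return indices;
    }

    // Hypothetical helper: mirrors List.indexOf() by returning -1
    // when the element is not present in the map.
    public static <E> int indexOf(Map<E, Integer> indices, E element) {
        Integer position = indices.get(element);
        return position == null ? -1 : position;
    }

    public static void main(String[] args) {
        // Plain objects stand in for DOM nodes (assumption for this sketch).
        List<Object> nodes = new ArrayList<>();
        for (int n = 0; n < 45_000; n++) {
            nodes.add(new Object());
        }

        Map<Object, Integer> indices = index(nodes);

        // Compare a sample of map lookups against List.indexOf().
        for (int n = 0; n < nodes.size(); n += 1_000) {
            Object node = nodes.get(n);
            if (indexOf(indices, node) != nodes.indexOf(node)) {
                throw new AssertionError("mismatch at position " + n);
            }
        }
        System.out.println("all sampled lookups agree");
    }
}
```

Note that a `HashMap` looks keys up via `equals()`/`hashCode()`, just as `List.indexOf()` relies on `equals()`, so the two agree as long as the node implementation is consistent about both.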
