I’m having an issue with using XPath contains text with dom4j.
Given the following XML:
ABC
5
BLAH BLAH BLAH
ABC
I want to find all nodes containing the text “ABC”. Based on my understanding, the XPath query should be:
//*[contains(text(),'ABC')]
However, this query only returns the element and not the element. The DOM treats the element as a composite element with the structure:
[Text = 'BLAH BLAH BLAH '][BR][BR][Text = 'ABC']
I assumed that the query would still find the element since it contains the text “ABC”, but it doesn’t.
A similar query returns the element but also includes the parent elements, which is not desired:
//*[contains(.,'ABC')]
Does anyone know the correct XPath query that will return only the and elements, without including the parent elements?
Hey Shreashth,
XPath can be tricky, and tracking down the exact element is sometimes time-consuming. For your query, you can use Multiple Contains in XPath. I will show you how.
- Using Multiple Contains in XPath: dom4j evaluates XPath 1.0, where contains(text(), 'ABC') converts the text() node-set to a string by taking only the first text node. An element whose first text node is 'BLAH BLAH BLAH ' therefore never matches, even if a later text node is 'ABC'. Moving contains() into a predicate on text() tests every text node child individually, so elements with composite text nodes are matched, while parent elements that do not directly contain "ABC" are not.
import org.dom4j.*;
import org.dom4j.io.SAXReader;
import java.util.List;

public class XPathContainsExample {
    public static void main(String[] args) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read("your-file.xml");
        // Select every element that has at least one text node child containing "ABC".
        List<Node> nodes = document.selectNodes("//*[text()[contains(., 'ABC')]]");
        for (Node node : nodes) {
            System.out.println(node.asXML());
        }
    }
}
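To see the difference between the three queries side by side, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath API (also XPath 1.0) rather than dom4j so that it runs without extra dependencies. The sample document and its element names (root, Street, Number, Comment) are assumptions made up for illustration, not the original XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsTextDemo {
    // Hypothetical sample document; the element names are assumptions.
    static final String XML =
        "<root><Street>ABC</Street><Number>5</Number>"
      + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></root>";

    // Returns the names of the elements selected by the given XPath expression.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first text node of each element is tested: selects Street only.
        System.out.println(select("//*[contains(text(), 'ABC')]"));
        // The full string value is tested: selects Street and Comment, plus the parent root.
        System.out.println(select("//*[contains(., 'ABC')]"));
        // Every text node child is tested: selects Street and Comment, no parents.
        System.out.println(select("//*[text()[contains(., 'ABC')]]"));
    }
}
```

The same expression strings can be passed unchanged to dom4j's selectNodes().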
Hope this helps and that the explanation was clear.
Hey Mark,
Using XPath with normalize-space(): The normalize-space() function strips leading and trailing whitespace and collapses internal runs of whitespace into a single space, which makes contains() more robust when the text is padded or broken across lines. Note that normalize-space(text()) has the same limitation as contains(text(), ...): in XPath 1.0 only the first text node is used. To handle elements with mixed content, apply normalize-space() to each text node inside a predicate on text().
import org.dom4j.*;
import org.dom4j.io.SAXReader;
import java.util.List;

public class XPathNormalizeSpaceExample {
    public static void main(String[] args) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read("your-file.xml");
        // Normalize each text node child separately, then test it for "ABC".
        List<Node> nodes = document.selectNodes("//*[text()[contains(normalize-space(.), 'ABC')]]");
        for (Node node : nodes) {
            System.out.println(node.asXML());
        }
    }
}
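A small sketch of where normalize-space() actually makes a difference: a search string that contains a space, matched against text with irregular whitespace. It uses the JDK's javax.xml.xpath API (XPath 1.0) instead of dom4j so it runs standalone. The sample document, its element names (a, b) and the search string 'A B' are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NormalizeSpaceDemo {
    // Hypothetical sample: <a> has padded text, <b> has mixed content with a line break.
    static final String XML =
        "<root><a>  A   B  </a><b>x<br/> A\n B </b></root>";

    // Returns the names of the elements selected by the given XPath expression.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without normalization, no text node literally contains "A B": selects nothing.
        System.out.println(select("//*[text()[contains(., 'A B')]]"));
        // normalize-space(text()) only sees the first text node: selects a but misses b.
        System.out.println(select("//*[contains(normalize-space(text()), 'A B')]"));
        // Normalizing each text node separately: selects both a and b.
        System.out.println(select("//*[text()[contains(normalize-space(.), 'A B')]]"));
    }
}
```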
Hey Shreshth,
Iterate and Check Text Nodes: This method walks the XML document manually and checks each text node for "ABC". It is more verbose than an XPath query, but it gives you precise control over which nodes are checked and which elements are reported.
import org.dom4j.*;
import org.dom4j.io.SAXReader;

public class XPathManualCheckExample {
    public static void main(String[] args) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read("your-file.xml");
        // Start at the root so the root element itself is checked too.
        checkAndPrint(document.getRootElement());
    }

    private static void checkAndPrint(Element element) {
        boolean printed = false;
        for (int i = 0; i < element.nodeCount(); i++) {
            Node node = element.node(i);
            if (!printed && node instanceof Text && node.getText().contains("ABC")) {
                System.out.println(element.asXML());
                // Print each element once, but keep scanning so later children are not skipped.
                printed = true;
            }
            if (node instanceof Element) {
                checkAndPrint((Element) node);
            }
        }
    }
}
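For comparison, the same traversal idea can be sketched with the JDK's org.w3c.dom API, collecting the matches into a list instead of printing them, which makes the result easier to inspect. This is a stand-in for the dom4j version, not a replacement for it. The method names findElementsWithText and walk, and the sample XML in main, are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeWalkDemo {
    // Returns the names of all elements that have a direct text node child containing needle.
    static List<String> findElementsWithText(String xml, String needle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> matches = new ArrayList<>();
            walk(doc.getDocumentElement(), needle, matches);
            return matches;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void walk(Element element, String needle, List<String> matches) {
        boolean recorded = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Record the element at most once, even if several text nodes match.
                if (!recorded && child.getNodeValue().contains(needle)) {
                    matches.add(element.getNodeName());
                    recorded = true;
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Always descend, so nested matches are found regardless of sibling order.
                walk((Element) child, needle, matches);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document; the element names are assumptions.
        String xml = "<root><Street>ABC</Street><Number>5</Number>"
                   + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></root>";
        System.out.println(findElementsWithText(xml, "ABC"));
    }
}
```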