I need to extract all links from an HTML document whose inner content is text rather than an image. Basically, I would like to run doc.select("//a/attribute::href") for every element in the tree where doc.select("//a/text()") returns anything. Thanks!
Well, you can write conditions in XPath as a predicate in square brackets, e.g.
//a[text()]/@href selects the href attributes of all link (a) elements that have at least one text node child. Or, if you want to make sure there is no img child element in the link, you can use e.g. //a[not(img)]/@href.
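
To see how the two predicates differ, here is a minimal Python sketch that uses only the standard library. Note that xml.etree.ElementTree supports just a subset of XPath and does not evaluate text() or not() predicates, so the sketch emulates each predicate in plain Python instead of running the expressions themselves. The sample markup and the helper names (has_text_child, has_img_child, hrefs) are made up for illustration, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (well-formed, so the stdlib XML parser accepts it).
SAMPLE = """<html><body>
<a href="/text-only">Read more</a>
<a href="/image-only"><img src="logo.png"/></a>
<a href="/mixed"><img src="icon.png"/> Home</a>
</body></html>"""


def has_text_child(a):
    # Emulates the predicate [text()]: at least one text node child.
    # ElementTree stores text before the first child in a.text and
    # text following a child element in that child's .tail.
    return bool(a.text) or any(child.tail for child in a)


def has_img_child(a):
    # Emulates the predicate [img]: at least one img child element.
    return a.find("img") is not None


def hrefs(elements):
    # Emulates the final /@href step: elements without href contribute nothing.
    return [a.get("href") for a in elements if "href" in a.attrib]


root = ET.fromstring(SAMPLE)
links = list(root.iter("a"))  # all a elements, in document order

# Corresponds to //a[text()]/@href
with_text = hrefs(a for a in links if has_text_child(a))

# Corresponds to //a[not(img)]/@href
without_img = hrefs(a for a in links if not has_img_child(a))

print(with_text)
print(without_img)
```

The "/mixed" link shows where the two expressions diverge: it has a text node child, so the first predicate keeps it, but it also has an img child, so the second one drops it. If you need both conditions at once, XPath lets you combine them in one predicate, e.g. //a[text() and not(img)]/@href.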