I am trying to scrape an HTML page for a particular input field (so that I can extract a token from it for use during login). I’m using SBCL 1.0.54 (because that version works properly with StumpWM), quicklisp, and the following quicklisp packages:
drakma
closure-html
cxml-stp
If I load the HTML page using Drakma and convert it to valid (X)HTML, I can use the following code (loosely adapted from the Plexippus XPath examples):
(xpath:do-node-set (node (xpath:evaluate "//*" xhtml-tree))
  (format t "found element: ~A~%"
          (xpath-protocol:local-name node)))
… to obtain the following results (snipped for brevity; the page in question is large):
found element: img
found element: a
found element: img
found element: script
found element: div
found element: img
found element: a
found element: input
found element: input
However, I can’t seem to get any XPath expression more complicated than “//*” to work. My aim is to find an input with a particular name, but even just finding all inputs fails:
* (xpath:evaluate "//input" xhtml-tree)
#<XPATH:NODE-SET empty {10087146F3}>
I’m obviously missing something pretty basic here. Could someone please give me a pointer in the right direction?
Could it be a namespace issue? That is, if there is an xmlns attribute on the root html element, then you will need to declare the namespace with xpath:with-namespaces and specify it in your XPath expression. The expression "//input" only finds input elements that aren’t in any namespace.
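To make that concrete, here is a minimal sketch rather than a definitive fix. It assumes the tree was built with closure-html and an STP builder, and that the elements ended up in the XHTML namespace (http://www.w3.org/1999/xhtml). The first loop prints each element's namespace URI so you can check that assumption against your own page. The names *sample-html*, *xhtml-tree*, and token-value, the prefix "x", and the input name "token" are all made up for illustration.

```lisp
;; Assumes Quicklisp is available; :xpath is the Plexippus XPath system.
(ql:quickload '(:closure-html :cxml-stp :xpath))

;; Hypothetical stand-in for the page fetched with Drakma.
(defparameter *sample-html*
  "<html><body><form><input name=\"token\" value=\"abc123\"></form></body></html>")

;; Parse the HTML into an STP document.
(defparameter *xhtml-tree*
  (chtml:parse *sample-html* (cxml-stp:make-builder)))

;; Diagnostic: show which namespace each element is in.
;; A non-empty URI here means an unprefixed "//input" cannot match.
(xpath:do-node-set (node (xpath:evaluate "//*" *xhtml-tree*))
  (format t "~A in namespace ~S~%"
          (xpath-protocol:local-name node)
          (xpath-protocol:namespace-uri node)))

;; Hypothetical helper: return the value attribute of the input
;; named "token", or NIL if no such element exists.
(defun token-value (tree)
  ;; Bind the prefix "x" to the XHTML namespace for this expression.
  (xpath:with-namespaces (("x" "http://www.w3.org/1999/xhtml"))
    (let ((node (xpath:first-node
                 (xpath:evaluate "//x:input[@name='token']" tree))))
      (when node
        (cxml-stp:attribute-value node "value")))))

(token-value *xhtml-tree*)
```

If the diagnostic loop shows a different namespace URI, put that URI in the with-namespaces binding instead. If you would rather not declare a namespace at all, the standard XPath 1.0 expression //*[local-name()='input'] matches input elements regardless of namespace.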