I need to move away from the xsltproc command-line tool for a deployment on Heroku, since Heroku doesn’t really support it. The Nokogiri gem looks like it should cover everything I need, but I am having trouble extracting a representative image from an HTML page.
By representative image I mean the first image under /html/body whose link contains “://” and does not contain “ads.”, “ad.”, or “?”. Is there a Nokogiri function that will do this, or one that returns an array of all images so that I can filter them however I want?
The following XPath should select the image that meets your stated criteria:
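A sketch of such an expression, written as a Ruby string so it can be handed to Nokogiri directly. It assumes that "link" in the question means the `src` attribute of each `img` element; the constant name `REPRESENTATIVE_IMAGE_XPATH` is only illustrative.

```ruby
# Assumption: the "link" of an image is its src attribute.
# The outer parentheses make [1] pick the first match in document order
# across the whole body, not the first img within each parent element.
REPRESENTATIVE_IMAGE_XPATH = <<~XPATH.strip
  (/html/body//img[
    contains(@src, '://')
    and not(contains(@src, 'ads.'))
    and not(contains(@src, 'ad.'))
    and not(contains(@src, '?'))
  ])[1]
XPATH
```

XPath 1.0 ignores whitespace between tokens, so the expression can be split over several lines for readability.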
You can use it like this: