I am trying to get the first image (<img>) closest to the first <p> tag of a webpage using Nokogiri. I will use the result to display an article synopsis, à la the Facebook share link.
The code I am using to get the first <p> tag of an article is as follows:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open(@url))
@title = doc.css('title').text
@content = doc.css('p').first
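Find the first <img> that is inside a <p>

If you don’t already have/need the <p> element, search the whole document with either at_css (for example, with a selector such as 'p img') or at_xpath (for example, with '//p//img').

Note that instead of at_css or at_xpath you can just use at and let Nokogiri figure out from the string if it is a CSS or XPath expression.

Find the first <img> that is inside the first <p>

If you already have the parent node, you can search within it, using either at_css('img') or a relative at_xpath('.//img').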
However, with these approaches (unlike the first two), if the first <p> does not contain an image you won’t find any image at all.
Find the first <img> in the document

If you really just want the first <img> anywhere (which might not be in a <p>, or the first <p>), then simply ask the document for its first image, for example with at('img').

If you want the first image that has at least one <p> occurring in the document somewhere before it, but not necessarily as a wrapper for the <img>… then say so and I can edit the answer further.

Find the first <img> that has a <p> before it (or as an ancestor)

Edit: Based on your comment below, I think you want an XPath expression along the lines of //img[preceding::p or ancestor::p].
This says “Find the first <img> in the document that either has a <p> occurring somewhere before it (but not as an ancestor), or that has a <p> as an ancestor.”

Here are some test cases so you can decide if this is what you want: