Here is an excerpt from the HTML I want to scan:
<div class="text">
<h3>
<a href="http://www.faith.co.uk/">
Rodeo Sinclair
</a>
</h3>
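And here is my Ruby code:
require 'open-uri'

# Read the whole page into a string (url is assumed to be set earlier).
@doc = URI.open(url) { |f| f.read }
# Returns no matches: the pattern expects <h3><a ...> with no whitespace between the tags.
output = @doc.scan(/<h3><a href=(.*?)>/)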
This does not work because of the newlines and spaces in the HTML file. Is there any way I can get around this?
I could easily create a regular expression that would parse your HTML fragment.
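For this particular fragment, something along these lines would probably do. It is only a sketch: `extract_hrefs` is a hypothetical helper name, and the pattern assumes `href` is the first attribute of the link and is wrapped in double quotes, which holds for the excerpt but not for HTML in general.

```ruby
# Hypothetical helper: returns the href of every <a> that directly follows an <h3>.
def extract_hrefs(html)
  # \s* tolerates any whitespace, including newlines, between <h3> and <a>.
  # The lazy capture takes everything inside the double quotes of href.
  html.scan(/<h3>\s*<a\s+href="(.*?)"\s*>/).flatten
end

# The excerpt from the question, newlines included.
html = <<~HTML
  <div class="text">
  <h3>
  <a href="http://www.faith.co.uk/">
  Rodeo Sinclair
  </a>
  </h3>
HTML

puts extract_hrefs(html)
```

The `flatten` is there because `scan` returns one array per match when the pattern contains a capture group.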
However, I would like to encourage you to get in the habit of using an XML/HTML parser to interact with HTML.
See the question "RegEx match open tags except XHTML self-contained tags" for a compelling argument against using regular expressions to parse HTML.
==EDIT== changed to an each loop