Here is an excerpt from the html I want to scan through. <div class=text>

Question

Asked: May 29, 2026

Here is an excerpt from the html I want to scan through.

<div class="text">
 <h3>
  <a href="http://www.faith.co.uk/">
   Rodeo Sinclair
  </a>
 </h3>

And here is my ruby code.

@doc = open(url) { |f| 
  @doc = f.read
}

output = @doc.scan(/<h3><a href=(.*?)>/)

This does not work because of the newlines and spaces in the HTML file. Is there any way I can get around this?

Editorial Team · Answer 1 · 2026-05-29T09:50:34+00:00

I could easily create a regular expression that would parse your HTML fragment.
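For completeness, here is a minimal sketch of such a regular expression. It assumes the markup always has the shape of your fragment: only whitespace between the <h3> and <a> tags, and a double-quoted href as the first attribute. The variable names are mine, chosen for illustration.

```ruby
# Fragment copied from the question, including its newlines and indentation.
html = <<~HTML
  <div class="text">
   <h3>
    <a href="http://www.faith.co.uk/">
     Rodeo Sinclair
    </a>
   </h3>
HTML

# \s* tolerates any run of whitespace (newlines included) between the tags.
# The quotes sit outside the capture group, so only the URL is captured.
link_in_h3 = /<h3>\s*<a\s+href="(.*?)"\s*>/

# scan returns one array of captures per match; flatten gives a plain list.
hrefs = html.scan(link_in_h3).flatten
puts hrefs
```

This breaks as soon as the markup changes, for example if another attribute is added before href or single quotes are used.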

However, I would like to encourage you to get in the habit of using an XML/HTML parser to interact with HTML.

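require 'nokogiri'
require 'open-uri'

# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open(url))

# Print the href of every link nested in an h3 inside a div.
doc.css('div h3 a').each do |link|
  puts link['href']
end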

See "RegEx match open tags except XHTML self-contained tags" for a compelling argument against using regular expressions to parse HTML.

==EDIT== changed to an each loop
