I try to extract all five rows listed in the table above. I’m using

Editorial Team

Asked: May 27, 2026 (2026-05-27T04:55:11+00:00)

I am trying to extract all five rows listed in the table above.

I’m using the Ruby Hpricot library to extract the table rows with an XPath expression.

In my example, the XPath expression I use is /html/body/center/table/tr. Note that I’ve removed the tbody tag from the expression, which is usually necessary for the extraction to succeed.

The weird thing is that the result contains only the first three rows; the last two rows are missing. I have no idea what’s going on there.

EDIT: Nothing magic about the code, just attaching it upon request.

require 'open-uri'
require 'hpricot'

faculty = Hpricot(open("http://www.utm.utoronto.ca/7800.0.html"))
(faculty/"/html/body/center/table/tr").each do |text|
  puts text.to_s
end

1 Answer

Editorial Team · Answer 1 · 2026-05-27T04:55:12+00:00

The HTML document in question is invalid. (See http://validator.w3.org/check?uri=http%3A%2F%2Fwww.utm.utoronto.ca%2F7800.0.html.) Hpricot parses it differently from your browser, hence the different results, but it can’t really be blamed: until HTML5, there was no standard for how to parse invalid HTML documents.

I tried replacing Hpricot with Nokogiri and it seems to give the expected parse. Code:

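require 'open-uri'
require 'nokogiri'

# URI.open is required on Ruby 3.0+, where plain open no longer fetches URLs.
faculty = Nokogiri.HTML(URI.open("http://www.utm.utoronto.ca/7800.0.html"))

# Print every row that sits directly under the table.
faculty.search("/html/body/center/table/tr").each do |row|
  puts row
end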

Maybe you should switch?
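
Whichever parser you use, an absolute path only matches rows that the parser placed directly under the table element. The sketch below illustrates that with REXML (shipped with standard Ruby installations) on a small, well-formed stand-in document. The markup is hypothetical and is not the actual page, and it is only an assumption that Hpricot's parse tree looked anything like this; the point is just that rows nested one level deeper are invisible to the absolute path but visible to a descendant search.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for a parse tree: three rows sit
# directly under <table>, two more ended up nested inside a <tbody>.
markup = <<~HTML
  <html><body><center>
    <table>
      <tr><td>row 1</td></tr>
      <tr><td>row 2</td></tr>
      <tr><td>row 3</td></tr>
      <tbody>
        <tr><td>row 4</td></tr>
        <tr><td>row 5</td></tr>
      </tbody>
    </table>
  </center></body></html>
HTML

doc = REXML::Document.new(markup)

# Absolute path: matches only <tr> elements that are direct children of <table>.
absolute = REXML::XPath.match(doc, '/html/body/center/table/tr')

# Descendant search: matches <tr> elements at any depth below any <table>.
tolerant = REXML::XPath.match(doc, '//table//tr')

puts "absolute path:     #{absolute.size} rows"
puts "descendant search: #{tolerant.size} rows"
```

If the two counts differ on your real document, printing the parsed tree should show where the parser moved the missing rows. Both Hpricot and Nokogiri accept a descendant expression such as //table//tr in place of the absolute path.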
