I just started with Ruby On Rails, and want to create a simple web

Question

Editorial Team

Asked: June 12, 2026

I just started with Ruby On Rails, and want to create a simple web site crawler which:

Goes through all the Sherdog fighters’ profiles.
Gets the Referees’ names.
Compares names with the old ones (both during the site parsing and from the file).
Prints and saves all the unique names to the file.

An example URL is: http://www.sherdog.com/fighter/Fedor-Emelianenko-1500

I am searching for tag entries like <span class="sub_line">Dan Miragliotta</span>. Unfortunately, in addition to the referee names I need, the same class is also used for:

The date.
“N/A” when the referee name is not known.

I need to discard every result that is the string “N/A”, as well as any string that contains digits. I managed to do the first part but couldn’t figure out how to do the second. After a lot of searching, experimenting, and rewriting, I managed to break the whole program and don’t know how to fix it properly:

require 'rubygems'
require 'hpricot'
require 'simplecrawler'

# Set up a new crawler
sc = SimpleCrawler::Crawler.new("http://www.sherdog.com/fighter/Fedor-Emelianenko-1500")
sc.maxcount = 1
sc.include_patterns = [".*/fighter/.*$", ".*/events/.*$", ".*/organizations/.*$", ".*/stats/fightfinder\?association/.*$"]

# The crawler yields a Document object for each visited page.
sc.crawl { |document|
# Parse the page with Hpricot
hdoc = Hpricot(document.data)

(hdoc/"td/span[@class='sub_line']").each do |span|
  if span.inner_html == 'N/A' || Regexp.new(".*/\d\.*$").match(span.inner_html)
    # puts "Test"
  else
    puts span.inner_html
    #File.open("File_name.txt", 'a') {|f| f.puts(hdoc.span.inner_html) } 
  end
end
}

I would also appreciate help with ideas on the rest of the program: How do I properly read the current names from the file, if the program is run multiple times, and how do I make the comparisons for the unique names?

Edit:

After some proposed improvements, here is what I got:

require 'rubygems'
require 'simplecrawler'
require 'nokogiri'
#require 'open-uri'

sc = SimpleCrawler::Crawler.new("http://www.sherdog.com/fighter/Fedor-Emelianenko-1500")
sc.maxcount = 1

sc.crawl { |document|
doc = Nokogiri::HTML(document.data)
names = doc.css('td:nth-child(4) .sub-line').map(&:content).uniq.reject { |c| c == 'N/A' }
puts names
}

Unfortunately, the code still doesn’t work – it returns a blank.

If I write doc = Nokogiri::HTML(open(document.data)) instead of doc = Nokogiri::HTML(document.data), it gives me the whole page, but the parsing still doesn’t work.


1 Answer

Editorial Team · Answer 1 · 2026-06-12T17:37:50+00:00

You can use array difference (Array#-) to compare them:

Get the referees from the current page:

current_referees = doc.search('td[4] .sub_line').map(&:inner_text).uniq - ['N/A']
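The line above only removes 'N/A', so date strings would still get through. A minimal sketch of the digit filter the question asks for is shown here, working on plain strings so it does not depend on the parser. The helper name referee_names and the sample date format are assumptions made for illustration. Two details from the question are worth noting: its pattern is built from a double-quoted string, so Ruby drops the backslash from \d before the regex is compiled, and the selector in the edited version uses .sub-line while the markup shown uses sub_line.

```ruby
# Hypothetical helper: keep only strings that look like referee names.
# It drops blanks, the 'N/A' placeholder, and anything containing a digit.
def referee_names(texts)
  texts
    .map(&:strip)
    .reject { |t| t.empty? || t == 'N/A' || t.match?(/\d/) }
    .uniq
end

# In a double-quoted string "\d" is just "d", so the pattern from the
# question never tests for digits. A regex literal such as /\d/ avoids this.
broken = Regexp.new(".*/\d\.*$")
puts broken.source

# Sample values standing in for the text of the sub_line spans
# (the date format here is assumed, not taken from the site).
sample = ['Dan Miragliotta', 'N/A', 'Apr / 21 / 2012', 'Herb Dean', 'Dan Miragliotta']
puts referee_names(sample)
```

With a helper like this, the scraped span texts could be passed through referee_names before the comparison with the saved list.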

Read the old referees from the file:

old_referees = File.read('old_referees.txt').split("\n")
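File.read raises an error if the file does not exist yet, which is the situation on the very first run. One possible way to handle that is sketched below; load_names is a hypothetical helper, and the temporary directory is used only so the example can run anywhere.

```ruby
require 'tmpdir'

# Hypothetical helper: return the saved names, or an empty list
# when the file has not been created yet.
def load_names(path)
  return [] unless File.exist?(path)

  File.readlines(path, chomp: true).reject(&:empty?)
end

# Demonstrate both cases inside a throwaway directory.
first_run, second_run = Dir.mktmpdir do |dir|
  path = File.join(dir, 'old_referees.txt')
  before = load_names(path)                       # file is missing here
  File.write(path, "Herb Dean\nDan Miragliotta\n")
  after = load_names(path)                        # file now has two names
  [before, after]
end

p first_run
p second_run
```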

Use Array#- to compare them:

new_referees = current_referees - old_referees

Write the new file:

File.open('new_referees.txt','w'){|f| f << new_referees * "\n"}
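The steps above read from one file and write to another, so the list of known names does not grow between runs. If the goal is a single file that accumulates unique names across runs, a sketch along the following lines might work. The helper name record_new_names, the file name referees.txt, and the sample names are all assumptions for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: append names not seen before to the file
# and return just those new names.
def record_new_names(path, current)
  known = File.exist?(path) ? File.readlines(path, chomp: true) : []
  fresh = current.uniq - known
  # Append mode keeps the names saved by earlier runs.
  File.open(path, 'a') { |f| fresh.each { |name| f.puts(name) } } unless fresh.empty?
  fresh
end

# Simulate two runs of the crawler against the same file.
run1, run2, saved = Dir.mktmpdir do |dir|
  path = File.join(dir, 'referees.txt')
  a = record_new_names(path, ['Herb Dean', 'Dan Miragliotta'])
  b = record_new_names(path, ['Herb Dean', 'John McCarthy'])
  [a, b, File.readlines(path, chomp: true)]
end

puts "first run added:  #{run1.inspect}"
puts "second run added: #{run2.inspect}"
puts "file contents:    #{saved.inspect}"
```

Appending instead of rewriting means a crash in the middle of a crawl does not lose the names already saved.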
