I’m trying to write a script in Ruby to parse a Wikipedia article using

Asked: June 4, 2026 (2026-06-04T02:42:23+00:00)

I’m trying to write a script in Ruby to parse a Wikipedia article using Nokogiri and CSS selectors. I’m a little confused about conditionals within the script, though. Here’s what I have so far (page is the downloaded HTML, parsed with Nokogiri):

page.css('h3').each do |node|
  puts node.text
end

page.css('li').each do |node|   
  if /\d|\D/.match(node)
    puts node.text.scan(/[\d]+\D*/).first
  end
end

page.css('td b').each do |node|
  puts node.text
end

This all works fine. However, what I really want is something like this:

page.css('h3, li, td b').each do |node|
  # if it's an h3 node, do one thing
  # if it's a li node, do another thing
  # else if it's a 'td b' node, do another thing
end

This would allow the page to be parsed in order, instead of going through the body three separate times. However, I’m not sure how to write those conditionals within my script.

EDIT:
So now my script is

page.css('h3, li, td b').each do |node|
  case node.name
  when 'h3', 'b'
    puts node.text
  when 'li'
    if /\d|\D/.match(node)
      puts node.text.scan(/[\d]+\D*/).first
    end
  else
    next
  end
end

However, it hasn’t changed the behavior. It processes them in the same order it did before (all the ‘h3’ elements, then all the ‘li’ elements, then all the ‘b’ elements).

EDIT 2:

Okay, I finally got it to work. Here was my final set of conditionals:

page.traverse do |node|
  case
  when 'h3' == node.name
    puts node.text
  when 'li' == node.name
    puts node.text.scan(/[\d]+\D*/).first if /\d|\D/.match(node)
  when 'b' == node.name
    puts node.text if (node.parent.name == 'p' or node.parent.name == 'td')
  end
end

Thanks!
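
A note on the li branch in the snippets above: the guard /\d|\D/ matches any single character, digit or not, so it is true for every non-empty string and does not filter out items without digits. For those items scan(...).first returns nil, and puts nil prints a blank line in current Ruby. The following is a minimal pure-Ruby sketch of one way to tighten this, not the asker's code; the helper name leading_number_chunk and the sample strings are made up for illustration.

```ruby
# Same match as /[\d]+\D*/ from the question: a run of digits
# followed by any non-digits, stopping before the next digit.
PATTERN = /\d+\D*/

# Hypothetical helper: returns the first matching chunk, or nil.
# String#[] with a regexp gives the same result as scan(...).first.
def leading_number_chunk(text)
  text[PATTERN]
end

# Made-up sample strings standing in for node.text values.
samples = ["1969 Moon landing", "see also: 42nd Street", "no digits here"]

samples.each do |s|
  chunk = leading_number_chunk(s)
  # Skip items without digits instead of printing a blank line.
  puts chunk unless chunk.nil?
end
```

Checking the result for nil replaces the /\d|\D/ guard, so only items that actually contain a number produce output.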

1 Answer

Editorial Team · Answer 1 · 2026-06-04T02:42:24+00:00

You might be looking for traverse:

page.traverse do |node|
  case
    when ['h3', 'li'].include?(node.name) then puts node.text
    when 'b' == node.name && 'td' == node.parent.name then puts node.text[/\d+\D*/]
  end
end
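
If the per-node rules grow, one option is to move them into a small method that only needs name, text and parent, so it can be tried without downloading a page. The sketch below uses the rules from the asker's final version (EDIT 2), not the ones in this answer. FakeNode and line_for are hypothetical names introduced here; FakeNode is a plain Struct standing in for a Nokogiri node, on the assumption that the real nodes respond to the same three methods as used in the question.

```ruby
# Stand-in for a parsed node (hypothetical, for illustration only).
FakeNode = Struct.new(:name, :text, :parent)

# Returns the line to print for a node, or nil when the node is skipped.
def line_for(node)
  case node.name
  when 'h3'
    node.text
  when 'li'
    node.text[/\d+\D*/]          # nil when the item has no digits
  when 'b'
    parent_name = node.parent && node.parent.name
    node.text if %w[p td].include?(parent_name)
  end
end

td  = FakeNode.new('td', '', nil)
div = FakeNode.new('div', '', nil)

# Made-up nodes, listed in the order a traversal might yield them.
nodes = [
  FakeNode.new('h3', 'Events', nil),
  FakeNode.new('li', '1969 Moon landing', nil),
  FakeNode.new('b', 'Total', td),
  FakeNode.new('b', 'ignored', div)
]

lines = nodes.map { |n| line_for(n) }.compact
puts lines

# With Nokogiri, the same method could be driven by:
#   page.traverse { |node| (line = line_for(node)) && puts(line) }
```

Because a case expression without a matching branch evaluates to nil, nodes with any other name fall through and are dropped by compact.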
