The Archive Base


Editorial Team
Asked: May 28, 2026 (2026-05-28T06:47:53+00:00)


I have the following code, thanks to another SO question/answer:

page = agent.page.search("table tbody tr").each do |row|
  time        = row.css("td:nth-child(1)").text.strip
  source      = row.css("td:nth-child(2)").text.strip
  destination = row.css("td:nth-child(3)").text.strip
  duration    = row.css("td:nth-child(4)").text.strip
  Call.create!(:time => time, :source => source, :destination => destination, :duration => duration)
end

It works well: when I run the rake task, it puts the data into the correct table in my Rails application. However, after successfully creating a record for a row, it also creates a blank record.

I can’t figure out why. As far as I can tell, the code issues the create! command once for each row.

You can see the full rake task at https://gist.github.com/1574942, and the other question leading to this code is “Parse html into Rails without new record every time?”.


1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 6:47 am

    Based on the comment:

    I think you could be right. I have looked at the HTML of the remote webpage, and they are adding a wrapping <tr> around every table row, which is assigned a class. I wonder if there is any way of getting the script to skip empty rows?

    If you’re seeing an HTML structure like:

    <table>
      <tbody>
        <tr>
          <tr>
            <td>time</td>
            <td>source</td>
            <td>destination</td>
            <td>duration</td>
          </tr>
        </tr>
      </tbody>
    </table>
    

    Then this will show the problem:

    require 'nokogiri'
    require 'pp'
    
    html = '<table><tbody><tr><tr><td>time</td><td>source</td><td>destination</td><td>duration</td></tr></tr></tbody></table>'
    doc = Nokogiri::HTML(html)
    page = doc.search("table tbody tr").each do |row|
      time        = row.css("td:nth-child(1)").text.strip
      source      = row.css("td:nth-child(2)").text.strip
      destination = row.css("td:nth-child(3)").text.strip
      duration    = row.css("td:nth-child(4)").text.strip
      hash = {
        :time        => time,
        :source      => source,
        :destination => destination,
        :duration    => duration 
      }
      pp hash
    end
    

    That outputs:

    {:time=>"", :source=>"", :destination=>"", :duration=>""}
    {:time=>"time",
     :source=>"source",
     :destination=>"destination",
     :duration=>"duration"}
    

    You are getting the blank rows because the HTML is malformed: the outer <tr> shouldn’t be there. The fix is easy and also works with well-formed HTML.

    The css calls inside the loop are also not quite right, for a subtle reason that I’ll get to.

    To fix the first, we’ll add a conditional test:

    page = doc.search("table tbody tr").each do |row|
    

    becomes:

    page = doc.search("table tbody tr").each do |row|
      next if (!row.at('td'))  # skip rows that contain no <td> cells
    

    After running, the output is now:

    {:time=>"time",
     :source=>"source",
     :destination=>"destination",
     :duration=>"duration"}
    

    That’s really all you need to fix the problem, but the code also does a few things the hard way. That takes some ‘splainin’, so first here’s the code change:

    From:

    time        = row.css("td:nth-child(1)").text.strip
    source      = row.css("td:nth-child(2)").text.strip
    destination = row.css("td:nth-child(3)").text.strip
    duration    = row.css("td:nth-child(4)").text.strip
    

    Change to:

    time, source, destination, duration = row.search('td').map{ |td| td.text.strip }
    

    Running that code outputs what you want:

    {:time=>"time",
     :source=>"source",
     :destination=>"destination",
     :duration=>"duration"}
    

    so things are hunky-dory still.

    Here’s the problem with your original code:

    css behaves like search here (search accepts both CSS and XPath expressions), and Nokogiri returns a NodeSet from both. Calling text on an empty NodeSet returns an empty string, which is what each of the row.css("td:nth-child(...)").text.strip calls got when it looked at the outer <tr>. So Nokogiri silently did something other than what you wanted: the code was syntactically correct, and logically correct given what you told it to do; it just failed to meet your expectations.

    Using at, or one of its relatives such as at_css, returns only the first matching node, or nil when nothing matches. So, theoretically, we could continue to use row.at("td:nth-child(1)").text.strip, with one assignment per cell, and that would have immediately revealed the problem with the HTML, because calling text on nil would have blown up with a NoMethodError. But that’s not zen-like enough.

    Instead, we can iterate over the cells returned in the NodeSet using map and let it gather the needed cell contents and strip them, then do a parallel assignment to the variables:

    time, source, destination, duration = row.search('td').map{ |td| td.text.strip }
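One property of parallel assignment is worth keeping in mind: if a row has fewer cells than there are variables, the leftover variables are set to nil. The sketch below uses plain strings as hypothetical stand-ins for the cell text, so it runs without Nokogiri; unpack_call is a made-up helper name:

```ruby
# cells is an array of raw strings standing in for the text of each <td>.
def unpack_call(cells)
  # Strip every cell, then spread the results over the four variables.
  time, source, destination, duration = cells.map { |text| text.strip }
  { :time => time, :source => source, :destination => destination, :duration => duration }
end

# A full row fills all four keys.
p unpack_call(["  09:15 ", " 5551234 ", " 5559876 ", " 00:42 "])

# A short row leaves the remaining keys as nil rather than raising.
p unpack_call(["  09:15 ", " 5551234 "])
```

If short rows are possible on the remote page, checking for nil before calling create! may be worthwhile.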
    

    Again, running this:

    require 'nokogiri'
    require 'pp'
    
    html = '<table><tbody><tr><tr><td>time</td><td>source</td><td>destination</td><td>duration</td></tr></tr></tbody></table>'
    doc = Nokogiri::HTML(html)
    page = doc.search("table tbody tr").each do |row|
      next if (!row.at('td'))
    
      time, source, destination, duration = row.search('td').map{ |td| td.text.strip }
    
      hash = {
        :time        => time,
        :source      => source,
        :destination => destination,
        :duration    => duration 
      }
      pp hash
    end
    

    Gives me:

    {:time=>"time",
     :source=>"source",
     :destination=>"destination",
     :duration=>"duration"}
    

    Retrofit that into your code and you get:

    page = agent.page.search("table tbody tr").each do |row|
      next if (!row.at('td'))
      time, source, destination, duration = row.search('td').map{ |td| td.text.strip }
      Call.create!(:time => time, :source => source, :destination => destination, :duration => duration)
    end
    

    And you probably don’t need the page =.
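The reason is that each returns the collection it was called on, not anything built inside the block. A small sketch, with a plain Array standing in for the NodeSet (an assumption made so the example runs without Nokogiri; each_result is a made-up helper name):

```ruby
# Returns whatever each returns for the given collection.
def each_result(collection)
  collection.each { |item| item.to_s }
end

rows = ["row 1", "row 2"]
result = each_result(rows)

# result is the very same object as rows, so assigning it adds nothing.
puts result.equal?(rows)
```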

