The Archive Base

Editorial Team
Asked: May 26, 2026

So this is what I have:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

root_url = "http://boxerbiography.blogspot.com/2006/11/table-of-contents.html"
file_path = "boxer-noko.html"

# URI.open comes from open-uri; on Ruby 3.0+ a bare open() no longer accepts URLs.
site = Nokogiri::HTML(URI.open(root_url))

titles = []
content = []

site.css(".entry a").each do |link|
    titles.push(link)

    content_url = link[:href]
    content_page = Nokogiri::HTML(URI.open(content_url))

    content_page.css("#top p").each do |copy|
        content.push(copy)
    end
end

But what this does is behave like n × n nested loops. That is, if there are 5 links on the main page, it goes to the first one, then in content it stores the copy for all 5 links (with the current one at the top), then it goes back out, moves to the next link, and does the same thing again.

So each piece of content is actually returning the content for every single link, which looks like this:

Link 1

Copy associated with Link 1.
Copy associated with Link 2.
Copy associated with Link 3.
.
.
.

Link 2

Copy associated with Link 2.
Copy associated with Link 3.
Copy associated with Link 4.
Copy associated with Link 5.
Copy associated with Link 1.
.
.
.

etc.

What I would like it to do is return this:

Link 1

Copy associated with Link 1.

Link 2

Copy associated with Link 2.

In as efficient a way as possible.

How do I do that?

Edit1: I guess an easy way to think about this is that in a single array, say titles, I would like to store both the link and the content associated with that link. But I'm not quite sure how to do that, given that I have to open two URI connections to parse both pages and keep going back to the root.

So I imagined it like:

titles[0] = { :href => "http://somelink.com", :content => "Copy associated with some link" }

But I can't quite get it there, so I am forced to use two arrays, which seems suboptimal to me.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 7:57 pm

    The following creates a hash keyed by URL; each URL's value is the collection of Nokogiri paragraph elements found on that page.

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    
    root_url = "http://boxerbiography.blogspot.com/2006/11/table-of-contents.html"
    
    # URI.open comes from open-uri; on Ruby 3.0+ a bare open() no longer accepts URLs.
    site = Nokogiri::HTML(URI.open(root_url))
    
    # Maps each linked URL to the paragraph nodes found on that page.
    contents = {}
    site.css(".entry a").each do |link|
        content_url = link[:href]
        puts "Fetching #{content_url}..."
        content_page = Nokogiri::HTML(URI.open(content_url))
        contents[content_url] = content_page.css("#top p")
    end
    

    As a sanity check, you can inspect the value stored under one of the keys like this:

    contents[contents.keys.first]
    

    This may or may not be what you actually want, since it'll keep all the inner tags in place (<br/>s, <i>...</i>s, etc.), but that can be tweaked pretty easily by changing how the contents are gathered, or handled by post-processing each URL's contents.

    If you want to keep more information about each URL (like the link’s text) then you’d probably want to create a tiny wrapper class with url and title attributes.
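
A minimal sketch of such a wrapper, using a Struct instead of a hand-written class. The name Page, the paragraphs field, and the example.com URLs are all assumptions made for illustration; they do not come from the code above.

```ruby
# Hypothetical wrapper: Page and its fields are illustrative names.
Page = Struct.new(:url, :title, :paragraphs)

# Stand-in data; in the scraper these values would come from each link
# (link[:href], link.text) and from the page fetched for that link.
pages = [
  Page.new("http://example.com/chapter-1", "Link 1", ["Copy associated with Link 1."]),
  Page.new("http://example.com/chapter-2", "Link 2", ["Copy associated with Link 2."])
]

# Each title is printed followed only by its own copy.
pages.each do |page|
  puts page.title
  puts page.paragraphs.join("\n")
  puts
end
```

Keeping the link and its content in one object avoids having to keep two parallel arrays in step.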

    As it stands, the code doesn't check that each URL is retrieved only once. It might be better to create a Set of URLs to force uniqueness, then build the map by iterating over that set's contents (URLs).
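
The Set idea might look roughly like this. The hrefs array stands in for the links collected from the table of contents, and fetch_paragraphs is a hypothetical placeholder for the open-and-parse step, so the sketch runs without network access.

```ruby
require 'set'

# Stand-in for the hrefs collected from site.css(".entry a"); one URL repeats.
hrefs = [
  "http://example.com/a",
  "http://example.com/b",
  "http://example.com/a"
]

# A Set silently drops the duplicate URL.
unique_urls = Set.new(hrefs)

# Hypothetical placeholder for fetching and parsing a page; it counts calls
# so it is easy to confirm each URL is processed only once.
fetch_count = Hash.new(0)
fetch_paragraphs = lambda do |url|
  fetch_count[url] += 1
  ["Copy for #{url}"]
end

contents = {}
unique_urls.each { |url| contents[url] = fetch_paragraphs.call(url) }

puts contents.keys
```

Calling uniq on the array of hrefs would give a similar result if a Set is not otherwise needed.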

© 2021 The Archive Base. All Rights Reserved