I’m parsing a Reddit RSS feed with Nokogiri for a certain subreddit. I’m trying


Editorial Team

Asked: June 13, 2026


I’m parsing a Reddit RSS feed with Nokogiri for a certain subreddit.

I’m trying to capture the external URL of the post if it goes to a certain domain.

Unfortunately, even when the user's post links to an external website, every item in the RSS feed links to the Reddit post itself (its comment page). There is one element, description, generated by the Reddit RSS feed, that DOES include an HTML string containing two links:

[link][2 comments]

It is always the second-to-last anchor in description.text.

With Nokogiri, I can get as far as pulling the entire description into a string, and then I instantiate a new Nokogiri::HTML object from that string.

I’m wondering two things:

Is there a way to convert a string to Nokogiri::HTML so I don't need to create a new object?
How do I save the href value of the second-to-last link in the description?

Code:

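require 'nokogiri'
require 'open-uri'

def scrape
  # URI.open comes from open-uri; plain Kernel#open no longer opens URLs in Ruby 3.0+
  @document = Nokogiri::XML(URI.open(self.url))
  @document.xpath("//item").each do |item|
    description_html = item.xpath('description').text
    url = Nokogiri::HTML(description_html) # ...? how do I get the second-to-last href here?
  end
end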


1 Answer

Editorial Team · Added an answer on June 13, 2026 at 5:47 pm

Figured it out

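require 'nokogiri'
require 'open-uri'

def scrape
  document = Nokogiri::XML(URI.open(self.url))
  document.xpath('//item').each do |item|
    # The description holds escaped HTML; .text returns it as a markup string
    description_html = item.xpath('description').text
    # Parse that string as HTML and take the href of the second-to-last <a>
    url = Nokogiri::HTML(description_html).xpath('//a')[-2]['href']
  end
end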
