I am trying to write a parser (for a postal code's streets and houses) with eventmachine

Question

Asked: June 6, 2026

I am trying to write a parser (for a postal code's streets and houses) with eventmachine and em-synchrony. The website I want to parse has a nested structure: for each postal code there are many pages of streets, linked through pagination. So the algorithm is pretty simple:

for each postal code
- visit postal code index page
  - parse index page
  - parse pagination
  - for each pagination page parse this page
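
As a baseline, the steps above can be sketched sequentially in plain Ruby. The FAKE_SITE table, the fetch helper, and the "/12345" paths are hypothetical stand-ins for the real site and its HTTP layer, and nothing here is concurrent yet:

```ruby
# Hypothetical canned data standing in for the real website.
FAKE_SITE = {
  "/12345"        => { streets: ["Main St"], pages: [2, 3] },
  "/12345?page=2" => { streets: ["Oak Ave"], pages: [] },
  "/12345?page=3" => { streets: ["Elm Rd"],  pages: [] }
}.freeze

# Hypothetical fetch: returns the already "parsed" page for a path.
def fetch(path)
  FAKE_SITE.fetch(path)
end

# Visit the index page, read its pagination, then visit every further page.
def streets_for(postal_code)
  index = fetch("/#{postal_code}")
  streets = index[:streets].dup
  index[:pages].each do |page|
    streets.concat(fetch("/#{postal_code}?page=#{page}")[:streets])
  end
  streets
end

p streets_for("12345")
```

The question is how to keep this shape while letting eventmachine run the requests concurrently.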

Here is an example of such a parser (it works):

require "nokogiri"
require "em-synchrony"
require "em-synchrony/em-http"

def url page = nil
  url = "http://gistflow.com/all"
  url << "?page=#{page}" if page
  url
end

EM.synchrony do
  concurrency = 2

  # here [1] is array of index pages, for this template let it be just [1]
  results = EM::Synchrony::Iterator.new([1], concurrency).map do |index, iter|
    index_page = EM::HttpRequest.new(url).aget

    index_page.callback do
      # here we do some parsing and find out whether the index page
      # has pagination. The worst case is that it has pagination
      pages = [2,3,4,5]

      unless pages.empty?
        # here we need to parse all pages
        # with urls like url(page)
        # how can I do it more efficiently?
      end

      iter.return "SUCC #{index}"
    end

    index_page.errback do 
      iter.return "ERR #{index}"
    end
  end

  p results
  EM.stop
end

So the trick is inside this block:

unless pages.empty?
  # here we need to parse all pages
  # with urls like url(page)
  # how can I do it more efficiently?
end

How can I implement nested EM HTTP calls inside a synchrony iterator loop?

I tried different approaches, but each time I either got errors like “couldn’t yield from root fiber” or the errback block was called.

1 Answer

Editorial Team · Answer 1 · 2026-06-06T00:20:42+00:00

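One solution is to use FiberIterator and the synchronous .get instead of .aget. Each block then runs in its own fiber, so the nested requests can be written inline without callbacks: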

require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"

def url page = nil
  url = "http://gistflow.com/all"
  url << "?page=#{page}" if page
  url
end

EM.synchrony do
  concurrency = 2

  master_pages = [1,2,3,4]

  EM::Synchrony::FiberIterator.new(master_pages, concurrency).each do |iter|
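    result = EM::HttpRequest.new(url).get
    # .get returns the client object even on failure, so check its error
    if result.error.nil?
      puts "SUCC #{iter}"
      detail_pages = [1,2,3,4]
      EM::Synchrony::FiberIterator.new(detail_pages, concurrency).each do |iter2|
        # request the paginated URL for this detail page
        result2 = EM::HttpRequest.new(url(iter2)).get
        puts "SUCC/ERR #{iter} > #{iter2}"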
      end
    else
      puts "ERR #{iter}"
    end
  end

  EM.stop

end
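
As for the root fiber error mentioned in the question: as far as I can tell, it shows up when a synchronous call such as .get is made from inside a callback block, because callbacks are fired by the reactor loop rather than from a fiber that em-synchrony resumed. The plain Ruby sketch below (no EventMachine involved, and try_yield is a hypothetical helper) shows the underlying rule that Fiber.yield only works inside a resumed fiber:

```ruby
# Fiber.yield suspends the current fiber and hands control back to whoever
# resumed it. On the root fiber nobody did, so Ruby raises FiberError.
def try_yield
  Fiber.yield(:paused)
  :resumed
rescue FiberError
  :fiber_error
end

# Called directly, we are on the root fiber.
p try_yield            # the FiberError branch is taken

# Wrapped in a fiber, the same call suspends and can be resumed later.
fiber = Fiber.new { try_yield }
p fiber.resume         # value passed to Fiber.yield
p fiber.resume         # return value once the fiber finishes
```

This is why the FiberIterator version works: every block it runs is already inside its own fiber, so the synchronous .get has something to yield back to.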
