Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8911847
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 15, 20262026-06-15T04:02:51+00:00 2026-06-15T04:02:51+00:00

I have an iterator which is supposed to run for several days. I want

  • 0

I have an iterator which is supposed to run for several days. I want errors to be caught and reported, and then I want the iterator to continue. Or the whole process can start over.

Here’s the function:

def get_units(self, scraper):
    units = scraper.get_units()
    i = 0
    while True:
        try:
            unit = units.next()
        except StopIteration:
            if i == 0:
                log.error("Scraper returned 0 units", {'scraper': scraper})
            break
        except:
            traceback.print_exc()
            log.warning("Exception occurred in get_units", extra={'scraper': scraper, 'iteration': i})
        else:
            yield unit
        i += 1

Because scraper could be one of many variants of code, it can’t be trusted and I don’t want to handle the errors there.

But when an error occurs in units.next(), the whole thing stops. I suspect because an iterator throws a StopIteration when one of it’s iterations fails.

Here’s the output (only the last lines)

[2012-11-29 14:11:12 /home/amcat/amcat/scraping/scraper.py:135 DEBUG] Scraping unit <Element div at 0x4258c710>
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article
[2012-11-29 14:11:13 /home/amcat/amcat/scraping/scraper.py:138 DEBUG] .. yields article Counter-Strike: Global Offensive Update Released
Traceback (most recent call last):
  File "/home/amcat/amcat/scraping/controller.py", line 101, in get_units
    unit = units.next()
  File "/home/amcat/amcat/scraping/scraper.py", line 114, in get_units
    for unit in self._get_units():
  File "/home/amcat/scraping/games/steamcommunity.py", line 90, in _get_units
    app_doc = self.getdoc(url,urlencode(form))
  File "/home/amcat/amcat/scraping/scraper.py", line 231, in getdoc
    return self.opener.getdoc(url, encoding)
  File "/home/amcat/amcat/scraping/htmltools.py", line 54, in getdoc
    response = self.opener.open(url, encoding)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
[2012-11-29 14:11:14 /home/amcat/amcat/scraping/controller.py:110 WARNING] Exception occurred in get_units

...code ends...

So how can I prevent the iterating to stop when an error occurs?

EDIT: here’s the code within get_units()

def get_units(self):
    """                                                                                                                                                                                                                                  
    Split the scraping job into a number of 'units' that can be processed independently                                                                                                                                                  
    of each other.                                                                                                                                                                                                                       

    @return: a sequence of arbitrary objects to be passed to scrape_unit                                                                                                                                                                 
    """
    self._initialize()
    for unit in self._get_units():
        yield unit

And here’s a simplified _get_units():

INDEX_URL = "http://www.steamcommunity.com"

def _get_units(self):
  doc = self.getdoc(INDEX_URL)  #returns a lxml.etree document

  for a in doc.cssselect("div.discussion a"):
    link = a.get('href')
    yield link

EDIT: question followup: Alter each for-loop in a function to have error handling executed automatically after each failed iteration

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-15T04:02:52+00:00Added an answer on June 15, 2026 at 4:02 am

    StopIteration is raised by the next() method of a generator when there is no next item anymore. It has nothing to do with errors inside the generator/iterator.

    Another thing to note is that, depending on the type of your iterator, it might not be able to resume after an exception. If the iterator is an object with a next method, it will work. However, if it’s actually a generator, it won’t.

    As far as I can tell, this is the only reason why your iteration doesn’t continue after an error from units.next(). I.e. units.next() fails, and the next time you call it, it’s not able to resume and it says it’s done by throwing a StopIteration exception.

    Basically you’d have to show us the code inside scraper.get_units() for us to understand why the loop is not able to continue after an error inside a single iteration. If get_units() is implemented as a generator function, it’s clear. If not, it might be something else that’s preventing it from resuming.

    UPDATE: explaining what a generator function is:

    class Scraper(object):
        def get_units(self):
            for i in some_stuff:
                bla = do_some_processing()
                bla *= 2  # random stuff
                yield bla
    

    Now, when you call Scraper().get_units(), instead of running the entire function, it returns a generator object. Calling next() on it, will take the execution to the first yield. Etc. Now if an error occurs ANYWHERE inside get_units, it will be tainted, so to say, and the next time you call next(), it will raise StopIteration, just as if it had run out of items to give you.

    Reading of http://www.dabeaz.com/generators/ (and http://www.dabeaz.com/coroutines/) strongly recommended.

    UPDATE2: A possible solution https://gist.github.com/4175802

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Suppose I have a function which can either take an iterable/iterator or a non-iterable
I have a large set of data which I access via a generator/iterator. While
I have a custom container class for which I'd like to write the iterator
Is there a simple or standard way to have a multimap iterator which iterate
I have an iterator which contains the following functions: ... T &operator*() { return
I have a Abstract Iterator class which has this function void iterate(){ while(this.hasnext()){ ..this.next()..
I need an input file stream which would have a bidirectional iterator/adapter. Unfortunately std::ifstream
I have a iterator which goes through a list and create their name and
I have an implementation of java.util.Iterator which requires that the call to next() should
I have an iterator which iterates over a std::shared_ptr. So operator++ points the internally

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.