The Archive Base

Editorial Team
Asked: June 8, 2026

I am trying to extract data from Civic Commons Apps link for my project.

I am trying to extract data from the Civic Commons Apps page for my project. I can obtain the links on the page that I need, but when I try to open them I get "urlopen error [Errno -2] Name or service not known".

The web scraping Python code:

from bs4 import BeautifulSoup
from urlparse import urlparse, parse_qs
import re
import urllib2
import pdb

base_url = "http://civiccommons.org"
url = "http://civiccommons.org/apps"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

list_of_links = [] 

for link_tag in soup.findAll('a', href=re.compile('^/civic-function.*')):
   string_temp_link = base_url+link_tag.get('href')
   list_of_links.append(string_temp_link)

list_of_links = list(set(list_of_links)) 

list_of_next_pages = []
for categorized_apps_url in list_of_links:
   categorized_apps_page = urllib2.urlopen(categorized_apps_url)
   categorized_apps_soup = BeautifulSoup(categorized_apps_page.read())

   last_page_tag = categorized_apps_soup.find('a', title="Go to last page")
   if last_page_tag:
      last_page_url = base_url+last_page_tag.get('href')
      index_value = last_page_url.find("page=") + 5
      base_url_for_next_page = last_page_url[:index_value]
      for pageno in xrange(0, int(parse_qs(urlparse(last_page_url).query)['page'][0]) + 1):
         list_of_next_pages.append(base_url_for_next_page+str(pageno))
      
   else:
      list_of_next_pages.append(categorized_apps_url)

I get the following error:

urllib2.urlopen(categorized_apps_url)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

Is there anything specific I should take care of when calling urlopen? I don’t see a problem with the HTTP links that I get.

[edit]
On a second run I got the following error:

 File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)

The same code runs fine on my friend’s Mac, but fails on my Ubuntu 12.04 machine.

I also tried running the code on ScraperWiki and it finished successfully, but a few URLs were missing (compared to the Mac run). Is there any reason for this behavior?

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 8:20 am

    The code works on my Mac and on your friend’s Mac. It also runs fine from a virtual machine instance of Ubuntu 12.04 Server. Something in your particular environment, either your OS (Ubuntu Desktop?) or your network, is evidently causing it to fail. For example, my home router’s default setting throttles the number of calls to the same domain within x seconds, and it could cause this kind of issue if I didn’t turn it off. It could be a number of things.
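One way to narrow this down: "[Errno -2] Name or service not known" is the message the system resolver gives when a hostname lookup fails, so it can help to check name resolution separately from the HTTP request. The following is a minimal sketch, not part of the original code. It is written for Python 3, where `urlparse` lives in `urllib.parse`, and the helper name `can_resolve` and its `resolver` parameter are made up for illustration.

```python
import socket
from urllib.parse import urlparse


def can_resolve(url, resolver=socket.getaddrinfo):
    """Return True if the host part of `url` resolves to at least one address.

    `resolver` is a hypothetical hook (defaulting to socket.getaddrinfo)
    so the check can be exercised without touching the network.
    """
    host = urlparse(url).hostname
    if host is None:
        # No host component at all, e.g. a relative path.
        return False
    try:
        return len(resolver(host, 80)) > 0
    except socket.gaierror:
        # Raised when the lookup fails, e.g. "Name or service not known".
        return False
```

Calling `can_resolve(categorized_apps_url)` just before each `urlopen` call would show whether the failures line up with DNS lookups failing on that machine, as opposed to a problem with specific URLs.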

    At this stage I would suggest refactoring your code to catch the URLError and set aside problematic URLs for a retry. Also log or print the errors if they still fail after several retries. Maybe even add some code to time your calls between errors. That is better than having your script fail outright, and you’ll get feedback as to whether particular URLs are causing the problem or whether it is a timing issue (i.e. does it fail after x urlopen calls, or after x urlopen calls within a given number of seconds). If it’s a timing issue, a simple time.sleep(1) inserted into your loops might do the trick.
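The retry idea above could look roughly like the sketch below. It is written for Python 3, where `urllib2.urlopen` and `urllib2.URLError` became `urllib.request.urlopen` and `urllib.error.URLError`. The function names `fetch_with_retry` and `fetch_all`, the default of three attempts, and the `opener` and `sleep` parameters are assumptions made for illustration, not anything from the original script.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def fetch_with_retry(url, retries=3, delay=1.0, opener=urlopen, sleep=time.sleep):
    """Return the response body, or None if every attempt raised URLError.

    `opener` and `sleep` are hypothetical hooks so the retry logic can be
    tried out without real network calls or real waiting.
    """
    for attempt in range(1, retries + 1):
        try:
            return opener(url).read()
        except URLError as err:
            # Log each failure so a pattern (same URL vs. timing) becomes visible.
            print("attempt %d/%d failed for %s: %s" % (attempt, retries, url, err.reason))
            if attempt < retries:
                sleep(delay)  # pause before retrying, in case of throttling
    return None


def fetch_all(urls, **kwargs):
    """Fetch every URL, setting aside the ones that never succeeded."""
    pages, failed = {}, []
    for url in urls:
        body = fetch_with_retry(url, **kwargs)
        if body is None:
            failed.append(url)
        else:
            pages[url] = body
    return pages, failed
```

With something like `pages, failed = fetch_all(list_of_links)`, the script keeps going past a bad URL, and the `failed` list shows afterwards whether the same addresses fail every time or whether the failures move around, which would point to timing or throttling.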
