The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 12, 2026

Because I hate clicking forth and back reading through Wikipedia articles I am trying

Because I hate clicking back and forth while reading Wikipedia articles, I am trying to build a tool that creates “expanded Wikipedia articles” according to the following algorithm:

  • Create two variables: Depth and Length.
  • Set a Wikipedia article as the seed page.
  • Parse this article: whenever there is a link to another article, fetch that article's first Length sentences and include them in the original article (e.g. in brackets or otherwise highlighted).
  • Do this recursively up to a certain Depth, e.g. no deeper than two levels.

The result would be an article that could be read in one go without always clicking to and fro…

How would you build such a mechanism in Python? Which libraries should be used (are there any for such tasks)? Are there any helpful tutorials?

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 12:35 am

    You can use urllib2 (urllib.request in Python 3) to request the URL. For parsing the HTML page there is a wonderful library called BeautifulSoup. One thing to consider: when scanning Wikipedia with your crawler, you need to send a User-agent header along with your request, or else Wikipedia is likely to refuse the request.

     import urllib2  # Python 2; in Python 3 this module is urllib.request
     request = urllib2.Request(page)
    

    Then add the header:

     request.add_header('User-agent', 'Mozilla/5.0 (Linux i686)')
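For Python 3, where `urllib2` became `urllib.request`, the same step might look like the sketch below. The `USER_AGENT` string and the helper names `build_request` and `fetch_html` are assumptions made for illustration, not part of the original answer.

```python
import urllib.request

# Hypothetical identifier; describe your own tool and contact address here.
USER_AGENT = "ExpandedArticleBot/0.1 (contact: you@example.org)"


def build_request(url):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Download a page and decode it as UTF-8 text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_html("https://en.wikipedia.org/wiki/Web_crawler")` should return the page's HTML as a string, which can then be handed to the parser.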
    

    Then load the page and pass it to BeautifulSoup:

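     from bs4 import BeautifulSoup
     response = urllib2.urlopen(request)  # actually download the page
     soup = BeautifulSoup(response, 'html.parser')
     text = soup.get_text()  # plain text with all tags stripped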
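Once the plain text is available, the question's "first Length sentences" step could be approximated as follows. This is a naive sketch: the helper name `first_sentences` is hypothetical, and splitting on `.`, `!` or `?` followed by whitespace will misjudge abbreviations such as "e.g.".

```python
import re


def first_sentences(text, length):
    """Return roughly the first `length` sentences of `text`.

    Sentences are split wherever '.', '!' or '?' is followed by whitespace,
    which is a simplification rather than real sentence detection.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:length])
```

For example, `first_sentences(text, 2)` keeps only the opening two sentences of an article's text. A dedicated sentence tokenizer would be more robust if the simple rule proves too crude.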
    

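    This will give you the links in a page. Wikipedia's links to other articles are relative and start with /wiki/, so the pattern matches that prefix rather than http://:

     import re
     # Internal article links look like /wiki/Some_Title
     for url in soup.find_all('a', attrs={'href': re.compile("^/wiki/")}):
         link = url['href']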
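The same link filtering can be tried without installing anything. The sketch below swaps BeautifulSoup for the standard library's `html.parser` purely so that it runs on a bare Python 3 install; the names `WikiLinkParser` and `wiki_links` are hypothetical, and skipping hrefs that contain a colon (namespaced pages such as `File:` or `Help:`) is an assumption about which links count as articles.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that point at internal /wiki/ pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: a colon marks a namespaced page (File:, Help:, ...).
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def wiki_links(html):
    """Return internal article links in document order."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links
```

Each returned path can be turned into a full URL by prefixing the site's host before requesting it.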
    

    As for the algorithm for crawling Wikipedia, what you want is called depth-limited search: a depth-first traversal that stops following links once a fixed depth is reached. Pseudocode for it is widely available and easy to follow.
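A depth-limited expansion along these lines might be sketched as below. To keep it runnable offline, articles are plain strings in which links are written as `[[Title]]` (borrowing wikitext's link syntax), and `fetch` is any callable that maps a title to such a string. Both conventions, along with the names `expand` and `first_sentences`, are assumptions for illustration; a real tool would fetch and parse HTML instead.

```python
import re

LINK = re.compile(r"\[\[(.+?)\]\]")


def first_sentences(text, length):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:length])


def expand(text, fetch, depth, length):
    """Inline a bracketed summary after each link, at most `depth` levels deep."""

    def inline(match):
        title = match.group(1)
        if depth == 0:
            # Depth limit reached: keep the link text, do not follow it.
            return title
        summary = first_sentences(fetch(title), length)
        # Links inside the summary are expanded with one level less.
        return f"{title} ({expand(summary, fetch, depth - 1, length)})"

    return LINK.sub(inline, text)


# Tiny in-memory stand-in for Wikipedia.
pages = {
    "A": "A mentions [[B]]. End.",
    "B": "B relies on [[C]]. Extra.",
    "C": "C is basic. More.",
}
expanded = expand(pages["A"], pages.__getitem__, depth=2, length=1)
```

Because the depth counter decreases on every recursive call, articles that link to each other in a cycle cannot cause infinite recursion.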

    The other functionality of these libraries is well documented and easy to look up. Good luck.

© 2021 The Archive Base. All Rights Reserved