The Archive Base

Editorial Team
Asked: May 15, 2026

I was wondering: when I use urllib2.urlopen(), does it just do a header read, or does it actually bring back the entire webpage?

I.e., does the HTML page actually get fetched on the urlopen() call or on the read() call?

import urllib2

handle = urllib2.urlopen(url)  # is the page body fetched here...
html = handle.read()           # ...or here?

The reason I ask is this workflow:

  • I have a list of URLs (some of them from URL-shortening services).
  • I only want to read the webpage if I haven’t seen that URL before.
  • I need to call urlopen() and use geturl() to get the final page that the link goes to (after the 302 redirects), so I know whether I’ve crawled it yet.
  • I don’t want to incur the overhead of grabbing the HTML if I’ve already parsed that page.
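A minimal Python 3 sketch of this workflow, assuming urllib.request.urlopen (in Python 3, urllib2's functionality lives in urllib.request). The helper name crawl_unseen is hypothetical, and its opener parameter exists only so the logic can be exercised without a network; by default it is urllib.request.urlopen. Each URL is resolved with geturl(), and read() is called only when the final URL is new.

```python
import urllib.request


def crawl_unseen(urls, seen, opener=urllib.request.urlopen):
    """Fetch the body of each URL whose final (post-redirect) URL is new.

    Hypothetical helper: `seen` is a set of final URLs already crawled and
    is updated in place. Returns a dict mapping each new final URL to its body.
    """
    pages = {}
    for url in urls:
        # urlopen() follows 301/302 redirects and returns once the response
        # headers of the final hop have been read.
        with opener(url) as response:
            final_url = response.geturl()  # URL after all redirects
            if final_url in seen:
                # Already crawled: leave the body unread; the `with` block
                # closes the response.
                continue
            seen.add(final_url)
            pages[final_url] = response.read()  # bulk of the body fetched here
    return pages
```

For a URL whose final target was already crawled, roughly only the headers (and whatever body bytes arrived with them) are transferred before the response is closed, although the server may already have pushed more data onto the connection.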

thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 5:23 am

    I just ran a test with Wireshark. When I called urllib2.urlopen('url-for-a-700mbyte-file'), only the headers and a few packets of the body were retrieved immediately. It wasn’t until I called read() that the majority of the body came across the network. This matches what I see in the source code of the httplib module.

    So, to answer the original question, urlopen() does not fetch the whole body over the network. It fetches the headers and usually some of the body. The rest of the body is fetched when you call read().
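    Because most of the body is only pulled off the connection as read() is called, a caller can also read in chunks and stop early. A minimal sketch, assuming a Python 3 response object whose read(n) returns at most n bytes and b"" at the end of the body, as urllib.request responses do; read_capped and its parameters are hypothetical names.

```python
def read_capped(response, max_bytes, chunk_size=64 * 1024):
    """Read at most max_bytes of the body, at most chunk_size bytes per read().

    Hypothetical helper: nothing beyond max_bytes is ever requested from
    the response object.
    """
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = response.read(min(chunk_size, remaining))
        if not chunk:  # b"" means the body is exhausted
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

    Called inside a `with urllib.request.urlopen(url) as response:` block, read_capped(response, 16 * 1024) would stop asking for body data after 16 KB, though TCP buffering may already have delivered somewhat more.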

    The partial body fetch is to be expected, because:

    1. Unless you read an HTTP response one byte at a time, there is no way to know exactly how long the incoming headers will be, and therefore no way to know how many bytes to read before the body starts.

    2. An HTTP client has no control over how many bytes a server bundles into each TCP segment of a response.

    In practice, since some of the body is usually fetched along with the headers, you might find that small bodies (e.g. small HTML pages) are fetched entirely on the urlopen() call.
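    The first reason, and the small-page effect, can be reproduced without a network. In Python 3, http.client (the new name of httplib) wraps the socket with sock.makefile("rb") and parses the status line and headers with readline() on that buffered file, so each buffer refill can pull body bytes along with the headers. This sketch swaps the socket for a hypothetical in-memory CountingRaw stream and counts how many bytes the header parse consumed; the 8192-byte buffer size is an assumption chosen for illustration.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a socket that counts the bytes handed out."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self.pulled = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Hand out as many bytes as the caller's buffer can hold.
        n = min(len(buffer), len(self._data) - self.pulled)
        buffer[:n] = self._data[self.pulled:self.pulled + n]
        self.pulled += n
        return n


def parse_headers_only(raw, buffer_size=8192):
    """Read the status line and headers up to the blank line, then stop."""
    reader = io.BufferedReader(raw, buffer_size)
    headers = []
    while True:
        line = reader.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line ends the header block
            break
        headers.append(line)
    return headers


def make_response(body):
    """Build a raw HTTP/1.1 response; returns (header bytes, full response)."""
    head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
    return head, head + body


_, small = make_response(b"<html>small page</html>")
big_head, big = make_response(b"x" * 700_000)

small_raw, big_raw = CountingRaw(small), CountingRaw(big)
parse_headers_only(small_raw)
parse_headers_only(big_raw)
print(small_raw.pulled == len(small))             # True: whole small page pulled
print(len(big_head) < big_raw.pulled < len(big))  # True: headers plus part of body
```

    On a real socket each readinto() returns only what has already arrived, so how much body comes along with the headers also depends on how the server's data was split into TCP segments (the second reason).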

Related Questions

I was wondering how to use cin so that if the user does not
I am using BeautifulSoup and urllib2 for downloading HTML pages and parsing them. Problem
I'm considering to use url pattern like below: example.com/item/r6B0PmUmx07O/just-one-item example.com/item/r6B0PGgwPJWl/yet-another-item the part before slug
Im considering use CSLA.NET 3.8 for example for Security and Identity Management on a
I'm currently considering the use of Reflection classes (ReflectionClass and ReflectionMethod mainly) in my
I am considering the use of a tab control on a parent form for
I have quite a problem concerning the use of relational database concepts in Delphi
Studying compilers course, I am left wondering why use registers at all. It is
In considering languages to use in creating a web-application that interfaces with a database
I'm wondering how to use a VideoDisplay object (defined in MXML) to display video

© 2021 The Archive Base. All Rights Reserved