Using urllib (or urllib2) and wanting what I want is hopeless. Any solution?


Editorial Team

Asked: May 21, 2026 (2026-05-21T12:40:51+00:00)


Using urllib (or urllib2) and wanting what I want is hopeless.
Any solution?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T12:40:52+00:00

I’m not sure how the C# implementation works, but, as internet streams are generally not seekable, my guess would be it downloads all the data to a local file or in-memory object and seeks within it from there. The Python equivalent of this would be to do as Abafei suggested and write the data to a file or StringIO and seek from there.
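The buffer-and-seek approach could look roughly like the sketch below. It assumes Python 3, where `urllib.request` takes the place of `urllib2` and `io.BytesIO` is the binary counterpart of `StringIO`. The helper names `buffer_stream` and `fetch_seekable` are made up for illustration.

```python
import io
import urllib.request


def buffer_stream(stream, chunk_size=8192):
    """Copy a non-seekable, file-like stream into a seekable in-memory buffer."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller starts at the beginning
    return buffer


def fetch_seekable(url):
    """Download the whole resource at `url` and return it as a seekable buffer."""
    with urllib.request.urlopen(url) as response:
        return buffer_stream(response)
```

Once the data is buffered, `buf.seek(1000)` followed by `buf.read(500)` returns 500 bytes starting at offset 1000. The cost is that the entire body is held in memory, whichever part you actually need.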

However, if, as your comment on Abafei’s answer suggests, you want to retrieve only a particular part of the file (rather than seeking backwards and forwards through the returned data), there is another possibility. urllib2 can be used to retrieve a certain section (or ‘range’ in HTTP parlance) of a webpage, provided that the server supports this behaviour.

The `range` header

When you send a request to a server, the parameters of the request are given in various headers. One of these is the Range header, defined in section 14.35 of RFC 2616 (the specification defining HTTP/1.1). This header lets you retrieve, for example, all data from byte offset 10,000 onwards, or only the data between byte offsets 1,000 and 1,500 (offsets are zero-based and the end position is inclusive).
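As a small illustration of the header syntax, the sketch below builds the value of a Range header from a start offset and an optional length. The function name `build_range_header` is hypothetical; only the `bytes=start-end` and `bytes=start-` forms come from the specification.

```python
def build_range_header(start, length=None):
    """Return the value of a Range header for a single byte range.

    Offsets are zero-based and the end position is inclusive.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if length is None:
        # Open-ended range: everything from `start` to the end of the data.
        return "bytes=%d-" % start
    if length <= 0:
        raise ValueError("length must be positive")
    return "bytes=%d-%d" % (start, start + length - 1)
```

For example, `build_range_header(10000)` gives `bytes=10000-`, and `build_range_header(1000, 501)` gives `bytes=1000-1500`.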

Server support

There is no requirement for a server to support range retrieval. Some servers return the Accept-Ranges header (section 14.5 of RFC 2616) with a response to report whether they support ranges. This could be checked using a HEAD request. However, there is no particular need to do this; if a server does not support ranges, it will return the entire page and we can then extract the desired portion of data in Python as before.
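If you do want to check up front, a HEAD request might be sketched as follows. This assumes Python 3's `urllib.request`, whose `Request` accepts a `method` argument; the function names `accepts_byte_ranges` and `probe_range_support` are invented for this example.

```python
import urllib.request


def accepts_byte_ranges(headers):
    """Interpret the Accept-Ranges header from a mapping of response headers.

    Returns True if the server advertises byte ranges, False if it
    advertises something else (such as "none"), and None if the header is
    absent, in which case the server may or may not honour a Range request.
    """
    # response.headers looks names up case-insensitively; a plain dict does not.
    value = headers.get("Accept-Ranges")
    if value is None:
        return None
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units


def probe_range_support(url):
    """Send a HEAD request to `url` and inspect the Accept-Ranges header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return accepts_byte_ranges(response.headers)
```

A result of `None` is not a refusal: a server can honour ranges without advertising them, so sending the Range header and inspecting the response remains the more reliable test.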

Checking if a range is returned

If a server returns a range, it must send the Content-Range header (section 14.16 of RFC 2616) along with the response, which carries the status 206 (Partial Content) rather than 200. If this header is present in the response, we know a range was returned; if it is not present, the entire page was returned.
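The value of Content-Range has the form `bytes start-end/total`, with `*` in place of the total when it is unknown. A rough parser for that format, under the assumption that only the `bytes` unit matters here, could look like this (the name `parse_content_range` is illustrative):

```python
def parse_content_range(value):
    """Parse a Content-Range value such as 'bytes 6000-6399/19387'.

    Returns (start, end, total), where total is None if the server reported
    it as '*'. Raises ValueError for anything else, including the
    'bytes */N' form that servers send for unsatisfiable ranges.
    """
    unit, _, spec = value.strip().partition(" ")
    if unit.lower() != "bytes":
        raise ValueError("unsupported range unit: %r" % unit)
    byte_range, _, total = spec.strip().partition("/")
    start, _, end = byte_range.partition("-")
    if not (start.isdigit() and end.isdigit()):
        raise ValueError("malformed byte range: %r" % byte_range)
    if total != "*" and not total.isdigit():
        raise ValueError("malformed total size: %r" % total)
    return int(start), int(end), (None if total == "*" else int(total))
```

Returning integers rather than strings makes it easy to confirm that the range the server sent is the one that was asked for.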

Implementation with urllib2

urllib2 allows us to add headers to a request, thus allowing us to ask the server for a range rather than the entire page. The following script takes a URL, a start position, and (optionally) a length on the command line, and tries to retrieve the given section of the page.

import sys
import urllib2

# Check command line arguments.
if len(sys.argv) < 3:
    sys.stderr.write("Usage: %s url start [length]\n" % sys.argv[0])
    sys.exit(1)

# Create a request for the given URL.
request = urllib2.Request(sys.argv[1])

# Add the header to specify the range to download.
if len(sys.argv) > 3:
    # Only the start and length arguments are used; any extras are ignored.
    start, length = map(int, sys.argv[2:4])
    request.add_header("range", "bytes=%d-%d" % (start, start + length - 1))
else:
    request.add_header("range", "bytes=%s-" % sys.argv[2])

# Try to get the response. This will raise a urllib2.URLError if there is a
# problem (e.g., invalid URL).
response = urllib2.urlopen(request)

# If a content-range header is present, partial retrieval worked.
if "content-range" in response.headers:
    print "Partial retrieval successful."

    # The header contains the string 'bytes', followed by a space, then the
    # range in the format 'start-end', followed by a slash and then the total
    # size of the page (or an asterisk if the total size is unknown). Let's
    # get the range and total size from this. The name byte_range avoids
    # shadowing the built-in range().
    byte_range, total = response.headers['content-range'].split(' ')[-1].split('/')

    # Print a message giving the range information.
    if total == '*':
        print "Bytes %s of an unknown total were retrieved." % byte_range
    else:
        print "Bytes %s of a total of %s were retrieved." % (byte_range, total)

# No header, so partial retrieval was unsuccessful.
else:
    print "Unable to use partial retrieval."

# And for good measure, let's check how much data we downloaded.
data = response.read()
print "Retrieved data size: %d bytes" % len(data)

Using this, I can retrieve the final 2,000 bytes of the Python homepage:

blair@blair-eeepc:~$ python retrieverange.py http://www.python.org/ 17387
Partial retrieval successful.
Bytes 17387-19386 of a total of 19387 were retrieved.
Retrieved data size: 2000 bytes

Or 400 bytes from the middle of the homepage:

blair@blair-eeepc:~$ python retrieverange.py http://www.python.org/ 6000 400
Partial retrieval successful.
Bytes 6000-6399 of a total of 19387 were retrieved.
Retrieved data size: 400 bytes

However, the Google homepage does not support ranges:

blair@blair-eeepc:~$ python retrieverange.py http://www.google.com/ 1000 500
Unable to use partial retrieval.
Retrieved data size: 9621 bytes

In this case, it would be necessary to extract the data of interest in Python prior to any further processing.
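That fallback can be as simple as slicing the downloaded body. The sketch below assumes that a server which honours the Range header returns exactly the requested bytes; the function name `requested_bytes` and its parameters are hypothetical.

```python
def requested_bytes(data, content_range, start, length=None):
    """Return the bytes the caller asked for, whatever the server sent.

    `data` is the response body and `content_range` is the value of the
    Content-Range header, or None if the header was absent.
    """
    if content_range is not None:
        # The server honoured the Range header, so `data` is already the slice.
        return data
    # The server ignored the Range header and sent the whole page, so cut
    # out the requested part locally.
    end = None if length is None else start + length
    return data[start:end]
```

With this, the caller gets the same bytes in both cases; the only difference is how much data crossed the network.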
