Using urllib (or urllib2) and wanting what I want is hopeless.
Any solution?
I’m not sure how the C# implementation works, but, as internet streams are generally not seekable, my guess would be it downloads all the data to a local file or in-memory object and seeks within it from there. The Python equivalent of this would be to do as Abafei suggested and write the data to a file or StringIO and seek from there.
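As a rough illustration of the "download everything, then seek" approach, here is a minimal sketch. It assumes Python 3, where urllib2 became urllib.request and io.BytesIO plays the role of StringIO for byte data. The helper names to_seekable and fetch_seekable are made up for this example.

```python
import io
import urllib.request


def to_seekable(stream):
    """Read a non-seekable stream to the end and return a seekable in-memory copy."""
    return io.BytesIO(stream.read())


def fetch_seekable(url):
    """Hypothetical helper: download the whole resource, then hand back a seekable buffer."""
    with urllib.request.urlopen(url) as response:
        return to_seekable(response)
```

With the returned buffer you can call seek(1000) and then read(500) as you would on a local file. The whole response is held in memory, so this only suits resources of a manageable size.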
However, if, as your comment on Abafei’s answer suggests, you want to retrieve only a particular part of the file (rather than seeking backwards and forwards through the returned data), there is another possibility.
urllib2 can be used to retrieve a certain section (or ‘range’ in HTTP parlance) of a webpage, provided that the server supports this behaviour.
The Range header
When you send a request to a server, the parameters of the request are given in various headers. One of these is the Range header, defined in section 14.35 of RFC2616 (the specification defining HTTP/1.1). This header allows you to do things such as retrieve all data starting from the 10,000th byte, or the data between bytes 1,000 and 1,500.
Server support
There is no requirement for a server to support range retrieval. Some servers will return the Accept-Ranges header (section 14.5 of RFC2616) along with a response to report if they support ranges or not. This could be checked using a HEAD request. However, there is no particular need to do this; if a server does not support ranges, it will return the entire page and we can then extract the desired portion of data in Python as before.
Checking if a range is returned
If a server returns a range, it must send the Content-Range header (section 14.16 of RFC2616) along with the response. If this is present in the headers of the response, we know a range was returned; if it is not present, the entire page was returned.
Implementation with urllib2
urllib2 allows us to add headers to a request, thus allowing us to ask the server for a range rather than the entire page. The following script takes a URL, a start position, and (optionally) a length on the command line, and tries to retrieve the given section of the page.
Using this, I can retrieve the final 2,000 bytes of the Python homepage:
Or 400 bytes from the middle of the homepage:
However, the Google homepage does not support ranges:
In this case, it would be necessary to extract the data of interest in Python prior to any further processing.
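To tie the pieces together, here is a minimal sketch of the approach described above: send a Range header, check for Content-Range in the response, and slice the data locally if the server sent the whole page. It is not the original script. It assumes Python 3 (urllib.request in place of urllib2), and the names range_header, slice_locally and fetch_range, the opener parameter, and the convention that a negative start means "the final N bytes" are all assumptions made for this example.

```python
import urllib.request


def range_header(start, length=None):
    """Build a Range header value from a start position and optional length."""
    if start < 0:
        # Suffix range: the final abs(start) bytes; length is ignored here (assumption).
        return "bytes=%d" % start
    if length is None:
        # Open-ended range: everything from start onwards.
        return "bytes=%d-" % start
    # Both ends of a byte range are inclusive.
    return "bytes=%d-%d" % (start, start + length - 1)


def slice_locally(body, start, length=None):
    """Extract the same section from a complete response body."""
    if start < 0 or length is None:
        return body[start:]
    return body[start:start + length]


def fetch_range(url, start, length=None, opener=urllib.request.urlopen):
    """Return (data, content_range); content_range is None if no range was sent."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(start, length)})
    with opener(request) as response:
        body = response.read()
        content_range = response.headers.get("Content-Range")
    if content_range is not None:
        # The server honoured the request and sent only the range.
        return body, content_range
    # The server ignored the Range header and sent everything.
    return slice_locally(body, start, length), None
```

For example, fetch_range(url, -2000) would ask for the final 2,000 bytes, and fetch_range(url, 1000, 500) for bytes 1,000 to 1,499. The second element of the result tells you whether the server actually returned a range or whether the slicing happened locally.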