When I download a file from my server using Python and urllib2, the file's contents are slightly different from what they should be.
The difference is that extra line breaks ("\r\n" or "\n") are added to the downloaded version of the file. This isn't a major problem for a file in XML form, but it is a major problem when the file is encrypted, because the altered contents can't be decrypted.
I know it's my download code that's altering the contents, and not the file on the server being wrong, because I downloaded the same file over FTP and it had the correct content. Some more useful information: the file is ASCII encoded. My server is Windows .NET and I'm unsure whether the HTTP response is ASCII or Unicode. Maybe that could be causing the problem?
How can I make urllib2 download my file from my server and ensure that the content doesn't change?
Original content:
<clientlist>
<client>
<clientauthblah>blah</clientauthblah>
<version9>blah</version9>
<version10>blah</version10>
<companyno>1</companyno>
<companyname>blah</companyname>
</client>
When I run my download code, this is the content I get. I can't have this, because my files are encrypted and the change means I can't decrypt the file:
<clientlist>
<client>
<clientauthblah>blah</clientauthblah>
<version9>blah</version9>
<version10>blah</version10>
<companyno>1</companyno>
<companyname>blah</companyname>
</client>
Here's my code:
import urllib2

# Download
response = urllib2.urlopen("http://www.mywebsite.com/Clients.xml")
output = open("tempEncrypted.xml",'w')
res = response.read()
output.write(res)
output.close()
The problem here is this line:
output = open("tempEncrypted.xml",'w')
Python opens files in text mode by default, which means you may get newline conversions. The details are complicated by platform differences, universal newlines, etc.
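To make the newline conversion concrete, here is a minimal sketch of what text mode can do to content that already contains "\r\n". It is written for Python 3 and uses the `newline="\r\n"` argument to reproduce, on any OS, the translation that a plain `open(path, "w")` applies by default on Windows. That the download script ran on Windows is an assumption, and the sample text and file name are made up for illustration.

```python
import os
import tempfile

# Sample content that already uses \r\n line endings (illustrative only).
text = "<client>\r\n</client>\r\n"

path = os.path.join(tempfile.mkdtemp(), "demo.xml")

# Text mode with Windows-style translation: every "\n" is written as "\r\n".
# newline="\r\n" forces this on any OS; on Windows it is the default for "w".
with open(path, "w", newline="\r\n") as f:
    f.write(text)

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    written = f.read()

print(repr(written))  # b'<client>\r\r\n</client>\r\r\n'
```

Each "\r\n" has become "\r\r\n", which many editors display as an extra blank line, and which is enough to break decryption of an encrypted file.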
But if you have binary data, the answer is simple: open it in binary mode, by using 'wb' instead of 'w':
output = open("tempEncrypted.xml",'wb')
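Putting it together, the download could look something like the sketch below. The helper name `download_binary` is hypothetical, and the `try`/`except` import is only there so the same code runs on Python 2 (urllib2, as in the question) and Python 3 (urllib.request).

```python
import contextlib

try:
    from urllib2 import urlopen         # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def download_binary(url, path):
    """Save the response body to path without any newline translation.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    # closing() makes sure the response is closed on both Python 2 and 3.
    with contextlib.closing(urlopen(url)) as response:
        data = response.read()          # raw bytes of the response body
    # 'wb' opens the file in binary mode, so the bytes are written as-is.
    with open(path, "wb") as output:
        output.write(data)
    return len(data)
```

With the URL and file name from the question, the call would be `download_binary("http://www.mywebsite.com/Clients.xml", "tempEncrypted.xml")`. The saved file should then match the server's copy byte for byte.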