I have a little Python script that I am using to download a whole bunch of PDF files for archiving. The problem is that the downloaded files are saved under the correct names, but they are the wrong size and Acrobat can't open them; it fails with "Out of memory", "Insufficient data for an image", or some other seemingly arbitrary error. Opening a downloaded file in a text editor shows something that looks a bit like a PDF document: mostly incomprehensible, but with a few fragments of text and markup, including PDF identifiers.
The code to download the file is this:
def download_file(file_id):
    folder_path = ".\\pdf_files\\"
    file_download = "http://myserver/documentimages.asp?SERVICE_ID=RETRIEVE_IMAGE&documentKey="
    file_content = urllib.urlopen(file_download + file_id, proxies={})
    file_local = open(folder_path + file_id + '.pdf', 'w')
    file_local.write(file_content.read())
    file_content.close()
    file_local.close()
If the same file is downloaded through a browser it opens fine, but it is also larger on disk. I am guessing the problem has to do with how the file is encoded when it is saved?
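You need to open the output file in binary mode ('wb'). In text mode ('w') on Windows, Python translates newline characters as it writes, which corrupts binary data such as a PDF: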
file_local = open( folder_path + file_id + '.pdf', 'wb' )
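For completeness, here is a rough sketch of the whole function with that fix applied. It assumes Python 3, so it uses urllib.request rather than the Python 2 urllib.urlopen call from the question. The base_url and folder_path parameters, the os.makedirs call, and the returned path are my own additions for illustration and are not part of the original script.

```python
import os
import shutil
import urllib.request

# URL prefix taken from the question; the document key is appended to it.
BASE_URL = "http://myserver/documentimages.asp?SERVICE_ID=RETRIEVE_IMAGE&documentKey="


def download_file(file_id, base_url=BASE_URL, folder_path="pdf_files"):
    """Download one document and save it byte-for-byte as <file_id>.pdf.

    base_url and folder_path are hypothetical parameters added for this sketch.
    """
    os.makedirs(folder_path, exist_ok=True)
    local_path = os.path.join(folder_path, file_id + ".pdf")

    # An empty ProxyHandler mapping bypasses any configured proxy,
    # mirroring proxies={} in the original call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    # 'wb' writes the bytes unchanged; no newline translation is applied.
    with opener.open(base_url + file_id) as response, \
            open(local_path, "wb") as file_local:
        shutil.copyfileobj(response, file_local)

    return local_path
```

Calling download_file("12345") would then write pdf_files/12345.pdf (the key "12345" is just a placeholder). The with statement also closes both the response and the local file if the download fails partway through.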