Question

0

Asked: May 11, 2026

I get a file via an HTTP upload and need to make sure it’s a PDF file. The programming language is Python, but this should not matter.

I thought of the following solutions:

1. Check whether the first bytes of the upload are %PDF. This is a weak check, but it prevents users from accidentally uploading other file types.
2. Use libmagic (the library behind the file command in bash). This does exactly the same check as in (1).
3. Use a library to try to read the page count from the file. If the library can read a page count, the file should be a valid PDF. Problem: I don’t know of a Python library that can do this.
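For reference, the header check from option (1) could look roughly like the sketch below. The function name looks_like_pdf and the constant PDF_MAGIC are made up for illustration, and the check assumes the whole upload is already in memory as bytes. It only inspects the first five bytes, so it says nothing about whether the rest of the file is intact.

```python
# Marker that a PDF file is expected to start with.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the uploaded bytes start with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n..."))   # header present
    print(looks_like_pdf(b"PK\x03\x04..."))   # a ZIP archive, not a PDF
```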

Are there solutions using a library or another trick?


1 Answer

score 0 · Answer 1 · 2026-05-11T07:03:38+00:00

The two most commonly used PDF libraries for Python are:

Both are pure Python, so they should be easy to install and work across platforms.

With pypdf it would probably be as simple as doing:

from pypdf import PdfReader
reader = PdfReader("upload.pdf")

This should be enough, but reader will now have the metadata and pages attributes if you want to do further checking.

As Carl answered, pdftotext is also a good solution, and would probably be faster on very large documents (especially ones with many cross-references). However, it might be a little slower on small PDFs because of the system overhead of forking a new process.
