I have some files without extensions, and I would like to associate extensions with them. For that I have written a Python program to read the data in each file. My question is: how can I identify a file's type without the extension and without using third-party tools?
I only have to identify PDF, DOC and text files. No other file types are possible.
My server runs CentOS.
You could read the first few bytes of the file and look for a “magic number”. The Wikipedia page on magic numbers suggests that PDF files begin with ASCII %PDF and DOC files begin with hex D0 CF 11 E0.
Identifying text files is going to be pretty tough in the general case, because a lot of standard magic numbers are actually ASCII text at the beginning of a binary file. For your case, if you can guarantee that you won’t be getting anything but PDF, DOC, or TXT, what you could probably get away with is checking for the PDF and DOC magic numbers, and then assuming it’s text if it’s not either of those.
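A minimal sketch of that check in Python, using only the standard library. The function names guess_extension_from_header and guess_extension are made up for illustration, and the ".txt" fallback relies on the assumption stated above that only PDF, DOC, or text files can occur.

```python
# Magic numbers mentioned above: ASCII "%PDF" for PDF, hex D0 CF 11 E0 for DOC.
PDF_MAGIC = b"%PDF"
DOC_MAGIC = bytes.fromhex("D0CF11E0")


def guess_extension_from_header(header):
    """Guess an extension from the first bytes of a file (hypothetical helper)."""
    if header.startswith(PDF_MAGIC):
        return ".pdf"
    if header.startswith(DOC_MAGIC):
        return ".doc"
    # Assumption: anything that is neither PDF nor DOC is treated as plain text.
    return ".txt"


def guess_extension(path):
    """Read the first four bytes of the file at `path` and guess its extension."""
    with open(path, "rb") as f:  # binary mode, so the bytes are not decoded
        header = f.read(4)
    return guess_extension_from_header(header)
```

With this in place, renaming a file could be as simple as os.rename(path, path + guess_extension(path)) after importing os. An empty file also falls through to ".txt", so you may want to handle that case separately.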