How to detect the exact type of documents in Java?

At run time I receive many documents (articles, manuals, guides and so on) and need to identify their actual type using Java. The documents can be PDF, HTML, DOC, XML, etc. They come from a database, and I usually do not have a file extension: only the PDFs have one, while the HTML and other documents do not. So I have to decide what kind of content it is from the content alone, and only then can I apply my business logic. Please help.
Apache Tika has facilities to detect MIME types of files:
http://tika.apache.org/
It is pretty heavy-weight, however, as it does a lot more than just MIME type detection.
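If pulling in Tika is too much for just these few formats, a rough alternative is to inspect the first bytes of the content yourself. The sketch below is only an illustration under some assumptions: the class name `ContentSniffer` and method `sniff` are made up for this example, the content is already available as a `byte[]` read from the database, and only a handful of well-known signatures are checked (`%PDF-` for PDF, the OLE2 header used by legacy Office files, `PK\3\4` for ZIP-based files such as DOCX, and leading `<?xml` / `<html` / `<!DOCTYPE html` text). It will misclassify anything outside those cases.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Tika or the JDK.
class ContentSniffer {

    // OLE2 compound-file signature; shared by legacy .doc, .xls and .ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    /** Returns a best-guess MIME type based on the leading bytes only. */
    static String sniff(byte[] content) {
        if (startsWith(content, PDF)) return "application/pdf";
        // Assumption: an OLE2 container is a Word document; it could be Excel or PowerPoint.
        if (startsWith(content, OLE2)) return "application/msword";
        // DOCX/XLSX/ODT are ZIP archives; telling them apart needs a look inside the archive.
        if (startsWith(content, ZIP)) return "application/zip";

        // Text formats: look at the first 512 bytes, ignoring case and leading whitespace.
        int len = Math.min(content.length, 512);
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        if (head.startsWith("\u00EF\u00BB\u00BF")) {
            head = head.substring(3); // skip a UTF-8 byte order mark
        }
        head = head.trim().toLowerCase(Locale.ROOT);

        if (head.startsWith("<!doctype html") || head.startsWith("<html")) {
            return "text/html";
        }
        if (head.startsWith("<?xml")) {
            // XHTML also begins with an XML declaration.
            return head.contains("<html") ? "application/xhtml+xml" : "application/xml";
        }
        return "application/octet-stream"; // unknown
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\"?><manual/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(sample));
    }
}
```

For anything beyond these few signatures (HTML fragments without a doctype, distinguishing DOC from XLS, DOCX from other ZIP files, and so on) a real detector such as Tika is the safer choice, since it does this kind of sniffing against a much larger signature database.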