In my application, I receive a file and have to check whether it contains searchable text (text content) or non-searchable text (images), and display the result.
I cannot rely on the file extension, because a PDF file can also be of the non-searchable type.
I need Java code for this. Can anyone help me, please?
A practical solution to this problem is to work out the MIME type of each unknown file from its content. You would then build a mapping from MIME types to classes that extract text for the corresponding file type.
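As a rough illustration of those two steps, here is a minimal sketch using only the JDK. The class `TextExtractorRegistry`, the `TextExtractor` interface, and the `"unknown"` fallback string are hypothetical names made up for this example. Content sniffing uses `URLConnection.guessContentTypeFromStream`, which recognizes only a handful of formats, so the sketch adds its own check for the `%PDF-` header that PDF files start with. Treat it as a starting point under those assumptions, not a complete detector.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps detected MIME types to text extractors.
class TextExtractorRegistry {

    // Hypothetical interface: one implementation per supported MIME type.
    public interface TextExtractor {
        String extract(byte[] content) throws IOException;
    }

    // PDF files begin with the ASCII header "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private final Map<String, TextExtractor> extractors = new HashMap<>();

    public void register(String mimeType, TextExtractor extractor) {
        extractors.put(mimeType, extractor);
    }

    // Step 1: guess the MIME type from the leading bytes, not the extension.
    public static String detectMimeType(byte[] content) throws IOException {
        if (startsWith(content, PDF_MAGIC)) {
            return "application/pdf";
        }
        try (InputStream in = new ByteArrayInputStream(content)) {
            // Returns null when the JDK does not recognize the leading bytes.
            String guess = URLConnection.guessContentTypeFromStream(in);
            return guess != null ? guess : "unknown";
        }
    }

    // Step 2: look up the extractor registered for the detected type.
    // An empty Optional means no extractor is registered for that type.
    public Optional<String> extractText(byte[] content) throws IOException {
        TextExtractor extractor = extractors.get(detectMimeType(content));
        if (extractor == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(extractor.extract(content));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In real use you would register an extractor for `application/pdf` that delegates to a PDF library of your choice. One possible convention (an assumption here, not something the sketch enforces) is to treat a PDF whose extracted text is blank as non-searchable.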
There are libraries for doing the first part (identifying MIME types), but this is a heuristic process: in theory it can return the wrong answer, and in practice it sometimes returns “unknown”. Here is a sample of SO questions and other references on how to do this: