I'm developing a tool that searches a given site for a keyword entered by the user. My problem is that it only searches HTML pages, not the PDF or MS Word files found on the site.
Can anyone suggest an API or tool, or provide code, that can search the text of an online PDF, MS Word, or plain-text file?
You could probably use Antiword for Word files.
pdftotext can be used for PDF files. Both commands are available through apt:
sudo apt-get install xpdf-utils antiword
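Once both tools are installed, something like the following Python sketch could tie them together. It assumes the file has already been downloaded to disk, that pdftotext and antiword are on the PATH, and that a simple case-insensitive substring match is good enough. The function names (converter_command, extract_text, contains_keyword) are made up for illustration. Note that Antiword reads the older .doc format, not .docx, and on some newer releases pdftotext is packaged in poppler-utils rather than xpdf-utils.

```python
import subprocess
from pathlib import Path


def converter_command(path):
    """Return the command that prints the document's text to stdout.

    Returns None for plain-text files, which need no conversion.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # "-" tells pdftotext to write to stdout instead of a .txt file
        return ["pdftotext", str(path), "-"]
    if suffix == ".doc":
        # antiword prints the extracted text to stdout by default
        return ["antiword", str(path)]
    if suffix == ".txt":
        return None
    raise ValueError(f"unsupported file type: {suffix!r}")


def extract_text(path):
    """Extract text from a local PDF, .doc, or .txt file."""
    cmd = converter_command(path)
    if cmd is None:
        return Path(path).read_text(errors="replace")
    # check=True raises CalledProcessError if the converter fails
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def contains_keyword(text, keyword):
    """Case-insensitive substring search (an assumed matching rule)."""
    return keyword.lower() in text.lower()
```

For example, contains_keyword(extract_text("report.pdf"), "invoice") would report whether the word appears anywhere in a downloaded report.pdf (a hypothetical file name). Fetching the file from the site first, for instance with urllib.request.urlretrieve, is left out here.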