Possible Duplicate:
How to index pdf, ppt, xl files in lucene (java based or python or php any of these is fine)?
I need to search for a string in a collection of files in a folder that includes PDF, DOCX, and TXT formats. Is it possible to do this using Lucene.NET?
Please point me to some helpful references.
Thank you.
You would need to extract the text from the various files (pdf, docx, txt) and insert that text into a Lucene index. Lucene itself can't read text out of the various document formats.
In general, search for “extract {document format} text in .net” and you should find plenty of resources.
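To make the "extract first, then index" idea concrete, here is a rough, language-neutral sketch in Python rather than .NET. It is only an illustration under a few assumptions: the function names (`extract_txt`, `extract_docx`, `extract_folder`, `search`) are made up for this example, the plain dictionary of extracted text is a stand-in for a real Lucene index, and PDF is deliberately left out because it needs a third-party extraction library. It relies on the fact that a .docx file is a zip archive whose body text lives in `word/document.xml`.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_txt(path):
    """Read a plain-text file (assumes UTF-8; bad bytes are replaced)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def extract_docx(path):
    """Pull paragraph text out of word/document.xml inside the .docx zip."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical registry: map a file extension to its extractor.
# A ".pdf" entry would go here once a PDF text-extraction library is chosen.
EXTRACTORS = {".txt": extract_txt, ".docx": extract_docx}


def extract_folder(folder):
    """Return {path: extracted text} for every supported file under folder."""
    texts = {}
    for dirpath, _, names in os.walk(folder):
        for name in names:
            ext = os.path.splitext(name)[1].lower()
            extractor = EXTRACTORS.get(ext)
            if extractor is None:
                continue  # unsupported format, e.g. PDF in this sketch
            path = os.path.join(dirpath, name)
            texts[path] = extractor(path)
    return texts


def search(texts, needle):
    """Case-insensitive substring search; a stand-in for a Lucene query."""
    needle = needle.lower()
    return sorted(p for p, t in texts.items() if needle in t.lower())
```

With Lucene.NET, the text returned by each extractor would be added to a document field and written to the index instead of being kept in a dictionary, and the final search would go through Lucene's query API rather than a substring scan.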