I tried to set up SQL Server full-text indexing to search MS Word and PDF files, following http://www.codeproject.com/KB/architecture/sqlfulltextindexing.aspx
But after setting it up, I found that some words cannot be found when I search in SQL Server. It seems there is a problem when SQL Server indexes those files.
Has anyone experienced the same thing before? What alternatives can I use to index and search the content of MS Word and PDF files?
PDF mixes text and binary data. DOC is, I think, entirely binary. DOCX is essentially a zip archive of XML files (hence binary on disk). Doing text search on these formats without a proper parser may not be feasible.
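To illustrate what "a proper parser" means for the DOCX case, here is a minimal Python sketch using only the standard library. It assumes the main body text lives in `word/document.xml` inside the archive, which is the usual layout for Word-generated files. The function name `extract_docx_text` is my own, not part of any library, and the sketch ignores headers, footers, footnotes and other parts of the document. Text inside nested structures such as text boxes may also show up more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the paragraph text of a .docx file, one paragraph per line.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; only the main body is read.
    """
    # A .docx file is a zip archive; the body is assumed to be in
    # word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

The extracted plain text could then be stored in an ordinary text column and indexed from there. A raw byte search over the same file would find nothing, because the XML inside the archive is compressed.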