If someone could point me in the right direction that would be most helpful.
I have written a custom CMS where I want to be able to allow each individual user to upload documents (.doc .docx .pdf .rtf .txt etc) and then be able to search the contents of those files for keywords.
The CMS is written entirely in PHP and MySQL within a Linux environment.
Once uploaded the documents would be stored in the users private folder on the server “as is”. There will be hundreds if not thousands of documents stored by each user.
It is very important that the specific users files are searchable only by that user.
Could anyone point me in the right direction? I have had a look at Solr but these types of solutions seem so complicated. I have spent an entire week looking at different solutions and this is my last attempt at finding a solution.
Thank you in advance.
2 choices I see.
A search index per user. Their documents are indexed separately from everyone else’s. When they do a search, they hit their own search index. There is no danger of seeing other’s results, or getting scores based on contents from other’s documents. The downside is having to store and update the index separately. I would look into using Lucene for something like this, as the indices will be small.
A single search index. The users all share a search index. The results from searches would have to be filtered down so that only results were returned for that user. The upside is implementing a single search index (Solr would be great for this). The down side is the risk of cross talk between users searches. Scoring would be impacted by other users documents, resulting in poorer search results.
I hate to say it, but from a quality standpoint, I’d lean towards number 1. Number 2 seems more efficient and easier, but user results are more important to me.