I’m developing an ASP.NET MVC3 app that will have a few hundred videos. I want to create a search system based on tags and other parameters, such as the type of user who uploaded the video, the date of the video, the video category, etc.
I have been looking around, and Lucene.NET seems like a really good tool for full-text search, but I don’t know if it’s the best solution for my project. The tutorials I have read recommend keeping the search index to a minimum, but they also say that you should NOT hit your database to retrieve extra data that is not stored in the search index.
How is this possible?
Let me give an example: I have a video row (as a concept; the data is really held in several SQL tables) with columns for the video id, the video name, the video file name, the full path, user id, user type, tags, creation date, video category, video subcategory, video location, etc. If I want to create a Lucene search index, I think I will have to put all of that information in it so that I can later query on every parameter, right?
This looks to me like a duplicate of the SQL database, with the added overhead of adding, editing and removing documents in the Lucene search index. Is this the standard scenario when using Lucene? All the examples I have seen are based on a post id, post title and post body.
What do you think? Can you shed some light on this?
Yes, if you want to query multiple fields (including things like tags) from within Lucene, you’ll need to make that data available to Lucene. It might sound like duplication, but it is not redundant duplication: it is restructuring the data into a very different layout, one that is indexed for search.
It should work fine; it is pretty much how search works here on Stack Overflow (which uses Lucene.NET to perform the search).
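To make the "different layout" point concrete, here is a minimal sketch of an inverted index. It is not Lucene's actual implementation or API, and it is written in Python only for brevity; the sample rows, the field names and the `build_index`/`search` helpers are all hypothetical. The idea it illustrates is that the same facts held in the SQL rows are stored again, but keyed by term rather than by row:

```python
from collections import defaultdict

# Hypothetical sample rows; field names are illustrative, not the real schema.
videos = [
    {"id": 1, "name": "Intro to MVC", "tags": ["mvc", "tutorial"], "category": "dev"},
    {"id": 2, "name": "Lucene basics", "tags": ["search", "tutorial"], "category": "dev"},
    {"id": 3, "name": "Holiday clip", "tags": ["travel"], "category": "personal"},
]

def build_index(rows):
    """Map each (field, term) pair to the set of video ids that contain it."""
    index = defaultdict(set)
    for row in rows:
        for tag in row["tags"]:
            index[("tag", tag)].add(row["id"])
        index[("category", row["category"])].add(row["id"])
    return index

def search(index, **criteria):
    """Return the ids that match every field=term criterion (AND semantics)."""
    postings = [index.get((field, term), set()) for field, term in criteria.items()]
    return set.intersection(*postings) if postings else set()

index = build_index(videos)
result = search(index, tag="tutorial", category="dev")
print(sorted(result))
```

A lookup here is a couple of dictionary hits plus a set intersection, with no scan over the rows, which is why the duplicated data earns its keep once the collection is large.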
It should be noted, however, that a few hundred videos is not a large data set: frankly, you could do this any way you like, and it’ll take about the same amount of time. Writing a complex SQL query should work, as should full-text search in the database (that is how Stack Overflow’s search used to work), as should filtering objects in memory (at the few-hundred level, you could trivially cache all the data, excluding the video frames, in memory).
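As a rough illustration of the in-memory option, the sketch below keeps the video metadata in a plain list and filters it on each search. Everything in it is an assumption made for the example: the `Video` fields, the `CACHE` contents and the `find_videos` helper are hypothetical, and Python is used only to keep it short (in the ASP.NET app the same shape would presumably be a cached list filtered with LINQ).

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Video:
    # Hypothetical metadata fields; the real schema may differ.
    id: int
    name: str
    user_type: str
    category: str
    created: date
    tags: FrozenSet[str]

# Assumed to be loaded from the database once and refreshed when videos change.
CACHE: List[Video] = [
    Video(1, "Intro to MVC", "admin", "dev", date(2011, 3, 1), frozenset({"mvc", "tutorial"})),
    Video(2, "Lucene basics", "member", "dev", date(2011, 6, 15), frozenset({"search", "tutorial"})),
    Video(3, "Holiday clip", "member", "personal", date(2010, 12, 24), frozenset({"travel"})),
]

def find_videos(videos: List[Video],
                tag: Optional[str] = None,
                user_type: Optional[str] = None,
                category: Optional[str] = None,
                created_after: Optional[date] = None) -> List[Video]:
    """Linear scan; a criterion left as None is ignored."""
    results = []
    for v in videos:
        if tag is not None and tag not in v.tags:
            continue
        if user_type is not None and v.user_type != user_type:
            continue
        if category is not None and v.category != category:
            continue
        if created_after is not None and v.created <= created_after:
            continue
        results.append(v)
    return results

matches = find_videos(CACHE, tag="tutorial", user_type="member")
print([v.name for v in matches])
```

A full scan over a few hundred small objects costs next to nothing, so the index only starts to pay off when the collection grows well beyond that.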