Which techniques would you use to implement a search for contents in a column on a very big table in MySQL? Say, for instance, that you have 10,000,000 emails stored in a table in the database and would like to implement a subject search that lets the user search for one or more words present in the email subject. If the user searched for ‘christmas santa’, you should find emails with subjects like ‘Santa visits us this christmas’ and ‘christmas, will santa ever show’.
My idea is to process all the words in the subjects (stripping numbers, special characters, commas, etc.) and save each word in an index table with a unique index on the word column. Then I would link that table to the email table through a many-to-many relationship table.
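To make the idea concrete, here is a minimal sketch of that word-index layout. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The table names (`email`, `word`, `email_word`) and the helper functions are made up for illustration, and the tokenizer simply assumes that a "word" is a run of ASCII letters.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and return its distinct alphabetic words.

    Digits, punctuation and other special characters are dropped
    (assumption: a "word" is a run of the letters a-z).
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(conn, subjects):
    """Create the three tables and index every subject (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
        CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
        CREATE TABLE email_word (
            email_id INTEGER NOT NULL REFERENCES email(id),
            word_id  INTEGER NOT NULL REFERENCES word(id),
            PRIMARY KEY (word_id, email_id)
        );
    """)
    for subject in subjects:
        email_id = conn.execute(
            "INSERT INTO email (subject) VALUES (?)", (subject,)
        ).lastrowid
        for w in tokenize(subject):
            # The UNIQUE constraint keeps one row per distinct word.
            conn.execute("INSERT OR IGNORE INTO word (word) VALUES (?)", (w,))
            word_id = conn.execute(
                "SELECT id FROM word WHERE word = ?", (w,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO email_word (email_id, word_id) VALUES (?, ?)",
                (email_id, word_id),
            )


def search(conn, query):
    """Return subjects that contain every word of the query, in insertion order."""
    words = sorted(tokenize(query))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = f"""
        SELECT e.subject
        FROM email e
        JOIN email_word ew ON ew.email_id = e.id
        JOIN word w ON w.id = ew.word_id
        WHERE w.word IN ({placeholders})
        GROUP BY e.id
        HAVING COUNT(DISTINCT w.id) = ?
        ORDER BY e.id
    """
    return [row[0] for row in conn.execute(sql, (*words, len(words)))]
```

With an in-memory database (`sqlite3.connect(":memory:")`), calling `build_index` on a few subjects and then `search(conn, "christmas santa")` should return only the subjects containing both words. Note that this matches whole words only; it does not handle prefixes, stemming, or true wildcards.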
Is there a better way to perform wildcard searches on very big tables?
Are there databases that natively support this kind of search?
You could use FULLTEXT indexes if you are using MyISAM as the storage engine (InnoDB also supports them as of MySQL 5.6). However, MySQL in general is not very good at text search.
A much better option would be to go with a dedicated text indexing solution such as Lucene or Sphinx. Personally, I’d recommend Sphinx: it integrates well with PHP and MySQL and is very fast. It can even be used to speed up ordinary queries, since it performs grouping and ordering very quickly.
Wikipedia has a nice list of different indexing engines.