Just wondering if there are any tips on improving (full-text) search times.
How do large sites like Stack Overflow, Reddit, etc. implement their search functions?
(Sorry for the vagueness – I am a newbie)
Oh wow, there are entire courses and papers written on this…
Firstly, if you’re storing in a database, there are indexes and different joins and views and all sorts of fun for speeding up your queries.
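To make the database point a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `posts` table, its columns, and the index name `idx_posts_author` are all made up for illustration; the idea is only that an index on the column you filter by lets the database look rows up instead of scanning the whole table.

```python
import sqlite3

# In-memory database with a hypothetical "posts" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [
        ("alice", "Improving search times"),
        ("bob", "Joins and views"),
        ("alice", "Fuzzy matching basics"),
    ],
)

# Index the column used in the WHERE clause below.
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")

query = "SELECT title FROM posts WHERE author = ? ORDER BY id"
titles = [row[0] for row in conn.execute(query, ("alice",))]
print(titles)

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the exact wording of its output varies between SQLite versions.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("alice",)).fetchall()
for step in plan:
    print(step[-1])
```

On a table this small the index makes no measurable difference; it starts to matter once the table is large enough that scanning every row is slow.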
However, you’ve specified full-text search, so I’ll direct you to this page, which has a comparison of the most common techniques. It covers arrays, but it will still give you an understanding of how splitting and searching can be improved or varied.
Next, take a read of this Wikipedia article on string searching. There is the naive search, where you just scan through the text, and there are approaches where you build an index first so that future searches can jump straight to the right place, like chapters or page numbers in a book.
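As a rough illustration of the difference between just looking and building an index first, here is a small Python sketch of a word-level inverted index. The function names and sample documents are invented for this example, and real search engines do far more (stemming, ranking, storing the index on disk), so treat it as a toy version of the idea rather than how any particular site does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_search(docs, word):
    """Re-scan every document on every query (no index)."""
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in tokenize(text))


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def indexed_search(index, query):
    """Return ids of documents that contain every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "How do I speed up full-text search?",
    2: "Search engines build an index before answering queries",
    3: "Compression can also speed things up",
}
index = build_index(docs)
print(indexed_search(index, "speed up"))  # [1, 3]
print(naive_search(docs, "search"))       # [1, 2]
```

The index costs time and memory to build once, but each later query becomes a few dictionary lookups and set intersections instead of a pass over all the text.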
The index and pattern-storage techniques are also very useful in compression, and that’s yet another way to help speed up searching: if you build the compressed string, you can be clever and jump to the relevant compressed section, then extract and compare. How well this works depends on whether you are searching for a limited number of patterns or whether anything goes.
Then there’s fuzzy searching as well, where you don’t need an exact match; instead you rank results by some ‘closeness’ score, such as the percentage of matching characters.
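One way to try out a ‘closeness’ score is Python's standard `difflib` module. The sketch below uses `SequenceMatcher.ratio()`, which returns a value between 0 and 1 based on how many characters the two strings share in matching blocks. The function names, the sample words, and the 0.7 threshold are arbitrary choices for the example, and this is only one of many possible similarity measures (edit distance is another common one).

```python
from difflib import SequenceMatcher


def closeness(a, b):
    """Similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(query, candidates, threshold=0.7):
    """Return candidates scoring at least `threshold`, best match first."""
    scored = [(closeness(query, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical word list; "serch" is a deliberate misspelling.
words = ["search", "server", "banana"]
print(fuzzy_search("serch", words))
```

Scoring every candidate like this is slow on large collections, so in practice it tends to be combined with an index that first narrows things down to a short list of candidates.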
Hopefully that gives you a good starting point at least!