I realize this question may be a bit too open-ended, but I am curious how hashing works when web search engines index web pages. What are some of the common hash functions used for that purpose?
For the Sphinx search engine, which is an extremely popular open-source product comparable to Lucene, the hash function used is CRC. It converts each word found in the documents it is indexing to a 32-bit or 64-bit integer using CRC.
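To make the idea concrete, here is a rough sketch of hashing words to 32-bit integers with CRC-32 and using those integers as keys in an inverted index. This is not Sphinx's actual code: the names `word_id`, `build_index`, and `search` are made up for illustration, and the lowercasing and whitespace tokenization are simplifying assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set


def word_id(word: str) -> int:
    """Hypothetical helper: map a word to a 32-bit unsigned int via CRC-32."""
    # Lowercasing is an assumption so that "Web" and "web" share one ID.
    return zlib.crc32(word.lower().encode("utf-8")) & 0xFFFFFFFF


def build_index(docs: Dict[int, str]) -> Dict[int, Set[int]]:
    """Build an inverted index: word hash -> set of document IDs."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization, for illustration only.
        for word in text.split():
            index[word_id(word)].add(doc_id)
    return index


def search(index: Dict[int, Set[int]], word: str) -> List[int]:
    """Return sorted IDs of documents whose words hash to the query's ID."""
    return sorted(index.get(word_id(word), set()))


if __name__ == "__main__":
    docs = {1: "fast web search", 2: "web crawler"}
    index = build_index(docs)
    print(search(index, "web"))
    print(search(index, "crawler"))
```

Note that two different words can hash to the same 32-bit value, so an index keyed only by the hash can occasionally return a document for the wrong word. A wider hash makes such collisions less likely.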