How do I make sure that when I merge several temp indexes (which may or may not contain duplicate documents), I end up with exactly one copy of each document in the main index?
Thanks
Here’s a way:
Provided that each document has an id, and that duplicate documents have the same id, the gist is:
For each document in each index, delete all documents having the same id from the other indexes. After having done this for all indexes, merge them. The first index in which an id appears is the one that keeps its copy.
I know this is not elegant, but I do not know a better algorithm.
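To make the idea concrete, here is a minimal sketch in Python. It does not use any real indexing library: each index is modeled as a plain dict mapping a document id to its content, and the name `merge_without_duplicates` is made up for illustration. With a real index you would replace the dict operations with your library's delete-by-id and merge calls.

```python
from typing import Dict, List

# Hypothetical stand-in for an index: maps document id -> document content.
Index = Dict[str, str]


def merge_without_duplicates(indexes: List[Index]) -> Index:
    """Remove cross-index duplicates in place, then merge everything."""
    for i, current in enumerate(indexes):
        # Snapshot the ids so the loop is unaffected by deletions elsewhere.
        for doc_id in list(current):
            # Delete documents with the same id from every *other* index.
            for j, other in enumerate(indexes):
                if j != i:
                    other.pop(doc_id, None)

    # No id appears in more than one index now, so a plain merge is safe.
    merged: Index = {}
    for index in indexes:
        merged.update(index)
    return merged
```

Note that this modifies the temp indexes in place, and it only removes duplicates across indexes. If a single index can hold the same id twice, that has to be handled separately.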