I know one way to solve this question is to hash each word to its count, then traverse the hash map and figure out the top 3.
Is there a better way to solve this? Would it be better to use a BST instead of a HashMap?
Basically, a histogram is the standard way of doing this. Take your pick of which implementation to use behind the histogram interface; the difference between them is instance-specific, and each has its advantages and disadvantages.
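As a rough illustration of the histogram idea, here is a minimal Python sketch. It assumes whitespace-separated, case-insensitive words; the function name `top_k_words` is made up for this example. It uses a hash-based `Counter` as the histogram and a bounded heap instead of a full sort to pick the top entries.

```python
import heapq
from collections import Counter


def top_k_words(text, k=3):
    """Return the k most frequent words in text as (word, count) pairs."""
    # Histogram: a hash map from word to count, built in one pass.
    counts = Counter(text.lower().split())
    # nlargest tracks only k candidates at a time, so this step costs
    # O(n log k) over n distinct words rather than a full O(n log n) sort.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


print(top_k_words("a b a c b a"))
```

Swapping the `Counter` for a tree-based map would keep the words ordered by key, which does not help here because the top 3 are ordered by count, not by word.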
You might also want to consider a map-reduce design to get the word counts.
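A map-reduce word count could look roughly like the sketch below. This is a single-process imitation in plain Python, not the API of any particular map-reduce framework; the names `mapper`, `shuffle`, `reducer`, and `word_count` are assumptions chosen for illustration.

```python
from collections import defaultdict


def mapper(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group values by key; a real framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce step: sum the partial counts for one word."""
    return (word, sum(counts))


def word_count(documents):
    """Run the map, shuffle, and reduce steps over a list of documents."""
    pairs = (pair for doc in documents for pair in mapper(doc))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))
```

In a distributed setting, each mapper would handle a separate document and each reducer a separate subset of the words.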
This approach scales well if you have a lot of documents, using the map-reduce interface, and it is an elegant solution if you like functional programming.
Note: this approach is basically the same as the hash solution, since the mapper passes the (key, value) tuples using hashing.