Description | A Java program to read a text file and print each of the unique words in alphabetical order together with the number of times the word occurs in the text.
The program should declare a variable of type Map<String, Integer> to store the words and corresponding frequency of occurrence. Which concrete type, though? TreeMap<String, Integer> or HashMap<String, Integer>?
The input should be converted to lower case.
A word does not contain any of these characters: \t\t\n]f.,!?:;\'()'
Example output |
Word         Frequency
a            1
and          5
appearances  1
as           1
. . .
Remark | I know, I’ve seen elegant solutions to this in Perl with roughly two lines of code. However, I want to see it in Java.
Edit: Oh yeah, it would be helpful to show an implementation using one of these structures (in Java).
TreeMap seems a no-brainer to me – simply because of the ‘in alphabetical order’ requirement. HashMap has no guaranteed ordering when you iterate through it; TreeMap iterates in the natural key order.

EDIT: I think Konrad’s comment may have been suggesting ‘use HashMap, then sort.’ This is good because although we’ll have N iterations initially, we’ll have K <= N keys by the end due to duplicates. We might as well save the expensive bit (sorting) until the end, when we’ve got fewer keys, rather than take the small-but-non-constant hit of keeping it sorted as we go.

Having said that, I’m sticking to my answer for the moment, because it’s the simplest way of achieving the goal. We don’t really know that the OP is particularly worried about performance, but the question implies that he’s concerned about elegance and brevity. Using a TreeMap makes this incredibly brief, which appeals to me. I suspect that if performance is really an issue, there may be a better way of attacking it than either TreeMap or HashMap 🙂
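
For what it's worth, here is a minimal sketch of both approaches, not a definitive implementation. It assumes Java 8 or later (for Map.merge), and the delimiter set is an assumption: whitespace plus . , ! ? : ; " ( ) ', so adjust it to match the exact characters you need. The class name WordFrequency and the method names countSorted and countThenSort are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {
    // Assumed delimiter set: whitespace plus . , ! ? : ; " ( ) '
    private static final String DELIMITERS = "[\\s.,!?:;\"()']+";

    /** Counts words in a TreeMap, which keeps the keys sorted as they are inserted. */
    static Map<String, Integer> countSorted(Reader input) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        fill(counts, input);
        return counts;
    }

    /** Counts words in a HashMap, then sorts once at the end by copying into a TreeMap. */
    static Map<String, Integer> countThenSort(Reader input) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        fill(counts, input);
        return new TreeMap<>(counts);
    }

    // Shared counting loop: lower-case each line, split on the delimiters, tally each word.
    private static void fill(Map<String, Integer> counts, Reader input) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
                if (!word.isEmpty()) { // split yields "" when a line starts with a delimiter
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
    }

    // Usage: java WordFrequency.java <path-to-text-file>
    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println("Word Frequency");
            for (Map.Entry<String, Integer> entry : countSorted(in).entrySet()) {
                System.out.println(entry.getKey() + " " + entry.getValue());
            }
        }
    }
}
```

Both methods return the same sorted result; the only difference is whether the ordering is maintained on every insertion or established once after counting. Which is faster will depend on the input, so measure before switching away from the simpler TreeMap version.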