I realize I am asking two separate questions at once here but I think they are related (even if just slightly).
Anyway, what I want to do is compare two lists (not necessarily Java lists) of Strings and remove words that occur in both lists. I was thinking of using either an ArrayList or a HashSet, with HashSet being favored as the lists are not ordered, but my problem with a HashSet is that I’ve read that they don’t allow duplicates. This conflicts slightly with my other requirement, as I want to be able to count the number of times each word occurs but only display each word once, if that makes sense.
Think of a WordCloud example.
Here’s what I have currently, saving contents of two text files to two ArrayLists:
ArrayList<String> words = new ArrayList<String>();
File file = new File(fileName);
// Read the file one line at a time; each line is stored as one entry.
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
    String wrd = scanner.nextLine();
    words.add(wrd);
}
scanner.close();
I had to use two different ways of reading the data, as the two text files were structured differently:
ArrayList<String> webWords = new ArrayList<String>();
File webFile = new File(webFileName);
BufferedReader br = new BufferedReader(new FileReader(webFile));
String testLine;
int count = 0;
// Split each line on whitespace and store every token as a separate word.
while ((testLine = br.readLine()) != null) {
    StringTokenizer st = new StringTokenizer(testLine);
    while (st.hasMoreTokens()) {
        webWords.add(st.nextToken());
        count++;
    }
}
br.close();
Now, I could easily create two HashSets in a similar fashion, but I am using ArrayList for the moment as it allows duplicates and I am still unsure which suits my needs best.
I need to compare the 2nd list with the 1st one and remove all the words in the 2nd list that appear in the 1st list.
My second issue is trying to determine (after I’ve removed the common words) which words occur most frequently.
Any help or direction would be greatly appreciated.
If I understand the requirements correctly, we can take a HashMap<String, Integer> and put all the words from list1 into it as keys, which avoids duplicates. Then we can iterate over the map entries, count each word's frequency, and store it as the entry value.
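As a rough illustration of the HashMap idea, here is a minimal sketch that counts how often each word occurs in a list. The class name `WordCount`, the helper `countWords`, and the sample words are made up for this example; they are not part of the question's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // Hypothetical helper: maps each distinct word to its number of occurrences.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data, assumed for illustration only.
        List<String> sample = Arrays.asList("cloud", "word", "cloud", "tag", "cloud");
        Map<String, Integer> counts = countWords(sample);
        System.out.println("cloud occurs " + counts.get("cloud") + " times");
        System.out.println("distinct words: " + counts.size());
    }
}
```

Each word appears only once as a key, so the map can be used to display every word a single time while still keeping its count.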
UPDATE: “I want to be able to remove the words from list2 that appear in list1. And then iterate through the remaining words in list2 to find out how many times each word occurs”
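One way the updated requirement might be approached is sketched below: copy list1 into a HashSet for fast lookups, remove every matching word from a copy of list2, then count what remains and sort by frequency. The names `RemoveAndCount`, `removeCommon`, and `byFrequency`, as well as the sample lists, are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RemoveAndCount {
    // Hypothetical helper: returns the words of list2 that do not occur in list1.
    // Duplicates in list2 are kept so they can still be counted afterwards.
    static List<String> removeCommon(List<String> list1, List<String> list2) {
        Set<String> exclude = new HashSet<>(list1);      // fast membership test
        List<String> remaining = new ArrayList<>(list2); // leave list2 untouched
        remaining.removeAll(exclude);
        return remaining;
    }

    // Hypothetical helper: counts each word and returns the entries,
    // most frequent first. The order among equal counts is unspecified.
    static List<Map.Entry<String, Integer>> byFrequency(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Sample data, assumed for illustration only.
        List<String> list1 = Arrays.asList("the", "and", "of");
        List<String> list2 = Arrays.asList(
                "the", "cloud", "word", "cloud", "and", "tag", "cloud", "word");

        List<String> remaining = removeCommon(list1, list2);
        for (Map.Entry<String, Integer> e : byFrequency(remaining)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With this approach an ArrayList is still fine for holding the raw words, since the HashSet is only used for the lookup and the HashMap takes care of showing each word once alongside its count.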