So I’m trying to search through an Arraylist in Java and create a histogram consisting of lengths of string vs frequency that length is present in large text files. I’ve come up with a brute force algorithm but its much too slow to be of use in large data files. Is there a more efficient way of processing through an Arraylist? I’ve included the brute force method I came up with.
for (int i = 0; i < (maxLen + 1); i++)
{
int hit = 0;
for (int j = 0; j < list.size(); j++)
{
if (i == list.get(j).length())
++hit;
histogram[i] = hit;
}
}
This is terribly inefficient.
How about instead of looping through each possible length value, then each available word, you simply loop through the available words in the document and count their lengths?
For example:
Then, if the key does not exist in the
HashMap, you know no words of that length exist in the document. If the key does exist, you can look up exactly how many times that occurred.Note: Some aspects of this code example were made in order to prevent any additional confusion about boxing and unboxing. It is possible to write it slightly cleaner, and I would certainly do so in a production environment. Also, it assumes that you don’t have knowledge of any minimum or maximum lengths of words (and is thus slightly more flexible, scalable, and catch-all). Otherwise, the other techniques for simply declaring a primitive array will work just as well (see Jon Skeet’s answer).
For a cleaner version that takes advantage of autoboxing: