I have a simple application that reads small strings from large text files and saves them to a database. To save each such String, the application calls the following method many times (possibly thousands or more):
public void setValue(String value)
{
    if (!ignore(value))
    {
        // Save the value in the database
    }
}
Currently, I implement the ignore() method by just successively comparing a set of Strings, e.g.
public boolean ignore(String value)
{
    if (value.equalsIgnoreCase("Value 1") || value.equalsIgnoreCase("Value 2"))
    {
        return true;
    }
    return false;
}
However, because I need to check against many such “ignorable” values, which will be defined in another part of the code, I need to use a data structure for this check, instead of multiple consecutive if statements.
So, my question is: what would be the fastest data structure from standard Java to implement this? A HashMap? A Set? Something else?
Initialization time is not an issue, since it will happen statically and once per application invocation.
EDIT: The solutions suggested thus far (including HashSet) appear slower than just using a String[] with all the ignored words and just running “equalsIgnoreCase” against each of these.
Use a HashSet that stores the values in lowercase, and call its contains() method with the lowercased input. HashSet has better lookup performance than TreeSet (constant time on average versus logarithmic time for contains()).
Storing the values in lowercase and searching for the lowercased input avoids the hassle of dealing with case during comparison, so you get the full speed of the HashSet implementation with no extra collection-related code to write (e.g. a Collator or Comparator).
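As a minimal sketch of this approach (the class name IgnoreFilter and the two sample values are placeholders, and passing Locale.ROOT to toLowerCase() is my assumption to keep the result independent of the machine's default locale):

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; the name and the sample values are illustrative only.
class IgnoreFilter {
    // Built once, statically, so initialization cost is paid a single time.
    private static final Set<String> IGNORED = new HashSet<>();

    static {
        // The real ignore list would come from another part of the code.
        for (String s : new String[] {"Value 1", "Value 2"}) {
            // Store the lowercased form so lookups can ignore case.
            IGNORED.add(s.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if the value matches an ignored entry, ignoring case.
    static boolean ignore(String value) {
        return IGNORED.contains(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(ignore("VALUE 1")); // matches "Value 1" regardless of case
        System.out.println(ignore("Value 3")); // not in the ignore list
    }
}
```

Note that each lookup allocates a new lowercased String, so for a very short ignore list a plain loop with equalsIgnoreCase() may well be competitive; it is worth measuring with your real data.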
EDITED
Thanks to Jon Skeet for pointing out that certain Turkish characters behave oddly when calling toLowerCase(). If you’re not intending to support Turkish input (or other languages with non-standard case rules), this approach will work well for you.
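To illustrate the locale issue, here is a small sketch (the class and method names are hypothetical). Under a Turkish locale, an uppercase "I" lowercases to a dotless "ı" rather than "i", so the result no longer equals the ASCII form. One common way to sidestep this, which I am assuming is acceptable for your data, is to pass a fixed locale such as Locale.ROOT both when building the set and when looking values up:

```java
import java.util.Locale;

// Hypothetical demo class showing locale-sensitive lowercasing.
class LowerCaseLocaleDemo {
    // Locale-independent normalization, suitable for both storing and looking up keys.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish rules the capital I becomes a dotless i, so this prints false.
        System.out.println("TITLE".toLowerCase(turkish).equals("title"));

        // With a fixed locale the result is the plain ASCII form, so this prints true.
        System.out.println(normalize("TITLE").equals("title"));
    }
}
```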