I’ve come across an interesting problem which I would love to get some input on.
I have a program that generates sets of numbers (based on some predefined conditions). Each set contains up to 6 integers ranging from 1 to 100, and the numbers within a set do not have to be unique.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn’t matter) has previously been generated.
Speed is a priority in this case as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string “4-23-67-67-71” and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
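A minimal sketch of the approach described above, in case it helps to discuss something concrete. The class name `SeenSets` and the methods `encode` and `addIfNew` are made up for illustration and are not from my actual program.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class SeenSets {
    private final Set<String> seen = new HashSet<>();

    // Sorts a copy of the numbers and joins them with '-' to form a canonical key,
    // so that the same numbers in any order produce the same string.
    static String encode(int[] numbers) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(sorted[i]);
        }
        return sb.toString();
    }

    // Returns true if this set was not stored before, false if it is a repeat.
    boolean addIfNew(int[] numbers) {
        return seen.add(encode(numbers));
    }

    public static void main(String[] args) {
        SeenSets store = new SeenSets();
        System.out.println(encode(new int[] {67, 4, 71, 23, 67}));          // 4-23-67-67-71
        System.out.println(store.addIfNew(new int[] {4, 23, 67, 67, 71})); // true
        System.out.println(store.addIfNew(new int[] {67, 71, 4, 67, 23})); // false
    }
}
```

Since `Set.add` already reports whether the element was present, a separate `contains` call before inserting is not needed.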
Thanks!
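if you break it into pieces it seems to me that each step (sorting at most 6 numbers, building the string, and the hashset lookup/insert) takes constant time, with the hash operations being O(1) on average.
you do this n times, which gives you O(n).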
this is already optimal as you have to touch every element once anyways 🙂
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there’s obviously only one possible outcome (“1-1-1-1-1-1”) and you’ll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don’t see a problem.
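to put a rough number on "much larger" for the ranges in the question, here is a small sketch that counts the distinct unordered sets. it assumes every set has between 1 and 6 numbers drawn from 100 values with repeats allowed; the class and method names are hypothetical.

```java
// Hypothetical helper; counts multisets of size 1..maxSize drawn from `values` distinct numbers.
class MultisetCount {
    // Binomial coefficient C(n, k), built up step by step so every division is exact.
    static long binomial(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // The number of multisets of size k from v values is C(v + k - 1, k);
    // sum that over all allowed sizes.
    static long countMultisets(int values, int maxSize) {
        long total = 0;
        for (int k = 1; k <= maxSize; k++) {
            total += binomial(values + k - 1, k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countMultisets(100, 6)); // 1705904745
    }
}
```

that is about 1.7 billion possible sets against roughly 100k stored ones, so under these assumptions repeats should be rare unless the generating conditions narrow the range a lot.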
one tip: if you know the number of generated elements beforehand it would be wise to initialize the hashset with a matching initial capacity, i.e.
new HashSet<String>( 100000 );
p.s. now with other answers popping up i’d like to note that while there may be room for improvement on a microscopic level (i.e. using language specific tricks), your overall approach can’t be improved.
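a note on the pre-sizing tip: in Java the constructor argument is the initial capacity, not the number of elements, and the default load factor is 0.75, so the table may still resize before it reaches 100000 entries. a rough sketch of sizing for an expected element count follows; `PresizedSet` and `capacityFor` are hypothetical names, and the expected count of 100000 is taken from the question.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names are illustrative only.
class PresizedSet {
    // Smallest initial capacity that can hold `expected` entries
    // under the given load factor without triggering a resize.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 100_000; // assumed upper bound from the question
        float loadFactor = 0.75f; // HashSet's default load factor
        Set<String> seen = new HashSet<>(capacityFor(expected, loadFactor), loadFactor);
        seen.add("4-23-67-67-71");
        System.out.println(capacityFor(expected, loadFactor)); // 133334
    }
}
```

whether this makes a measurable difference at 100k entries is something i would benchmark rather than assume.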