I have data that is a set of ordered ints
[0] = 12345
[1] = 12346
[2] = 12454
etc.
I need to check whether a value is in the collection in C++. Which container will have the lowest complexity on retrieval? In this case, the data does not grow after initialization. In C# I would use a dictionary; in C++, I could use either a hash_map or a set. If the data were unordered, I would use Boost’s unordered collections. However, do I have better options since the data is ordered? Thanks
EDIT: The size of the collection is a couple of hundred items
Just to add a bit of detail to what has already been said.
Sorted Containers
The immutability is extremely important here:
std::map and std::set are usually implemented as binary trees (red-black trees in the few STL implementations I have seen) because of the requirements on insertion, retrieval and deletion (and notably because of the iterator invalidation requirements).

However, because of immutability, as you suspected, there are other candidates, not the least of them being array-like containers. They have a few advantages here, cache locality in particular.

Several “Random Access Containers” are available here:

- Boost.Array
- std::vector
- std::deque

So the only thing you actually need to do can be broken down into 2 steps:

1. Put all your values in the container of your choice, then call std::sort on it.
2. Search for the value using std::binary_search, which has O(log(n)) complexity.

Because of cache locality, the search will in fact be faster even though the asymptotic behavior is similar.
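The two steps above could be sketched roughly as follows. This is only an illustration: the helper names make_sorted and contains_sorted are made up for this example, and std::vector is just one of the containers listed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (name is an assumption): step 1, fill then sort once.
std::vector<int> make_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Hypothetical helper (name is an assumption): step 2, O(log n) membership test.
// Precondition: `sorted` is in ascending order, otherwise the result is meaningless.
bool contains_sorted(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}
```

With `const std::vector<int> ids = make_sorted({12454, 12345, 12346});`, a call such as `contains_sorted(ids, 12346)` finds the value and `contains_sorted(ids, 12347)` does not. If the data already arrives ordered, as in the question, the sort is cheap insurance rather than a necessity.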
If you don’t want to reinvent the wheel, you can also check Alexandrescu’s [AssocVector][1]. Alexandrescu basically ported the std::set and std::map interfaces over a std::vector.

Unsorted Containers
Actually, if you really don’t care about order and your collection is kind of big, then an unordered_set will be faster, especially because integers are so trivial to hash: size_t hash_method(int i) { return i; }.

This could work very well… unless you’re faced with a collection that somehow causes a lot of collisions, because then unsorted containers will search over the “collisions” list of a given hash in linear time.
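For comparison, a hashed lookup might look like the sketch below. It assumes a C++11 compiler and uses std::unordered_set in place of boost::unordered_set (the interfaces are very similar for this purpose); the helper names make_lookup and contains_hashed are made up for the example.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (name is an assumption): build the hash set once from the initial data.
std::unordered_set<int> make_lookup(const std::vector<int>& values) {
    return std::unordered_set<int>(values.begin(), values.end());
}

// Hypothetical helper (name is an assumption): O(1) on average,
// O(n) in the worst case when many keys land in the same bucket.
bool contains_hashed(const std::unordered_set<int>& lookup, int value) {
    return lookup.find(value) != lookup.end();
}
```

The default std::hash<int> is used here, so no custom hash function is needed for plain ints.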
Conclusion
Just try the sorted std::vector approach and the boost::unordered_set approach with a “real” dataset (and all optimizations on) and pick whichever gives you the best result.

Unfortunately we can’t really help more there, because it heavily depends on the size of the dataset and the distribution of its elements.
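One possible way to run that comparison is sketched below. It is a rough timing harness, not a rigorous benchmark: the Timing struct and the function names are invented for this example, std::unordered_set stands in for boost::unordered_set, and the probe values should come from your real workload.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical result type (an assumption): hit count plus elapsed time.
struct Timing {
    std::size_t hits;
    std::chrono::nanoseconds elapsed;
};

// Hypothetical helper: runs `lookup` once per probe and times the whole loop.
template <typename Lookup>
Timing time_lookups(const std::vector<int>& probes, Lookup lookup) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (int p : probes) {
        if (lookup(p)) ++hits;  // the hit count is returned so the lookups have a visible effect
    }
    const auto stop = std::chrono::steady_clock::now();
    return {hits, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Times binary search over an already sorted vector.
Timing time_sorted_vector(const std::vector<int>& sorted, const std::vector<int>& probes) {
    return time_lookups(probes, [&](int p) {
        return std::binary_search(sorted.begin(), sorted.end(), p);
    });
}

// Times hashed lookups in an unordered set.
Timing time_unordered_set(const std::unordered_set<int>& set, const std::vector<int>& probes) {
    return time_lookups(probes, [&](int p) { return set.count(p) != 0; });
}
```

Both functions should report the same hit count for the same data and probes, which doubles as a sanity check. With only a couple of hundred items, a single pass is likely too short to measure reliably, so repeat the probe list many times and compare the totals.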