How large does a collection have to be for std::map to outpace a sorted std::vector?
I’ve got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I’ve heard that for small collections std::vector can be faster — but I’m wondering where that line is.
EDIT: I’m talking about 5 items or fewer at a time in a given structure. I’m concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I’m looking for a “rule of thumb” to use.
Billy3
It’s not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage), but an insertion or deletion that maintains order costs only O(lg N). In a sorted vector, an insertion or deletion that maintains order is O(N) instead, because on average half the elements have to be shifted to make room or close the gap.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data and then make a relatively small number of changes to it at a time. In this case, you can load your data into a sorted vector in memory, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don’t bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
Edit2: (in response to the edit in the question). If you’re talking about 5 items or fewer, you’re probably best off ignoring all of the above. Just leave the data unsorted and do a linear search. At that size there’s effectively no difference between a linear search and a binary search: a linear search scans half the items on average, giving ~2.5 comparisons, while a binary search takes log2 N comparisons, which for N = 5 (if my math is working this time of the morning) works out to ~2.3. That difference is too small to care about or notice — in fact, a binary search has enough per-step overhead that it could very easily end up slower.