Considering the positive effect of caching and data locality when searching in primary memory, I tend to use std::vector<> with std::pair<>-like key-value items and perform linear searches for both, as long as I know that the total number of key-value items will never be large enough to severely impact performance.
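For concreteness, the small-container approach described above might look roughly like the sketch below. The `Items` alias, the `int`/`std::string` key and value types, and the `find_linear` helper are names assumed purely for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: key-value items stored contiguously in one vector.
using Items = std::vector<std::pair<int, std::string>>;

// Linear search by key. Returns a pointer to the value, or nullptr if the
// key is absent. The cost is O(n), but every step touches adjacent memory.
inline const std::string* find_linear(const Items& items, int key) {
    auto it = std::find_if(items.begin(), items.end(),
                           [key](const Items::value_type& kv) {
                               return kv.first == key;
                           });
    return it != items.end() ? &it->second : nullptr;
}
```

Calling `find_linear(items, 1)` on a vector containing `{1, "one"}` yields a pointer to `"one"`; a missing key yields `nullptr`.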
Lately I’ve been in lots of situations where I know beforehand that I will have huge amounts of key-value items and have therefore opted for std::map<> from the beginning.
I’d like to know how you make your decisions for the proper container in situations like the ones described above.
Do you
- always use std::vector<> (or similar)?
- always use std::map<> (or similar)?
- have a gut feeling for where in the item-count range one is preferable over the other?
- something entirely different?
Thanks!
I only rarely use std::vector with a linear search (except in conjunction with binary searching, as described below). I suppose for a small enough amount of data it would be better, but with that little data it’s unlikely that anything is going to provide a huge advantage.
Depending on usage pattern, a binary search on a std::vector can make sense though. A std::map works well when you need to update the data regularly during use. In quite a few cases, however, you load up some data and then you use it, and once loaded it mostly remains static (i.e., it changes very little, if at all).
In this case, it can make a lot of sense to load the data into a vector, sort it if necessary, and then do binary searches on the data (e.g. std::lower_bound, std::equal_range). This gives pretty much the best of both worlds: low-complexity binary searches and good cache usage from high locality of reference (i.e., the vector is contiguous, as opposed to the linked structure of a std::map). The shortcoming, of course, is that insertions and deletions are slow. This is the one place where I have used your original idea: store newly inserted data separately until it reaches some limit, and only then sort it in with the rest of the data. A single search then consists of a binary search of the main body of the data, followed by a linear search of the (small amount of) newly inserted data.
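One way the sorted-vector-plus-pending-buffer scheme could be sketched is shown below. This is not the code I actually used, just a minimal illustration: the `FlatLookup` class, its member names, the `int`/`std::string` types, and the default `pending_limit` of 16 are all assumptions, and duplicate keys and deletions are deliberately not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class for illustration only.
class FlatLookup {
public:
    using Item = std::pair<int, std::string>;

    // pending_limit is an assumed tuning knob: how many unsorted items are
    // tolerated before they are merged into the sorted main body.
    explicit FlatLookup(std::vector<Item> items, std::size_t pending_limit = 16)
        : sorted_(std::move(items)), pending_limit_(pending_limit) {
        std::sort(sorted_.begin(), sorted_.end(), by_key);
    }

    // New items go into the small unsorted buffer first.
    void insert(int key, std::string value) {
        pending_.emplace_back(key, std::move(value));
        if (pending_.size() >= pending_limit_) merge_pending();
    }

    // Binary search of the main body, then linear search of the buffer.
    const std::string* find(int key) const {
        auto it = std::lower_bound(
            sorted_.begin(), sorted_.end(), key,
            [](const Item& item, int k) { return item.first < k; });
        if (it != sorted_.end() && it->first == key) return &it->second;
        for (const Item& item : pending_) {
            if (item.first == key) return &item.second;
        }
        return nullptr;
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    static bool by_key(const Item& a, const Item& b) { return a.first < b.first; }

    // Sort the buffer, append it, and merge the two sorted ranges in place.
    void merge_pending() {
        std::sort(pending_.begin(), pending_.end(), by_key);
        const auto middle = static_cast<std::ptrdiff_t>(sorted_.size());
        sorted_.insert(sorted_.end(),
                       std::make_move_iterator(pending_.begin()),
                       std::make_move_iterator(pending_.end()));
        std::inplace_merge(sorted_.begin(), sorted_.begin() + middle,
                           sorted_.end(), by_key);
        pending_.clear();
    }

    std::vector<Item> sorted_;   // main body, kept sorted by key
    std::vector<Item> pending_;  // recent insertions, unsorted
    std::size_t pending_limit_;
};
```

The right value for the limit depends on the data and access pattern, so it is worth measuring rather than guessing. Note that pointers returned by `find` can be invalidated by a later `insert`, since either vector may reallocate.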