A map does a binary search over its elements, which has logarithmic complexity. This means that for a small enough collection of objects, a map will perform worse than two vectors that are searched linearly.
How large should the object (key) pool be for a map to start performing better than two vectors?
Edit: A more generalized version of the question: how large should the object pool be for binary search to perform better than linear search?
I’m using strings as keys and the values are pointers, but my particular use case probably shouldn’t matter. I’m more curious to understand how to use the two tools properly.
If you’ll forgive my saying so, most of the answers sound to me like various ways of saying: “I don’t know”, without actually admitting that they don’t know. While I generally agree with the advice they’ve given, none of them seems to have attempted to directly address the question you asked: what is the break-even point.
To be fair, when I read the question, I didn’t really know either. It’s one of those things where we all know the basics: for a small enough collection, a linear search will probably be faster, and for a large enough collection, a binary search will undoubtedly be faster. I have never really had much reason to investigate where the break-even point would actually be. Your question got me curious, however, so I decided to write a bit of code to get at least some idea.
This code is definitely a very quick hack (lots of duplication, only currently supports one type of key, etc.) but at least it might give some idea of what to expect:
Here are the results I get running this on my machine:
Obviously that’s not the only possible test (or even close to the best one possible), but it seems to me that even a little hard data is better than none at all.
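For anyone who wants to take a similar measurement on their own machine, here is a minimal sketch of the kind of harness involved. It is not the code referred to above. The names `linear_find`, `binary_find`, `make_sorted_keys` and `time_lookups` are made up for illustration, and it assumes string keys held in a sorted `std::vector`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: linear scan. Returns the index of `key`,
// or keys.size() if it is not present.
std::size_t linear_find(const std::vector<std::string>& keys,
                        const std::string& key) {
    auto it = std::find(keys.begin(), keys.end(), key);
    return static_cast<std::size_t>(it - keys.begin());
}

// Hypothetical helper: binary search. Assumes `sorted_keys` is in
// ascending order. Returns the index of `key`, or sorted_keys.size().
std::size_t binary_find(const std::vector<std::string>& sorted_keys,
                        const std::string& key) {
    auto it = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), key);
    if (it != sorted_keys.end() && *it == key)
        return static_cast<std::size_t>(it - sorted_keys.begin());
    return sorted_keys.size();
}

// Builds n distinct keys that are already sorted. The numbers are
// zero-padded to 8 digits so lexicographic order matches numeric order
// (assumes n is below 100,000,000).
std::vector<std::string> make_sorted_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string digits = std::to_string(i);
        keys.push_back("key" + std::string(8 - digits.size(), '0') + digits);
    }
    return keys;
}

// Looks up every key once per round and returns the elapsed time in
// nanoseconds. `find` is any callable with the signature of the two
// helpers above.
template <class Find>
long long time_lookups(const std::vector<std::string>& keys,
                       Find find, int rounds) {
    std::size_t sink = 0;  // accumulate results so the lookups are used
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        for (const std::string& k : keys)
            sink += find(keys, k);
    auto stop = std::chrono::steady_clock::now();
    volatile std::size_t keep = sink;  // make the result observable
    (void)keep;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               stop - start).count();
}
```

To look for a break-even point, you would call `time_lookups` with both `linear_find` and `binary_find` for a range of sizes from `make_sorted_keys` and see where the two timings cross. Expect the answer to shift with key length, compiler, optimization flags and hardware, so treat any single number with suspicion.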
Edit: I would note for the record that I see no reason that code using two vectors (or a vector of pairs) can’t be just as clean as code using a set or map. Obviously you’d want to put the code for it into a small class of its own, but I see no reason at all that it couldn’t present precisely the same interface to the outside world that `map` does. In fact, I’d probably just call it a `tiny_map` (or something on that order).

One of the basic points of OO programming (and it remains the case in generic programming, to at least some degree) is to separate the interface from the implementation. In this case, you’re talking about purely an implementation detail that need not affect the interface at all. In fact, if I were writing a standard library, I’d be tempted to incorporate this as a “small map optimization” analogous to the common small string optimization. Basically, just allocate an array of 10 (or so) objects of `value_type` directly in the map object itself, and use them when/if the map is small, then move the data to a tree iff it grows large enough to justify it. The only real question is whether people use tiny maps often enough to justify the work.
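To make the interface point concrete, here is a rough sketch of what such a `tiny_map` might look like. It is an illustration under assumptions, not a drop-in replacement: it stores `std::pair<Key, T>` in a plain `std::vector` and searches linearly, it covers only a handful of `std::map`'s members, and it iterates in insertion order rather than key order. Unlike `std::map`, the key in `value_type` is not `const`, because a vector needs to move its elements on erase. It also does not attempt the switch to a tree for large sizes.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tiny_map: a vector of pairs searched linearly, exposing
// a small subset of the std::map interface.
template <class Key, class T>
class tiny_map {
public:
    using key_type = Key;
    using mapped_type = T;
    using value_type = std::pair<Key, T>;  // note: Key is not const here
    using iterator = typename std::vector<value_type>::iterator;
    using const_iterator = typename std::vector<value_type>::const_iterator;

    iterator begin() { return items_.begin(); }
    iterator end() { return items_.end(); }
    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }

    std::size_t size() const { return items_.size(); }
    bool empty() const { return items_.empty(); }

    // Linear search; returns end() when the key is absent,
    // mirroring std::map::find.
    iterator find(const Key& key) {
        iterator it = items_.begin();
        while (it != items_.end() && !(it->first == key)) ++it;
        return it;
    }
    const_iterator find(const Key& key) const {
        const_iterator it = items_.begin();
        while (it != items_.end() && !(it->first == key)) ++it;
        return it;
    }

    // Like std::map::operator[]: inserts a value-initialized T
    // when the key is not already present.
    T& operator[](const Key& key) {
        iterator it = find(key);
        if (it != items_.end()) return it->second;
        items_.emplace_back(key, T());
        return items_.back().second;
    }

    // Returns the number of elements removed (0 or 1),
    // like std::map::erase(const Key&).
    std::size_t erase(const Key& key) {
        iterator it = find(key);
        if (it == items_.end()) return 0;
        items_.erase(it);
        return 1;
    }

private:
    std::vector<value_type> items_;
};
```

With string keys and pointer values, as in the question, a `tiny_map<std::string, Widget*>` (where `Widget` stands in for whatever you are pointing at) supports `m["name"] = ptr;`, `m.find("name")` and `m.erase("name")`, so calling code reads the same as it would with `std::map`. Swapping the implementation later should then mostly be a matter of changing the type.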