Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6751489
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 26, 20262026-05-26T12:56:00+00:00 2026-05-26T12:56:00+00:00

A map does binary search on all its elements, which has logarithmic complexity —

  • 0

A map does binary search on all its elements, which has logarithmic complexity — this means that for a small enough collection of objects, a map will perform worse than two vectors that have linear search.

How large should the object (key) pool be for a map to start performing better than two vectors?

Edit: A more generalized version of the question: how large should the object pool be for binary search to perform better than linear search?

I’m using strings as keys and the values are pointers, but my particular use case probably shouldn’t matter. I’m more curious to understand how to use the two tools properly.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-26T12:56:01+00:00Added an answer on May 26, 2026 at 12:56 pm

    If you’ll forgive my saying so, most of the answers sound to me like various ways of saying: “I don’t know”, without actually admitting that they don’t know. While I generally agree with the advice they’ve given, none of them seems to have attempted to directly address the question you asked: what is the break-even point.

    To be fair, when I read the question, I didn’t really know either. It’s one of those things what we all know the basics: for a small enough collection, a linear search will probably be faster, and for a large enough collection, a binary search will undoubtedly be faster. I, however, have never really had much reason to investigate anything about what the break-even point would really be. Your question got me curious, however, so I decided to write a bit of code to get at least some idea.

    This code is definitely a very quick hack (lots of duplication, only currently supports one type of key, etc.) but at least it might give some idea of what to expect:

    #include <set>
    #include <vector>
    #include <string>
    #include <time.h>
    #include <iostream>
    #include <algorithm>
    
    int main() {   
    
        static const int reps = 100000;
    
        std::vector<int> data_vector;
        std::set<int> data_set;
        std::vector<int> search_keys;
    
        for (int size=10; size<100; size += 10) {
            data_vector.clear();
            data_set.clear();
            search_keys.clear();
    
            int num_keys = size / 10;   
            for (int i=0; i<size; i++) {
                int key = rand();
    
                if (i % num_keys == 0)
                    search_keys.push_back(key);
    
                data_set.insert(key);
                data_vector.push_back(key);
            }
    
            // Search for a few keys that probably aren't present.
            for (int i=0; i<10; i++)
                search_keys.push_back(rand());
    
            long total_linear =0, total_binary = 0;
    
            clock_t start_linear = clock();
            for (int k=0; k<reps; k++) {
                for (int j=0; j<search_keys.size(); j++) {
                    std::vector<int>::iterator pos = std::find(data_vector.begin(), data_vector.end(), search_keys[j]);
                    if (pos != data_vector.end())
                        total_linear += *pos;
                }
            }
            clock_t stop_linear = clock();                      
    
            clock_t start_binary = clock();
            for (int k=0; k<reps; k++) {
                for (int j=0; j<search_keys.size(); j++) {
                    std::set<int>::iterator pos = data_set.find(search_keys[j]);
                    if (pos != data_set.end())
                        total_binary += *pos;
                }
            }
            clock_t stop_binary = clock();
    
            std::cout << "\nignore: " << total_linear << " ignore also: " << total_binary << "\n";
    
            std::cout << "For size = " << size << "\n";
            std::cout << "\tLinear search time = " << stop_linear - start_linear << "\n";
            std::cout << "\tBinary search time = " << stop_binary - start_binary << "\n";
        }
        return 0;
    }
    

    Here are the results I get running this on my machine:

    ignore: 669830816 ignore also: 669830816
    For size = 10
            Linear search time = 37
            Binary search time = 45
    
    ignore: 966398112 ignore also: 966398112
    For size = 20
            Linear search time = 60
            Binary search time = 47
    
    ignore: 389263520 ignore also: 389263520
    For size = 30
            Linear search time = 83
            Binary search time = 63
    
    ignore: -1561901888 ignore also: -1561901888
    For size = 40
            Linear search time = 106
            Binary search time = 65
    
    ignore: -1741869184 ignore also: -1741869184
    For size = 50
            Linear search time = 127
            Binary search time = 69
    
    ignore: 1130798112 ignore also: 1130798112
    For size = 60
            Linear search time = 155
            Binary search time = 75
    
    ignore: -1949669184 ignore also: -1949669184
    For size = 70
            Linear search time = 173
            Binary search time = 83
    
    ignore: -1991069184 ignore also: -1991069184
    For size = 80
            Linear search time = 195
            Binary search time = 90
    
    ignore: 1750998112 ignore also: 1750998112
    For size = 90
            Linear search time = 217
            Binary search time = 79
    

    Obviously that’s not the only possible test (or even close to the best one possible), but it seems to me that even a little hard data is better than none at all.

    Edit: I would note for the record that I see no reason that code using two vectors (or a vector of pairs) can’t be just as clean as code using a set or map. Obviously you’d want to put the code for it into a small class of its own, but I see no reason at all that it couldn’t present precisely the same interface to the outside world that map does. In fact, I’d probably just call it a “tiny_map` (or something on that order).

    One of the basic points of OO programming (and it remains the case in generic programming, to least some degree) is to separate the interface from the implementation. In this case, you’re talking about purely an implementation detail that need not affect the interface at all. In fact, if I were writing a standard library, I’d be tempted to incorporate this as a “small map optimization” analogous to the common small string optimization. Basically, just allocate an array of 10 (or so) objects of value_type directly in the map object itself, and use them when/if the map is small, then move the data to a tree iff it grows large enough to justify it. The only real question is whether people us tiny maps often enough to justify the work.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Am I right in assuming that adding/removing elements to an std::map does not effect
Does FileChannel#map allocate all the memory needed for the resulting ByteBuffer immediately, or is
How do I add extra data into an atom feed that does not map
Does anyone know of an Image Map-like Blackberry Control that I can use in
This is a sample that represents my problem. The map will serialize perfectly fine
This error: error C2678: binary '=' : no operator found which takes a left-hand
Does map() iterate through the list like for would? Is there a value in
Possible Duplicate: What does map(&:name) mean in Ruby? I was watching a railscast and
Does std::map move around already inserted values when inserting new data ?
I've a question about using sqlSave. How does R map RODBC data in the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.