Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8313663
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 8, 20262026-06-08T20:33:58+00:00 2026-06-08T20:33:58+00:00

We are developing a highly performance critical software in C++. There we need a

  • 0

We are developing a highly performance critical software in C++. There we need a concurrent hash map and implemented one. So we wrote a benchmark to figure out, how much slower our concurrent hash map is compared with std::unordered_map.

But, std::unordered_map seems to be incredibly slow… So this is our micro-benchmark (for the concurrent map we spawned a new thread to make sure that locking does not get optimized away and note that I never inser 0 because I also benchmark with google::dense_hash_map, which needs a null value):

boost::random::mt19937 rng;
boost::random::uniform_int_distribution<> dist(std::numeric_limits<uint64_t>::min(), std::numeric_limits<uint64_t>::max());
std::vector<uint64_t> vec(SIZE);
for (int i = 0; i < SIZE; ++i) {
    uint64_t val = 0;
    while (val == 0) {
        val = dist(rng);
    }
    vec[i] = val;
}
std::unordered_map<int, long double> map;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < SIZE; ++i) {
    map[vec[i]] = 0.0;
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
std::cout << "inserts: " << elapsed.count() << std::endl;
std::random_shuffle(vec.begin(), vec.end());
begin = std::chrono::high_resolution_clock::now();
long double val;
for (int i = 0; i < SIZE; ++i) {
    val = map[vec[i]];
}
end = std::chrono::high_resolution_clock::now();
elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
std::cout << "get: " << elapsed.count() << std::endl;

(EDIT: the whole source code can be found here: http://pastebin.com/vPqf7eya)

The result for std::unordered_map is:

inserts: 35126
get    : 2959

For google::dense_map:

inserts: 3653
get    : 816

For our hand backed concurrent map (which does locking, although the benchmark is single threaded – but in a separate spawn thread):

inserts: 5213
get    : 2594

If I compile the benchmark program without pthread support and run everything in the main thread, I get the following results for our hand backed concurrent map:

inserts: 4441
get    : 1180

I compile with the following command:

g++-4.7 -O3 -DNDEBUG -I/tmp/benchmap/sparsehash-2.0.2/src/ -std=c++11 -pthread main.cc

So especially inserts on std::unordered_map seem to be extremely expensive – 35 seconds vs 3-5 seconds for other maps. Also the lookup time seems to be quite high.

My question: why is this? I read another question on stackoverflow where someone asks, why std::tr1::unordered_map is slower than his own implementation. There the highest rated answer states, that the std::tr1::unordered_map needs to implement a more complicated interface. But I can not see this argument: we use a bucket approach in our concurrent_map, std::unordered_map uses a bucket-approach too (google::dense_hash_map does not, but than std::unordered_map should be at least as fast than our hand backed concurrency-safe version?). Apart from that I can not see anything in the interface that force a feature which makes the hash map perform badly…

So my question: is it true that std::unordered_map seems to be very slow? If no: what is wrong? If yes: what is the reason for that.

And my main question: why is inserting a value into a std::unordered_map so terrible expensive (even if we reserve enough space at the beginning, it does not perform much better – so rehashing seems not to be the problem)?

EDIT:

First of all: yes the presented benchmark is not flawless – this is because we played around a lot with it and it is just a hack (for example the uint64 distribution to generate ints would in practice not be a good idea, exclude 0 in a loop is kind of stupid etc…).

At the moment most comments explain, that I can make the unordered_map faster by preallocating enough space for it. In our application this is just not possible: we are developing a database management system and need a hash map to store some data during a transaction (for example locking information). So this map can be everything from 1 (user just makes one insert and commits) to billions of entries (if full table scans happen). It is just impossible to preallocate enough space here (and just allocate a lot in the beginning will consume too much memory).

Furthermore, I apologize, that I did not state my question clear enough: I am not really interested in making unordered_map fast (using googles dense hash map works fine for us), I just don’t really understand where this huge performance differences come from. It can not be just preallocation (even with enough preallocated memory, the dense map is an order of magnitude faster than unordered_map, our hand backed concurrent map starts with an array of size 64 – so a smaller one than unordered_map).

So what is the reason for this bad performance of std::unordered_map? Or differently asked: Could one write an implementation of the std::unordered_map interface which is standard conform and (nearly) as fast as googles dense hash map? Or is there something in the standard that enforces the implementer to chose an inefficient way to implement it?

EDIT 2:

By profiling I see that a lot of time is used for integer divions. std::unordered_map uses prime numbers for the array size, while the other implementations use powers of two. Why does std::unordered_map use prime-numbers? To perform better if the hash is bad? For good hashes it does imho make no difference.

EDIT 3:

These are the numbers for std::map:

inserts: 16462
get    : 16978

Sooooooo: why are inserts into a std::map faster than inserts into a std::unordered_map… I mean WAT? std::map has a worse locality (tree vs array), needs to make more allocations (per insert vs per rehash + plus ~1 for each collision) and, most important: has another algorithmic complexity (O(logn) vs O(1))!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-08T20:34:01+00:00Added an answer on June 8, 2026 at 8:34 pm

    I found the reason: it is a Problem of gcc-4.7!!

    With gcc-4.7

    inserts: 37728
    get    : 2985
    

    With gcc-4.6

    inserts: 2531
    get    : 1565
    

    So std::unordered_map in gcc-4.7 is broken (or my installation, which is an installation of gcc-4.7.0 on Ubuntu – and another installation which is gcc 4.7.1 on debian testing).

    I will submit a bug report.. until then: DO NOT use std::unordered_map with gcc 4.7!

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

iam developing one application.In that i need to get the music files from the
Developing a project of mine I realize I have a need for some level
Developing a C# .NET 2.0 WinForm Application. Need the application to close and restart
Iam developing one application.In that iam placing the radio buttons(uiimageview) on table view and
Im developing an app for iPhone, in wich one of the functionalities is an
I've been developing a highly modified version of Lua (including a rewrite of the
I am developing an application in which i need to use lot of text-views
Background I am developing a highly modular application in pure Action Script 3 (we
I'm developing on a mobile gsm platform and I need to know the PIN
I am developing a project for one customer, where the design has a radio

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.