Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 142543
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 11, 20262026-05-11T07:57:12+00:00 2026-05-11T07:57:12+00:00

I’ve been pondering this question for a while: Can you build a faster fundamental

  • 0

I’ve been pondering this question for a while:

Can you build a faster fundamental datastructure (i.e. linked list, hashtable, set, skiplist, bloom filter, red-black tree, etc.) on a multicore machine, by taking advantage of the fact that you’ve got more than one CPU?

I did some preliminary experimenting with pthreads, and found that pthread_create() takes on the order of 30us, but a simple hash_map insert takes far less time than that on a single core. And thus, it’s become hard for me to imagine creating a faster hash_map<>, since synchronization primitives and thread creation are so slow. I can also imagine tree traversal and balancing in parallel, but again, synchronization primitives would seem to make runtimes longer, not shorter.

It still feels intuitive to me that ‘I’ve got more CPU, and thus, I should be able to do it faster’, but I can’t quite wrap my head around a proof or counter-proof for that statement. I’ve experimented quite a bit in C++, but am now suspecting that other languages may offer better solutions (erlang?) for this task. Thoughts?

EDIT details: I think there are several programming / datastructure paradigms that are frequently used that could possibly be sped up. For example, I find myself frequently writing code that basically looks like this (where real data has been replaced with ‘rand()’)

static const int N = 1000000;  static const int M = 10000000; // 10x more lookups  hash_map<int, int> m;  // batch insert a bunch of interesting data  for (int i = 0; i < N; i++) m[rand()] = rand();   // Do some random access lookups.  for (int i = 0; i < M; i++) m[rand()]++; 

This kind of paradigm is frequently used for things like name-value settings & configuration data, batch processing, etc. The 10x (or more) lookup/insert ratio is what makes a traditional hash_map<> ideal for this sort of operation.

This could easily be split in half, with an insert phase and a lookup phase, and in a parallel world, there may be some ‘flush queue’ operation between the two halves. More difficult is the interleaved insert + lookup version:

hash_map<int, int> m;   for (int i = 0; i < N; i++) {     if (rand() % LOOKUP_RATIO == 0)       hash_map[rand()]++;  // 'lookup'     else       hash_map[rand()] = rand();  // 'insert'  } 

In that scenario, insert could be asynchronous as long as the insert queue were flushed before each lookup, and if LOOKUP_RATIO is large enough (say, >1000) then it becomes quite similar to the batch example above, but with some queueing. Although, the queueing implies synchronization primitives.

Imagine for a second, the following snippet:

hash_map<int,int> a; hash_map<int,int> b;  for (int i = 0; i < N; i++) {    // the following 2 lines could be executed in parallel    a[rand()] = rand();    b[rand()] = rand();  } 

And thus, a lookup might be done in ‘parallel’ by:

int lookup(int value) {    // The following 2 lines could be executed in parallel:    v1 = a[value];    v2 = b[value];    if (v1)  // pseudo code for 'value existed in a'      return v1;    else      return v2;  } 
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. 2026-05-11T07:57:13+00:00Added an answer on May 11, 2026 at 7:57 am

    The problem is that shared data is itself the bane of parallel computing. Ideally you want each core to be working on separate data, otherwise there will be overhead associated with synchronization. (How to communicate without shared state? By message passing.)

    Also, it’s a bit strange to talk about data structures being sped up. I find it more natural to talk about operations on the data structure being sped up, since different operations on different data structures have different characteristics. Is there a particular type of access that you want to speed up?

    EDIT, in response to the extra details: I assume that the goal is to have a single hash map which can be accessed in parallel, and its underpinnings could be multiple hash tables, but which would be transparently presented to the user of this data structure as a single hash table. Naturally, we would be concerned about spending too much time spinning on locks. Also at this level, we have to be aware of cache consistency problems. That is, if cores or processors have separate caches pointing to the same data, and one modifies the data, then the cached data on the other is invalidated. If this happens repeatedly, it can impose huge costs, and parallelism could be worse than having things on a single core. So I’m very wary of shared data.

    My instinct would be to have a pool of threads, each of which owns a different section of the hash table. The Hash would first map from key to hash table section, and then to an offset within that section. The update would be passed as a message to that thread which owns that section of the hash table. And that way, no one is trying to modify the same thing at once. Naturally, this is easier in languages (Erlang) which have features for asynchronous message passing concurrency than in others.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.