Editorial Team
Asked: May 21, 2026

I’m writing a function where I need a significant amount of heap memory. Is it possible to tell the compiler that those data will be accessed frequently within a specific for loop, so as to improve performance (through compile options or similar)?

The reason I cannot use the stack is that the number of elements I need to store is big, and I get segmentation fault if I try to do it.

Right now the code is working but I think it could be faster.

UPDATE:
I’m doing something like this

vector< set<uint> > vec(node_vec.size());
for(uint i = 0; i < node_vec.size(); i++)
  for(uint j = i+1; j < node_vec.size(); j++)
    // some computation, basic math, store the result in variable x
      if( x > threshold ) {
         vec[i].insert(j);
         vec[j].insert(i);
      }

some details:
– I tried hash_set, which gave little improvement; besides, hash_set is not available on all the machines I use for simulations
– I tried to allocate vec on the stack using arrays but, as I said, I might get segmentation fault if the number of elements is too big

If node_vec.size() is, say, equal to k, where k is on the order of a few thousand, I expect vec to be 4 or 5 times bigger than node_vec. At this order of magnitude the code appears to be slow, considering that I have to run it many times. Of course, I am using multithreading to parallelize these calls, but I can’t get the function itself to run much faster than what I’m seeing right now.

Would it be possible, for example, to have vec allocated in the cache memory for fast data retrieval, or something similar?

    Editorial Team
    Added an answer on May 21, 2026 at 8:47 am

    UPDATE

    vector< set<uint> > vec(node_vec.size());
    for(uint i = 0; i < node_vec.size(); i++)
      for(uint j = i+1; j < node_vec.size(); j++)
        // some computation, basic math, store the result in variable x
          if( x > threshold ) {
             vec[i].insert(j);
             vec[j].insert(i);
          }
    

    That still doesn’t show much, because we cannot know how often the condition x > threshold will be true. If x > threshold is very frequently true, then the std::set might be the bottleneck, because it has to do a dynamic memory allocation for every uint you insert.

    Also, we don’t know what "some computation" actually does. If it does a lot of work, or does it inefficiently, that could be the bottleneck.

    And we don’t know how you need to access the result.

    Anyway, on a hunch:

        vector<pair<int, int> > vec1;
        vector<pair<int, int> > vec2;
    
        for (uint i = 0; i < node_vec.size(); i++)
        {
            for (uint j = i+1; j < node_vec.size(); j++)
            {
                // some computation, basic math, store the result in variable x
                if (x > threshold)
                {
                    vec1.push_back(make_pair(i, j));
                    vec2.push_back(make_pair(j, i));
                }
            }
        }
    

    If you can use the result in that form, you’re done. Otherwise you could do some post-processing. Just don’t copy it into a std::set again (obviously). Try to stick to std::vector<POD>. E.g. you could build an index into the vectors like this:

        // ...
        vector<int> index1 = build_index(node_vec.size(), vec1);
        vector<int> index2 = build_index(node_vec.size(), vec2);
        // ...
    }    
    
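    // Returns, for each node id, the position of its first entry in vec, or -1 if it has none.
    vector<int> build_index(size_t count, vector<pair<int, int> > const& vec)
    {
        vector<int> index(count, -1);
    
        // Walk backwards so that the lowest position for each node is written last.
        // This loop form is also safe when vec is empty.
        for (size_t i = vec.size(); i-- > 0; )
        {
            assert(vec[i].first >= 0);
            assert(static_cast<size_t>(vec[i].first) < count);
            index[vec[i].first] = static_cast<int>(i);
        }
    
        return index;
    }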
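
One possible way to consume such an index, as a minimal sketch rather than the answer's own method. build_index records only the first position of each node, so listing all entries for a node with a simple forward scan works only if that node's pairs sit next to each other. vec1 leaves the loop already ordered by i, but vec2 is keyed by j and is not, so this sketch assumes it gets sorted once after the loop. The names PairVec, first_position, neighbours and sort_and_index are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int> > PairVec;

// Same idea as build_index: first position of each key in pairs, or -1 if absent.
std::vector<int> first_position(std::size_t count, PairVec const& pairs)
{
    std::vector<int> index(count, -1);
    for (std::size_t i = pairs.size(); i-- > 0; )
    {
        assert(pairs[i].first >= 0);
        assert(static_cast<std::size_t>(pairs[i].first) < count);
        index[pairs[i].first] = static_cast<int>(i);
    }
    return index;
}

// Collects the second member of every pair whose first member equals node.
// Assumes pairs is sorted by first, so the matches form one contiguous run.
std::vector<int> neighbours(PairVec const& pairs, std::vector<int> const& index, int node)
{
    std::vector<int> out;
    if (index[node] < 0)
        return out;  // node has no entries
    for (std::size_t p = static_cast<std::size_t>(index[node]);
         p < pairs.size() && pairs[p].first == node; ++p)
        out.push_back(pairs[p].second);
    return out;
}

// Hypothetical helper: sorts a (j, i) list in place and builds its index.
std::vector<int> sort_and_index(std::size_t count, PairVec& pairs)
{
    std::sort(pairs.begin(), pairs.end());
    return first_position(count, pairs);
}
```

With this, `neighbours(vec2, sort_and_index(node_vec.size(), vec2), n)` would list every i < n paired with n, and the same lookup on vec1 (no sort needed) would list every j > n. Everything stays in contiguous std::vector storage, in line with the advice to avoid std::set.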
    

    PS: I’m almost sure your loop is not memory-bound. I can’t be certain, though: if the "nodes" you’re not showing us are really big, it might still be.


    Original answer:

    There is no easy I_will_access_this_frequently_so_make_it_fast(void* ptr, size_t len)-kind-of solution.

    You can do some things though.

    1. Make sure the compiler can "see" the implementation of every function that’s called inside critical loops. What is necessary for the compiler to be able to "see" the implementation depends on the compiler. There is one way to be sure though: define all relevant functions in the same translation unit before the loop, and declare them as inline.

      This also means you should not by any means call "external" functions in those critical loops. By "external" functions I mean things like system calls, runtime-library routines, or code implemented in a DLL/SO. Also don’t call virtual functions and don’t use function pointers. And of course don’t allocate or free memory inside the critical loops.

    2. Make sure you use an optimal algorithm. Tuning for a constant-factor speedup is moot if the complexity of the algorithm is higher than necessary.

    3. Use the smallest possible types. E.g. don’t use int if signed char will do the job. That’s something I wouldn’t normally recommend, but when processing a large chunk of memory it can increase performance quite a lot. Especially in very tight loops.

    4. If you’re just copying or filling memory, use memcpy or memset. Disable the intrinsic version of those two functions if the chunks are larger than about 50 to 100 bytes.

    5. Make sure you access the data in a cache-friendly manner. The optimum is "streaming" – i.e. accessing the memory with ascending or descending addresses. You can "jump" ahead some bytes at a time, but don’t jump too far. The worst is random access to a big block of memory. E.g. if you have to work on a 2 dimensional matrix (like a bitmap image) where p[0] to p[1] is a step "to the right" (x + 1), make sure the inner loop increments x and the outer increments y. If you do it the other way around performance will be much much worse.

    6. If your pointers are alias-free, you can tell the compiler (how that’s done depends on the compiler). If you don’t know what alias-free means I recommend searching the net and your compiler’s documentation, since an explanation would be beyond the scope.

    7. Use intrinsic SIMD instructions if appropriate.

    8. Use explicit prefetch instructions if you know which memory locations will be needed in the near future.
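
To illustrate point 5, here is a minimal sketch of the two traversal orders for a matrix stored row by row in one contiguous buffer. The function names and the flat std::vector layout are assumptions made for the example, not something taken from the question's code.

```cpp
#include <cstddef>
#include <vector>

// Sums a height x width matrix stored row by row in one contiguous vector.
// The inner loop advances x, so memory is read at ascending addresses ("streaming").
long sum_row_major(std::vector<int> const& m, std::size_t width, std::size_t height)
{
    long total = 0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            total += m[y * width + x];
    return total;
}

// Same result, but the inner loop advances y, so each step jumps width elements.
// On a large matrix this access pattern tends to miss the cache far more often.
long sum_column_major(std::vector<int> const& m, std::size_t width, std::size_t height)
{
    long total = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            total += m[y * width + x];
    return total;
}
```

Both functions return the same sum; only the order in which memory is touched differs. How large the gap is depends on the matrix size and the machine, so it is worth timing both on the real data.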
