I’m writing a function where I need a significant amount of heap memory. Is


Asked: May 21, 2026 (2026-05-21T08:47:11+00:00)


I’m writing a function where I need a significant amount of heap memory. Is it possible to tell the compiler that those data will be accessed frequently within a specific for loop, so as to improve performance (through compile options or similar)?

The reason I cannot use the stack is that the number of elements I need to store is big, and I get a segmentation fault if I try to do it.

Right now the code is working but I think it could be faster.

UPDATE:
I’m doing something like this

vector< set<uint> > vec(node_vec.size());
for (uint i = 0; i < node_vec.size(); i++) {
  for (uint j = i+1; j < node_vec.size(); j++) {
    // some computation, basic math, store the result in variable x
    if (x > threshold) {
      vec[i].insert(j);
      vec[j].insert(i);
    }
  }
}

Some details:
– I tried hash_set and saw little improvement; besides, hash_set is not available on all the machines I use for simulation
– I tried to allocate vec on the stack using arrays but, as I said, I might get a segmentation fault if the number of elements is too big

If node_vec.size() is, say, equal to k, where k is on the order of a few thousand, I expect vec to be 4 or 5 times bigger than node_vec. At this order of magnitude the code appears to be slow, considering that I have to run it many times. Of course, I am using multithreading to parallelize these calls, but I can’t get the function per se to run much faster than what I’m seeing right now.

Would it be possible, for example, to have vec allocated in the cache memory for fast data retrieval, or something similar?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T08:47:12+00:00

UPDATE

vector< set<uint> > vec(node_vec.size());
for (uint i = 0; i < node_vec.size(); i++) {
  for (uint j = i+1; j < node_vec.size(); j++) {
    // some computation, basic math, store the result in variable x
    if (x > threshold) {
      vec[i].insert(j);
      vec[j].insert(i);
    }
  }
}

That still doesn’t show much, because we cannot know how often the condition x > threshold will be true. If x > threshold is very frequently true, then the std::set might be the bottleneck, because it has to do a dynamic memory allocation for every uint you insert.

Also we don’t know what "some computation" actually means/does/is. If it does a lot, or does it in the wrong way, that could be the bottleneck.

And we don’t know how you need to access the result.
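To make the std::set point concrete, here is a minimal sketch of the same loop writing into plain vectors instead of sets. The `score` function and the use of `double` for the node values are placeholders of mine, since the question doesn’t show the real computation. Because the loops visit the pairs in increasing order, each per-node list comes out sorted and duplicate-free without any set; a vector only allocates when it grows, not on every insert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "some computation" in the question:
// here x is simply the absolute difference of two node values.
inline double score(double a, double b)
{
    return a > b ? a - b : b - a;
}

// Builds the same neighbour lists as the vector< set<uint> > version, but
// stores each list in a contiguous std::vector instead of a node-based set.
std::vector<std::vector<unsigned> >
build_neighbours(std::vector<double> const& node_vec, double threshold)
{
    std::size_t const n = node_vec.size();
    std::vector<std::vector<unsigned> > vec(n);

    for (std::size_t i = 0; i < n; i++)
    {
        for (std::size_t j = i + 1; j < n; j++)
        {
            double const x = score(node_vec[i], node_vec[j]);
            if (x > threshold)
            {
                // Node i receives its larger neighbours j in increasing
                // order; node j receives its smaller neighbours i in
                // increasing order, and always before any larger one.
                // So every list stays sorted without extra work.
                vec[i].push_back(static_cast<unsigned>(j));
                vec[j].push_back(static_cast<unsigned>(i));
            }
        }
    }
    return vec;
}
```

Whether this actually helps depends on how often the threshold test passes, so measure before and after.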

Anyway, on a hunch:

    vector<pair<int, int> > vec1;
    vector<pair<int, int> > vec2;

    for (uint i = 0; i < node_vec.size(); i++)
    {
        for (uint j = i+1; j < node_vec.size(); j++)
        {
            // some computation, basic math, store the result in variable x
            if (x > threshold)
            {
                vec1.push_back(make_pair(i, j));
                vec2.push_back(make_pair(j, i));
            }
        }
    }

If you can use the result in that form, you’re done. Otherwise you could do some post-processing. Just don’t copy it into a std::set again (obviously). Try to stick to std::vector<POD>. E.g. you could build an index into the vectors like this:

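    // ...
    vector<int> index1 = build_index(node_vec.size(), vec1);
    vector<int> index2 = build_index(node_vec.size(), vec2);
    // ...
}    

// For each node n, returns the position of the first pair in vec whose
// .first equals n, or -1 if there is none.
// Note: the pairs for one node are only contiguous if vec is sorted by
// .first. vec1 is sorted by construction; vec2 is not, so sort it first.
vector<int> build_index(size_t count, vector<pair<int, int> > const& vec)
{
    vector<int> index(count, -1);

    // Walk backwards so the lowest position wins for each node.
    // This form also handles an empty vec (the do/while version underflows).
    for (size_t i = vec.size(); i-- > 0; )
    {
        assert(vec[i].first >= 0);
        assert(static_cast<size_t>(vec[i].first) < count);
        index[vec[i].first] = static_cast<int>(i);
    }

    return index;
}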
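As an alternative post-processing step, here is a sketch that turns the pair list into one compressed adjacency structure with a counting sort. This is a technique I’m swapping in for illustration, not something the code above requires; the `Adjacency` struct and `build_adjacency` are hypothetical names. It treats every pair as an undirected edge, so it only needs vec1 and the mirrored vec2 becomes unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type: the neighbours of node n are
// targets[offsets[n]] .. targets[offsets[n + 1] - 1].
struct Adjacency
{
    std::vector<std::size_t> offsets; // size: node count + 1
    std::vector<int> targets;         // size: 2 * number of pairs
};

// Builds the adjacency structure from pairs (i, j), storing both
// directions of every pair. Two passes over the pairs, two allocations.
Adjacency build_adjacency(std::size_t count,
                          std::vector<std::pair<int, int> > const& pairs)
{
    Adjacency adj;
    adj.offsets.assign(count + 1, 0);

    // Pass 1: count the neighbours of every node.
    for (std::size_t k = 0; k < pairs.size(); k++)
    {
        assert(pairs[k].first >= 0 &&
               static_cast<std::size_t>(pairs[k].first) < count);
        assert(pairs[k].second >= 0 &&
               static_cast<std::size_t>(pairs[k].second) < count);
        adj.offsets[pairs[k].first + 1]++;
        adj.offsets[pairs[k].second + 1]++;
    }

    // Prefix sum: turn the counts into start offsets.
    for (std::size_t n = 0; n < count; n++)
        adj.offsets[n + 1] += adj.offsets[n];

    // Pass 2: scatter both directions of every pair into place.
    std::vector<std::size_t> next(adj.offsets.begin(), adj.offsets.end() - 1);
    adj.targets.resize(2 * pairs.size());
    for (std::size_t k = 0; k < pairs.size(); k++)
    {
        adj.targets[next[pairs[k].first]++] = pairs[k].second;
        adj.targets[next[pairs[k].second]++] = pairs[k].first;
    }
    return adj;
}
```

Iterating over the neighbours of a node is then a plain loop over a contiguous range of `targets`, which is about as cache-friendly as this data gets.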

ps.: I’m almost sure your loop is not memory-bound. Can’t be sure though… if the "nodes" you’re not showing us are really big it might still be.

Original answer:

There is no easy I_will_access_this_frequently_so_make_it_fast(void* ptr, size_t len)-kind-of solution.

You can do some things though.

– Make sure the compiler can "see" the implementation of every function that’s called inside critical loops. What is necessary for the compiler to be able to "see" the implementation depends on the compiler. There is one way to be sure though: define all relevant functions in the same translation unit before the loop, and declare them as inline.
  This also means you should not by any means call "external" functions in those critical loops. By "external" functions I mean things like system calls, runtime-library stuff or stuff implemented in a DLL/SO. Also don’t call virtual functions and don’t use function pointers. And of course don’t allocate or free memory inside the critical loops.
– Make sure you use an optimal algorithm. Linear (constant-factor) optimization is moot if the complexity of the algorithm is higher than necessary.
– Use the smallest possible types. E.g. don’t use int if signed char will do the job. That’s something I wouldn’t normally recommend, but when processing a large chunk of memory it can increase performance quite a lot, especially in very tight loops.
– If you’re just copying or filling memory, use memcpy or memset. Disable the intrinsic version of those two functions if the chunks are larger than about 50 to 100 bytes.
– Make sure you access the data in a cache-friendly manner. The optimum is "streaming", i.e. accessing the memory with ascending or descending addresses. You can "jump" ahead some bytes at a time, but don’t jump too far. The worst is random access to a big block of memory. E.g. if you have to work on a two-dimensional matrix (like a bitmap image) where p[0] to p[1] is a step "to the right" (x + 1), make sure the inner loop increments x and the outer loop increments y. If you do it the other way around, performance will be much worse.
– If your pointers are alias-free, you can tell the compiler (how that’s done depends on the compiler). If you don’t know what alias-free means I recommend searching the net and your compiler’s documentation, since an explanation would be beyond the scope of this answer.
– Use intrinsic SIMD instructions if appropriate.
– Use explicit prefetch instructions if you know which memory locations will be needed in the near future.
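
The matrix advice above can be sketched as follows. This is only an illustration under my own assumptions: a width x height matrix stored row by row in one contiguous buffer, with hypothetical function names. Both functions compute the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (x, y) lives at p[y * width + x], so p[k] to p[k + 1]
// is a step "to the right" (x + 1).

// Cache-friendly order: the inner loop increments x, so consecutive
// iterations touch consecutive addresses ("streaming").
long sum_streaming(std::vector<int> const& p,
                   std::size_t width, std::size_t height)
{
    assert(p.size() == width * height);
    long total = 0;
    for (std::size_t y = 0; y < height; y++)
        for (std::size_t x = 0; x < width; x++)
            total += p[y * width + x];
    return total;
}

// Same result, but the inner loop increments y, so consecutive
// iterations jump `width` elements ahead. On large matrices this
// is the access pattern to avoid.
long sum_strided(std::vector<int> const& p,
                 std::size_t width, std::size_t height)
{
    assert(p.size() == width * height);
    long total = 0;
    for (std::size_t x = 0; x < width; x++)
        for (std::size_t y = 0; y < height; y++)
            total += p[y * width + x];
    return total;
}
```

How big the difference is depends on the matrix size and the machine, so time both on your own data.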
