I am puzzled by this. The following:
for (int j=0; j<100; ++j) {
long* data = new long[0];
}
clock_t launch = clock();
sim.Run();
clock_t done = clock();
runs 50% faster than this alone:
clock_t launch = clock();
sim.Run();
clock_t done = clock();
This is when using -O3. If I use -O0 there is no difference in execution time. Whether it is long or short doesn’t matter. The length of the vector doesn’t change anything, either. I am not using data anywhere. If I delete[] data; in the loop the improvement disappears. As I reduce the number of iterations below 100 the performance gain reduces; above 100 doesn’t make any difference.
If this were Java I would think that I am triggering the GC, but this is C++! Also, it is single-threaded software, so it shouldn’t be some memory-sharing optimisation.
What can this be? Is this behaviour a symptom of bad memory management in my code? Thanks!
What exactly do you mean by “50% faster”? Are we talking minutes, or are we talking seconds?
If it is a matter of a few seconds, it may simply be a measurement artifact (two clock_t readings may differ by up to two seconds without the real time changing by more than a few hundredths of a second).
But my guess is that we’re looking at longer times; and without looking at your code, I suspect that by allocating “leaked” memory you “prime” the memory heap, allowing for faster access to your data later on.
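To rule out the timer itself, it may help to repeat the measurement with a monotonic clock and keep the best of several runs. This is only a sketch, assuming C++11's std::chrono::steady_clock is available; time_seconds and best_of are names I made up for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: wall-clock time of one call to `work`, in seconds.
// steady_clock is monotonic, so it cannot jump backwards between readings.
template <typename Work>
double time_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: repeat the measurement and keep the fastest run,
// which filters out one-off disturbances such as scheduling noise.
template <typename Work>
double best_of(int repeats, Work&& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        best = std::min(best, time_seconds(work));
    }
    return best;
}
```

In your case the call would look something like best_of(5, [&] { sim.Run(); }), measured once with and once without the allocation loop in front. If the gap survives that, it is probably not a timing artifact.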
This suggests that, yes, you are probably managing your memory less than optimally, and might benefit from pre-allocating and reusing memory (“object pools”).
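To give an idea of what I mean by an object pool, here is a minimal sketch. ObjectPool is a hypothetical class, not something from your code or from a library, and it assumes the pooled type is default-constructible and that a fixed capacity is acceptable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity pool: all storage is allocated once, up front,
// and slots are handed out and taken back without touching the heap again.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i) {
            free_.push_back(&slots_[i]);
        }
    }

    // Copying would leave free_ pointing into another pool's storage.
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Returns a recycled object, or nullptr when the pool is exhausted.
    T* acquire() {
        if (free_.empty()) return nullptr;
        T* obj = free_.back();
        free_.pop_back();
        return obj;
    }

    // Hands an object obtained from acquire() back to the pool.
    void release(T* obj) { free_.push_back(obj); }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> slots_;   // contiguous storage, allocated once
    std::vector<T*> free_;   // slots currently not in use
};
```

Because every object lives in one contiguous block, objects that are used together also tend to sit close together in memory, whatever the general-purpose allocator would have done with them.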
By “priming” the memory heap, I mean that memory allocation is usually delegated to a memory manager that keeps track of heap memory, using a linked list in the simplest case. Even if you have no garbage collector, you still have a memory manager, which is what lies behind malloc, free, realloc and so on (and new too, for that matter). The MM can operate by requesting a large chunk from the operating system and then doling it out to the application, and/or by “tweaking” your requests into requests that the OS can handle better or faster.
For example, the OS usually “sees” memory in pages of 1K, 4K or 64K, depending on the platform. If you allocated fifty ten-byte strings by yourself, they might end up in different pages and waste lots of memory. The MM, on seeing your first request for 10 bytes, will maybe allocate 4096 and then parcel them out to you in 10-byte lots.
Now (I’m going out on a limb here!), suppose your application needs to allocate memory equal to a whole page, in ten chunks. Suppose also that, because the program itself needs a small overhead, a quarter of the first heap page is already in use.
So you go ahead and allocate your ten chunks. With three quarters of page zero still free and each chunk a tenth of a page, the first seven fit in page zero; the next three require the allocation of a new page, page one.
The allocation done, you start juggling data back and forth between your chunks, without ever being aware that they reside in two different pages. Depending on the OS, compiler, optimization and weather forecasts, this might mean that those operations incur an overhead.
Now let’s suppose that you first allocate, and leak, three quarters of a page. Then your first chunk won’t fit in page zero, so it and the remaining nine chunks all go, and fit, in page one. Should -O3 optimization exploit same-page data access, you will see a compiler-dependent performance gain.
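The arithmetic of this scenario can be checked with a toy model. Everything here is an assumption made for illustration: pages_touched is a made-up function, and the placement rule (chunks laid out one after another, never straddling a page) is a simplification, not how a real malloc works.

```cpp
#include <cstddef>
#include <set>

// Toy model of the scenario above, NOT how a real malloc behaves:
// chunks are placed one after another, and a chunk that does not fit in the
// rest of the current page starts at the beginning of the next page.
// Assumes chunk_size <= page_size.
// Returns how many distinct pages the chunks end up on.
std::size_t pages_touched(std::size_t page_size,
                          std::size_t bytes_already_used,
                          std::size_t chunk_size,
                          std::size_t chunk_count) {
    std::set<std::size_t> pages;
    std::size_t offset = bytes_already_used;
    for (std::size_t i = 0; i < chunk_count; ++i) {
        const std::size_t room_left = page_size - offset % page_size;
        if (chunk_size > room_left) {
            offset += room_left;  // skip to the start of the next page
        }
        pages.insert(offset / page_size);
        offset += chunk_size;
    }
    return pages.size();
}
```

With a 4096-byte page, a quarter page (1024 bytes) already used and ten 409-byte chunks, pages_touched(4096, 1024, 409, 10) gives 2. After leaking the other three quarters, so that 4096 bytes are used, pages_touched(4096, 4096, 409, 10) gives 1.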
Please keep in mind that this is only an ad hoc hypothesis. It looks plausible to me, but that’s not really a guarantee of anything 🙂
More details on libc standard memory management (others exist) here
You might also check out Google’s tools for C++.