I’m making an application that is going to be using many dynamically created objects (raytracing). Instead of just using [new] over and over again, I thought I’d just make a simple memory system to speed things up. Its very simple at this point, as I don’t need much.
My question is: when I run this test application, using my memory manager uses the correct amount of memory. But when I run the same loop using [new], it uses 2.5 to 3 times more memory. Is there just something I’m not seeing here, or does [new] incur a huge overhead?
I am using VS 2010 on Win7. Also I’m just using the Task Manager to view the process memory usage.
template<typename CLASS_TYPE>
class MemFact
{
public:
int m_obj_size; //size of the incoming object
int m_num_objs; //number of instances
char* m_mem; //memory block
MemFact(int num) : m_num_objs(num)
{
CLASS_TYPE t;
m_obj_size = sizeof(t);
m_mem = new char[m_obj_size * m_num_objs);
}
CLASS_TYPE* getInstance(int ID)
{
if( ID >= m_num_objs) return 0;
return (CLASS_TYPE*)(m_mem + (ID * m_obj_size));
}
void release() { delete m_mem; m_mem = 0; }
};
/*---------------------------------------------------*/
class test_class
{
float a,b,c,d,e,f,g,h,i,j; //10 floats
};
/*---------------------------------------------------*/
int main()
{
int num = 10 000 000; //10 M items
// at this point we are using 400K memory
MemFact<test_class> mem_fact(num);
// now we're using 382MB memory
for(int i = 0; i < num; i++)
test_class* new_test = mem_fact.getInstance(i);
mem_fact.release();
// back down to 400K
for(int i = 0; i < num; i++)
test_class* new_test = new test_class();
// now we are up to 972MB memory
}
There is a minimum size for a memory allocation, depending on the CRT you are using. Often that’s 16 bytes. Your object is 12 bytes wide (assuming x86), so you’re probably wasting at least 4 bytes per allocation right there. The memory manager also has it’s own structures to keep track of what memory is free and what memory is not — that’s not free. Your memory manager is probably much simplier (e.g. frees all those objects in one go) which is inherently going to be more efficient than what new does for the general case.
Also keep in mind that if you’re building in debug mode, the debugging allocator will pad both sides of the returned allocation with canaries in an attempt to detect undefined behavior. That’ll probably put you over the 16 byte boundary and into the next one — probably a 32 byte allocation, at least. That’ll be disabled when you build in release mode.