I have a Visual Studio 2008 C++ application where I’m using a custom allocator for standard containers such that their memory comes from a Memory Mapped File rather than the heap. This allocator is used for 4 different use cases:
- 104-byte fixed size structure
  std::vector< SomeType, MyAllocator< SomeType > > foo;
- 200-byte fixed size structure
- 304-byte fixed size structure
- n-byte strings
  std::basic_string< char, std::char_traits< char >, MyAllocator< char > > strn;
I need to be able to allocate roughly 32MB total for each of these.
The allocator tracks memory usage using a std::map of pointers to allocation size:
  typedef std::map< void*, size_t > SuperBlock;
Each SuperBlock represents 4MB of memory.
There is a std::vector< SuperBlock > of these in case one SuperBlock isn’t enough space.
The algorithm used for the allocator goes like this:
- For each SuperBlock: Is there space at the end of the SuperBlock? put the allocation there. (fast)
- If not, search within each SuperBlock for an empty space of sufficient size and put the allocation there. (slow)
- Still nothing? allocate another SuperBlock and put the allocation at the start of the new SuperBlock.
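To make the cost of the search concrete, here is a minimal sketch of how a gap search over one SuperBlock might look. It is only one possible reading of the steps above, not the actual allocator: findGap, base and capacity are hypothetical names, and the sketch assumes the map holds non-overlapping allocations that all lie inside the block.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// From the question: start address of each allocation -> its size in bytes.
typedef std::map<void*, std::size_t> SuperBlock;

// Hypothetical helper: returns the first free range of at least `bytes`
// inside [base, base + capacity), or a null pointer if none exists.
// Every call walks the allocations in address order, so the cost grows
// linearly with the number of live allocations in the block.
void* findGap(const SuperBlock& sb, void* base,
              std::size_t capacity, std::size_t bytes) {
    assert(base != 0);
    char* cursor = static_cast<char*>(base);   // end of the previous allocation
    char* end = cursor + capacity;
    for (SuperBlock::const_iterator it = sb.begin(); it != sb.end(); ++it) {
        char* start = static_cast<char*>(it->first);
        if (static_cast<std::size_t>(start - cursor) >= bytes)
            return cursor;                     // hole before this allocation
        cursor = start + it->second;           // skip past this allocation
    }
    // No hole between allocations: try the space at the end of the block.
    return static_cast<std::size_t>(end - cursor) >= bytes ? cursor : 0;
}
```

With many small holes left behind by destroyed temporaries, most iterations of the loop reject a gap that is too small, which matches the slowdown described for the search step.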
Unfortunately, step 2 can become VERY slow after a while. As copies of objects are made and temporary variables are destroyed, I get a lot of fragmentation, which causes a lot of deep searching within the memory structure. Fragmentation is an issue because I have a limited amount of memory to work with (see the note below).
Can anybody suggest improvements to this algorithm that would speed up the process? Do I need two separate algorithms (one for the fixed-size allocations and one for the string allocator)?
Note: For those that need a reason: I’m using this algorithm in Windows Mobile where there’s a 32MB process slot limit to the Heap. So, the usual std::allocator won’t cut it. I need to put the allocations in the 1GB Large Memory Area to have enough space and that’s what this does.
For the fixed-size objects, you can create a fixed-size allocator. Basically, you allocate superblocks, partition each one into subblocks of the appropriate size, and link the subblocks into a free list. Allocating from such a list is O(1) if there is memory available (just remove the first element from the list and return a pointer to it), as is deallocation (add the block back to the free list). During allocation, if the list is empty, grab a new superblock, partition it, and add all of its blocks to the list.
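A minimal sketch of that idea follows. The class name FixedPool and its members are made up for illustration, and std::malloc stands in for whatever maps a superblock out of the memory-mapped file. It also assumes the block size only needs pointer alignment, which holds for the 104, 200 and 304 byte sizes if the structures have no stricter requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed-size pool with an intrusive free list.
class FixedPool {
public:
    explicit FixedPool(std::size_t blockSize,
                       std::size_t superBlockBytes = 4 * 1024 * 1024)
        : blockSize_(roundUp(blockSize)),
          superBlockBytes_(superBlockBytes),
          freeList_(0) {
        assert(superBlockBytes_ >= blockSize_);
    }

    ~FixedPool() {
        for (std::size_t i = 0; i < superBlocks_.size(); ++i)
            std::free(superBlocks_[i]);
    }

    // O(1): pop the head of the free list, refilling it when it is empty.
    void* allocate() {
        if (!freeList_ && !refill()) return 0;
        Node* n = freeList_;
        freeList_ = n->next;
        return n;
    }

    // O(1): push the block back onto the head of the free list.
    void deallocate(void* p) {
        if (!p) return;
        Node* n = static_cast<Node*>(p);
        n->next = freeList_;
        freeList_ = n;
    }

    std::size_t superBlockCount() const { return superBlocks_.size(); }

private:
    struct Node { Node* next; };   // stored inside each free block

    // Each block must be able to hold a Node and stay pointer-aligned.
    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = sizeof(Node);
        if (n < a) n = a;
        return (n + a - 1) / a * a;
    }

    // Grab a new superblock and thread all of its blocks onto the list.
    bool refill() {
        // Assumption: replace malloc with the memory-mapped file mapping.
        char* base = static_cast<char*>(std::malloc(superBlockBytes_));
        if (!base) return false;
        superBlocks_.push_back(base);
        const std::size_t count = superBlockBytes_ / blockSize_;
        // Push in reverse so blocks are handed out in ascending address order.
        for (std::size_t i = count; i > 0; --i)
            deallocate(base + (i - 1) * blockSize_);
        return true;
    }

    FixedPool(const FixedPool&);             // non-copyable (C++03 style)
    FixedPool& operator=(const FixedPool&);

    std::size_t blockSize_;
    std::size_t superBlockBytes_;
    Node* freeList_;
    std::vector<char*> superBlocks_;
};
```

Because the free-list link lives inside the unused block itself, the pool needs no per-allocation bookkeeping like the std::map in the question, and a freed block is the next one handed out.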
For the variable-sized allocations, you can reduce the problem to the fixed-size case by allocating only blocks of known sizes: 32 bytes, 64 bytes, 128 bytes, 512 bytes. You will have to analyze the memory usage to come up with the different buckets so that you don't waste too much memory. For large objects, you can go back to a dynamic-size allocation pattern, which will be slow, but hopefully the number of large objects is limited.
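The bucket lookup could be sketched roughly like this. The function names are hypothetical, and the bucket sizes are simply the ones listed above; the real set should come from profiling your own string lengths.

```cpp
#include <cassert>
#include <cstddef>

// Bucket sizes taken from the text above; tune them from real usage data.
static const std::size_t kBucketSizes[] = { 32, 64, 128, 512 };
static const int kBucketCount =
    static_cast<int>(sizeof(kBucketSizes) / sizeof(kBucketSizes[0]));

// Returns the index of the smallest bucket that fits `bytes`, or -1 when
// the request is too large and must use the slower general allocator.
int bucketIndexFor(std::size_t bytes) {
    for (int i = 0; i < kBucketCount; ++i)
        if (bytes <= kBucketSizes[i]) return i;
    return -1;
}

// Block size served by a given bucket.
std::size_t bucketSize(int index) {
    assert(index >= 0 && index < kBucketCount);
    return kBucketSizes[index];
}

// Bytes lost to rounding a request up to its bucket (0 for large requests).
std::size_t wastedBytes(std::size_t bytes) {
    const int i = bucketIndexFor(bytes);
    return i < 0 ? 0 : bucketSize(i) - bytes;
}
```

Each bucket would then be backed by its own fixed-size free list. Since a standard allocator's deallocate receives the element count along with the pointer, the bucket can be recomputed on release without storing a header per block. Summing wastedBytes over a trace of real requests is one way to judge whether the chosen buckets are a good fit.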