In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has “warmed up”. Moreover, I use custom kernels for basic array filling, arithmetic operations between arrays, etc., and would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now), and I’d like to use device_vector and Thrust’s methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, which clutters up my code.
My rather vague question: How would/do you organize the usage of custom memory management, Thrust’s array methods, and calls to custom kernels in the most readable way?
Like all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own “allocator”. By default, thrust::device_vector’s allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust’s backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP’s case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid overhead which may be incurred by many individual calls to the underlying allocation scheme, in this case, cudaMalloc.

A simple way to provide device_vector a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:
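A minimal sketch of such an allocator, assuming a Thrust release that ships thrust/device_malloc_allocator.h and a CUDA-capable compiler such as nvcc. The message strings are illustrative, and newer Thrust releases may require adjustments:

```cpp
#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // reuse the base class's typedefs; because we inherit from
  // device_malloc_allocator, pointer is thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;
  typedef typename super_t::size_type size_type;

  // customized allocate: log, then defer to the base class
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;
    // a real allocator would hand out storage from its own pool here
    return super_t::allocate(n);
  }

  // customized deallocate: log, then defer to the base class
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;
    // a real allocator would return the storage to its own pool here
    super_t::deallocate(p, n);
  }
};

int main()
{
  // a device_vector which uses my_allocator as its second template parameter
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints, each initialized to 13
  vec.resize(10, 13);

  return 0;
}
```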
Here’s the output:
In this example, note that we hear from my_allocator::allocate() once upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope as it destroys its elements.
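For the OP's use case, allocate() would hand out pieces of one large block instead of deferring to the base class. The following is a rough host-side sketch of that sub-allocation idea, not a drop-in Thrust allocator: the BumpPool name and its interface are hypothetical, std::malloc stands in for the single cudaMalloc call made at program initialization, and the 256-byte default alignment is an assumption.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bump-style pool: one upfront allocation, then cheap
// sub-allocations carved out of it. std::malloc is a stand-in for the
// single cudaMalloc call so that the sketch runs on the host.
class BumpPool {
public:
    explicit BumpPool(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          capacity_(capacity),
          offset_(0) {
        if (!base_) throw std::bad_alloc();
    }

    ~BumpPool() { std::free(base_); }

    BumpPool(const BumpPool&) = delete;
    BumpPool& operator=(const BumpPool&) = delete;

    // Returns `bytes` of storage whose offset from the start of the block
    // is a multiple of `alignment` (must be a power of two).
    // Throws std::bad_alloc when the pool is exhausted.
    void* allocate(std::size_t bytes, std::size_t alignment = 256) {
        std::size_t start = (offset_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || bytes > capacity_ - start) throw std::bad_alloc();
        offset_ = start + bytes;
        return base_ + start;
    }

    // Releases every sub-allocation at once; the large block is kept.
    void reset() { offset_ = 0; }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    char*       base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

A custom allocator's allocate() could forward to a pool like this and wrap the result in a device_ptr. A bump pool cannot free individual blocks, so deallocate() would either do nothing until reset() is called or be backed by a free-list scheme instead.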