Recently I have been using Thrust a lot. I have noticed that to use Thrust, one must always copy the data from CPU memory to GPU memory.
Consider the following example:
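```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

void foo(int *foo)
{
    thrust::host_vector<int> m(foo, foo + 100000);  // copy 1: raw array -> host_vector
    thrust::device_vector<int> s = m;               // copy 2: host_vector -> GPU
}
```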
I'm not quite sure how the `host_vector` constructor works, but it seems like I'm copying the initial data, coming from `*foo`, twice: once into the `host_vector` when it is initialized, and again when the `device_vector` is initialized. Is there a better way of copying from CPU to GPU without making intermediate copies of the data? I know I can use `device_ptr` as a wrapper, but that still doesn't fix my problem.
Thanks!
One of `device_vector`'s constructors takes a range of elements specified by two iterators. It's smart enough to understand the raw pointer in your example, so you can construct a `device_vector` directly from it and avoid the temporary `host_vector` (and the extra host-side copy it makes).

If your raw pointer points to CUDA memory instead, wrap it in a `device_ptr` before passing it to the constructor. Using a `device_ptr` doesn't allocate any storage; it just encodes the location of the pointer in the type system.
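A minimal sketch of both cases, assuming a CUDA build compiled with `nvcc`. The function names `from_host_ptr`, `from_device_ptr` and `both_paths_agree` are hypothetical illustrations, not part of Thrust. The first function builds a `device_vector` straight from a host pointer range. The second wraps memory that already lives on the GPU in a `device_ptr` so Thrust treats the range as device memory.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>

// Case 1: raw_ptr points to ordinary host (CPU) memory.
// A plain pointer is treated as a host iterator, so the iterator-range
// constructor copies host -> device directly; no host_vector is needed.
thrust::device_vector<int> from_host_ptr(const int *raw_ptr, std::size_t n)
{
    return thrust::device_vector<int>(raw_ptr, raw_ptr + n);
}

// Case 2: raw_ptr already points to GPU memory (e.g. from cudaMalloc).
// device_ptr allocates nothing; it only tags the pointer as device memory,
// so the constructor performs a device -> device copy.
thrust::device_vector<int> from_device_ptr(int *raw_ptr, std::size_t n)
{
    thrust::device_ptr<int> d_ptr(raw_ptr);
    return thrust::device_vector<int>(d_ptr, d_ptr + n);
}

// Hypothetical check: builds the same data both ways and compares the
// first and last elements (each element read is a small device -> host copy).
bool both_paths_agree(std::size_t n)
{
    if (n == 0) return true;

    std::vector<int> h(n);
    for (std::size_t i = 0; i < n; ++i) h[i] = static_cast<int>(i);

    thrust::device_vector<int> a = from_host_ptr(h.data(), n);

    int *d_raw = nullptr;
    if (cudaMalloc(&d_raw, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d_raw, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    thrust::device_vector<int> b = from_device_ptr(d_raw, n);
    cudaFree(d_raw);  // b owns its own copy, so freeing d_raw here is safe

    int a_first = a[0];
    int b_last = b[n - 1];
    return a.size() == n && b.size() == n &&
           a_first == 0 && b_last == static_cast<int>(n - 1);
}
```

In the question's example, `from_host_ptr(foo, 100000)` would replace both the `host_vector` and the assignment. The `device_ptr` route only makes sense when the pointer came from device allocation; wrapping a host pointer in `device_ptr` would make Thrust dereference host memory on the GPU.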