I am a beginner with OpenCL and I am having trouble understanding something.
I want to improve the transfer of an image between host and device.
I made a diagram to explain what I mean.

Top: what I have now | Bottom: what I want
HtD (Host to Device) and DtH (Device to Host) are memory transfers.
K1 and K2 are kernels.
I thought about using memory mapping, but the first transfer (Host to Device) is done by the clSetKernelArg() command, isn't it?
Or do I have to cut my input image into sub-images and use mapping to get the output image?
Thanks.
Edit: More information
K1 processes the MemInput image.
K2 processes the output image from K1.
So, I want to transfer MemInput in several pieces for K1.
And I want to read back and save on the host the MemOutput processed by K2.
As you may have already seen, you do a transfer from host to device by using clEnqueueWriteBuffer and similar commands.
All the commands with the keyword 'enqueue' in their name have a special property: they are not executed immediately, but only once you trigger them using clFinish, clFlush, clEnqueueWaitForEvents, clEnqueueWriteBuffer in blocking mode, and some more.
This means that the enqueued work may all happen at once, and you have to synchronise it using event objects. As everything may happen at once, you could do something like this (the items within each step happen at the same time):
1. Transfer Data A
2. Process Data A and Transfer Data B
3. Process Data B, Transfer Data C and Retrieve Data A
4. Process Data C and Retrieve Data B
5. Retrieve Data C
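Remember: on an out-of-order command queue, or across several queues, enqueueing tasks without event objects may result in simultaneous execution of all enqueued elements! An in-order queue (the default) runs its commands one after another.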
To make sure that Process Data B doesn't happen before Transfer B, you have to retrieve an event object from clEnqueueWriteBuffer and supply it as an object to wait for to, for instance, clEnqueueNDRangeKernel. Instead of supplying NULL, each command can of course wait on certain events AND generate a new event object. The next-to-last parameter is an array, so you can even wait for several events!
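The effect of wait lists can be sketched without a GPU. The following plain C code is a toy model, not OpenCL: each command lists the earlier commands it waits on (playing the role of event_wait_list), every command is assumed to take exactly one time step, and a command runs as soon as everything it waits on has finished. The Command struct, schedule(), build_pipeline(), print_schedule() and the three chunks A, B and C are hypothetical names made up for illustration; on real hardware transfers and kernels take different amounts of time, and whether they truly overlap depends on the device and the queue setup.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_WAIT 2

/* Toy stand-in for an enqueued command; this is a model, not the OpenCL API. */
typedef struct {
    const char *name;
    int num_wait;        /* plays the role of num_events_in_wait_list */
    int wait[MAX_WAIT];  /* indices of earlier commands, like event_wait_list */
    int step;            /* filled in by schedule() */
} Command;

/* Give every command the earliest step after all commands it waits on.
   Assumes each command takes exactly one step. */
void schedule(Command *cmds, int n)
{
    for (int i = 0; i < n; ++i) {
        int step = 1;
        for (int w = 0; w < cmds[i].num_wait; ++w) {
            int dep = cmds[i].wait[w];
            assert(dep >= 0 && dep < i);  /* may only wait on earlier commands */
            if (cmds[dep].step + 1 > step)
                step = cmds[dep].step + 1;
        }
        cmds[i].step = step;
    }
}

/* Fill cmds (room for 9 entries) with a three-chunk pipeline; returns 9. */
int build_pipeline(Command *cmds)
{
    const Command pipeline[9] = {
        { "Transfer A", 0, { 0, 0 }, 0 },  /* 0 */
        { "Transfer B", 1, { 0, 0 }, 0 },  /* 1: after Transfer A */
        { "Transfer C", 1, { 1, 0 }, 0 },  /* 2: after Transfer B */
        { "Process A",  1, { 0, 0 }, 0 },  /* 3: after Transfer A */
        { "Process B",  2, { 1, 3 }, 0 },  /* 4: after Transfer B and Process A */
        { "Process C",  2, { 2, 4 }, 0 },  /* 5: after Transfer C and Process B */
        { "Retrieve A", 1, { 3, 0 }, 0 },  /* 6: after Process A */
        { "Retrieve B", 2, { 4, 6 }, 0 },  /* 7: after Process B and Retrieve A */
        { "Retrieve C", 2, { 5, 7 }, 0 },  /* 8: after Process C and Retrieve B */
    };
    for (int i = 0; i < 9; ++i)
        cmds[i] = pipeline[i];
    return 9;
}

/* Print "step N: name" for every command. */
void print_schedule(const Command *cmds, int n)
{
    for (int i = 0; i < n; ++i)
        printf("step %d: %s\n", cmds[i].step, cmds[i].name);
}
```

Calling build_pipeline(), schedule() and print_schedule() in that order from a main() puts Transfer B in step 2 and Process B in step 3, so the kernel for B never starts before its data has arrived. In real OpenCL code the same dependency would be expressed by passing the event returned by the write to the kernel's wait list.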
EDIT: To summarise the comments below
Transferring data – What command acts where?
CPU                              GPU
                                 BufA       BufB
array[] = {...}
clCreateBuffer()         ----->  [     ]                // Create (empty) buffer in GPU memory *
clCreateBuffer()         ----->  [     ]    [     ]     // Create (empty) buffer in GPU memory *
clEnqueueWriteBuffer()   -arr->  [array]    [     ]     // Copy from CPU to GPU
clEnqueueCopyBuffer()            [array] -> [array]     // Copy from GPU to GPU
clEnqueueReadBuffer()    <-arr-  [array]    [array]     // Copy from GPU to CPU

* You may initialise the buffer directly by providing data using the host_ptr parameter.
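To send the input in several pieces, as the question asks, each piece needs a byte offset and a size; these are the values one would pass as the offset and size arguments of clEnqueueWriteBuffer (and likewise clEnqueueReadBuffer). Below is a minimal sketch in plain C, assuming a row-major image that is split into horizontal bands of whole rows. The Chunk type and chunk_for() are hypothetical helpers made up for illustration, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range of one piece of the image (hypothetical helper type). */
typedef struct {
    size_t offset;  /* byte offset of the piece within the image */
    size_t size;    /* number of bytes in the piece */
} Chunk;

/* Split `rows` rows of `row_bytes` bytes each into `num_chunks` bands of
   whole rows and return band `index`. The first (rows % num_chunks) bands
   get one extra row, so the bands cover the image exactly once. */
Chunk chunk_for(size_t rows, size_t row_bytes, size_t num_chunks, size_t index)
{
    assert(num_chunks > 0 && index < num_chunks);
    size_t base  = rows / num_chunks;
    size_t extra = rows % num_chunks;
    size_t first_row = index * base + (index < extra ? index : extra);
    size_t row_count = base + (index < extra ? 1 : 0);
    Chunk c = { first_row * row_bytes, row_count * row_bytes };
    return c;
}
```

For piece i, the host pointer would then be the start of the image advanced by the chunk's offset, with the same offset and size used on the device side. Whether bands of whole rows suit K1 depends on the kernel, for example on whether it reads neighbouring rows.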