I am creating something similar to CUDA, and I noticed that copying memory from RAM to VRAM is very fast, about as fast as copying from RAM to RAM. Copying from VRAM to RAM, however, is much slower.
For reference, I am using glTexSubImage2D to copy from RAM to VRAM and glGetTexImage to copy from VRAM to RAM.
Why is that? Is there a way to make the readback as fast as the RAM to VRAM copy?
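Transferring data from the GPU back to the CPU has always been a slow operation. A plain glGetTexImage call has to wait until the GPU has finished every pending command that touches the texture before it can return the pixels, so the CPU stalls, whereas an upload can usually be queued and return right away.
Depending on your graphics chip and driver, you may get better performance by using PBOs (pixel buffer objects), which let the readback run asynchronously.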