I currently have a large array of floats that I process in my OpenCL kernel. I am wondering whether dividing this array up and using an array of an OpenCL vector type instead would speed up the process. For example, if I had an array of 4,800 floats, I would divide it into an array of 300 float16 vectors. Would this take advantage of SIMD?
Intel actually describes what their OpenCL SDK does: see Writing Optimal OpenCL™ Code with Intel® OpenCL SDK. You might want to check that out in addition to benchmarking. The interesting part starts at chapter 2.3.
To answer your question: yes, it will take advantage of SIMD. But to “maximize utilization of the CPU vector units by using vector data types” you should really read that document.
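For illustration, here is a rough sketch of what the change from the question could look like. The question does not say what the kernel computes, so the "multiply every element by k" operation is an assumption, and the names `scale_scalar`, `scale_vec16`, `vector_count` and `tail_count` are hypothetical. The two kernels are plain OpenCL C held in strings; the helpers only do the index arithmetic (4,800 floats / 16 lanes = 300 work-items).

```c
#include <stddef.h>

/* Scalar version (assumed workload): one work-item per float,
 * so 4,800 floats means a global size of 4,800. */
const char *const SCALE_SCALAR_SRC =
    "__kernel void scale_scalar(__global float *data, const float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= k;\n"
    "}\n";

/* Vector version: one work-item per float16, i.e. 16 floats at a time.
 * The scalar k is widened to all 16 lanes by the vector-scalar multiply. */
const char *const SCALE_VEC16_SRC =
    "__kernel void scale_vec16(__global float16 *data, const float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= k;\n"
    "}\n";

/* Number of whole vectors of `width` floats that fit in `n_floats`;
 * this is the global work size for the vector kernel. */
size_t vector_count(size_t n_floats, size_t width)
{
    return n_floats / width;
}

/* Floats left over when `n_floats` is not a multiple of `width`;
 * these would need a scalar cleanup pass or padding. */
size_t tail_count(size_t n_floats, size_t width)
{
    return n_floats % width;
}
```

With 4,800 floats, `vector_count(4800, 16)` gives 300 and `tail_count(4800, 16)` gives 0, so the vector kernel covers the whole array. Since a float16 is just 16 consecutive floats, the same buffer should be usable by both kernels without repacking on the host, provided its start is suitably aligned for float16. Whether the vector version is actually faster depends on the device and compiler, so benchmark both.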