I’m working on a project that uses CUDA acceleration on the GPU. I’ve finished some software-level optimization of my calculations, and I’ve also found that some changes to the GPU architecture might help optimize the project even further.
My question: is there an efficient way, or an existing emulator, that lets me change some features or parts of the GPU architecture and then benchmark the CUDA PTX code on that custom architecture to get performance results (preferably cycle-accurate)? There are several architecture simulators for CPUs, so I was wondering whether any of them support GPUs.
Or will I have to write a GPU emulator myself ^_^?
GPGPU-Sim is exactly what you are looking for. The simulator accurately models NVIDIA GPUs and executes OpenCL and CUDA workloads without any modification to the code. I believe there are options to model PTX workloads too. From the manual:
GPGPU-Sim is highly configurable, letting you model different micro-architectures. For example, you can adjust the number of SMs, warp schedulers, SIMD groups, threads per SM, shared memory size, register file size, and many other parameters explained in the manual. At the end of the simulation, the simulator dumps the execution duration (in GPU clock cycles) and many other performance counters.
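Since the simulator reports the execution duration in cycles, comparing two architecture configurations mostly comes down to extracting that counter from each run's output. Here is a minimal Python sketch of that comparison. It assumes the output contains a line of the form `gpu_tot_sim_cycle = <number>`; check the counter name and format against the output of your own version. The function names and the sample log strings are made up for illustration and are not real simulator output.

```python
import re

# Assumption: the simulator output contains lines such as
# "gpu_tot_sim_cycle = 12345". Adjust the pattern to your version.
CYCLE_RE = re.compile(r"^\s*gpu_tot_sim_cycle\s*=\s*(\d+)\s*$", re.MULTILINE)


def total_cycles(log_text: str) -> int:
    """Return the last reported cycle count found in a simulation log."""
    matches = CYCLE_RE.findall(log_text)
    if not matches:
        raise ValueError("no cycle counter found in log")
    # The counter may be printed once per kernel; keep the final value.
    return int(matches[-1])


def speedup(baseline_log: str, modified_log: str) -> float:
    """Ratio of baseline cycles to modified-architecture cycles."""
    return total_cycles(baseline_log) / total_cycles(modified_log)


# Hypothetical log snippets, for illustration only.
baseline = "kernel 1 done\ngpu_tot_sim_cycle = 2000\n"
modified = "kernel 1 done\ngpu_tot_sim_cycle = 1600\n"

print(speedup(baseline, modified))
```

In practice you would read the two logs from files produced by running the same unmodified binary under two different configuration files, changing one parameter at a time so the cycle difference can be attributed to it.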
Further information:
GPU Ocelot is a PTX analyzer. Earlier versions could simulate a workload, but recent versions focus on compiler optimization of PTX code.
MacSim is another complex yet powerful tool, which simulates heterogeneous systems made up of CPUs and GPUs.