I know how to generate a .ptx file from a .cu and how to generate a .cubin file from a .ptx. But I don’t know how to get the final executable.
More specifically, I have a sample.cu file, which is compiled to sample.ptx. I then use nvcc to compile sample.ptx to sample.cubin. However, this .cubin file cannot be executed directly without host code. How can I link the .cubin file with my original .cu file to produce the final executable?
You should be able to run PTX code directly from the CUDA driver API with cuModuleLoadDataEx. There is an example here, on page 5.
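In case it helps, here is a rough sketch of what the host side could look like with the driver API. It is not taken from the linked example, and several things are assumptions on my part: the kernel is called my_kernel, it is declared extern "C" in sample.cu so its name is not mangled, and it has the signature (float *data, int n). Adjust those to match your real kernel. The helper read_text and the CHECK macro are just illustrative names.

```cuda
// Sketch: load sample.ptx at run time and launch one kernel via the driver API.
// Assumed kernel in sample.cu (hypothetical):
//   extern "C" __global__ void my_kernel(float *data, int n);
// Build (assumed file name): nvcc loader.cu -o loader -lcuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// Abort with a readable message if a driver API call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult r_ = (call);                                         \
        if (r_ != CUDA_SUCCESS) {                                     \
            const char *msg_ = NULL;                                  \
            cuGetErrorString(r_, &msg_);                              \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    msg_ ? msg_ : "unknown error");                   \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Read a whole file into a NUL-terminated buffer (PTX is plain text).
static char *read_text(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(EXIT_FAILURE); }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buf = (char *)malloc((size_t)size + 1);
    if (!buf || fread(buf, 1, (size_t)size, f) != (size_t)size) {
        fprintf(stderr, "could not read %s\n", path);
        exit(EXIT_FAILURE);
    }
    buf[size] = '\0';
    fclose(f);
    return buf;
}

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    // Initialize the driver API and make device 0's primary context current.
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Load the PTX text; the driver JIT-compiles it for the current device.
    char *ptx = read_text("sample.ptx");
    CHECK(cuModuleLoadDataEx(&mod, ptx, 0, NULL, NULL));
    CHECK(cuModuleGetFunction(&fn, mod, "my_kernel"));  // assumed kernel name

    // Allocate and fill a device buffer for the assumed (float *, int) kernel.
    int n = 256;
    float host[256];
    for (int i = 0; i < n; ++i) host[i] = (float)i;
    CUdeviceptr d_data;
    CHECK(cuMemAlloc(&d_data, n * sizeof(float)));
    CHECK(cuMemcpyHtoD(d_data, host, n * sizeof(float)));

    // Kernel arguments are passed as an array of pointers to each argument.
    void *args[] = { &d_data, &n };
    CHECK(cuLaunchKernel(fn,
                         (n + 127) / 128, 1, 1,  // grid dimensions
                         128, 1, 1,              // block dimensions
                         0, NULL,                // shared memory bytes, stream
                         args, NULL));
    CHECK(cuCtxSynchronize());

    CHECK(cuMemcpyDtoH(host, d_data, n * sizeof(float)));
    printf("host[0] after kernel: %f\n", host[0]);

    // Clean up.
    CHECK(cuMemFree(d_data));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    free(ptx);
    return 0;
}
```

If you would rather use the sample.cubin you already built, replacing the cuModuleLoadDataEx call with cuModuleLoad(&mod, "sample.cubin") should work, since cuModuleLoad takes a file name. Keep in mind that a cubin is compiled for a specific GPU architecture, whereas PTX is JIT-compiled by the driver when it is loaded.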