Is there any way on compute capability 2.0 (Fermi) devices to disable the L1 cache for only one specific variable?
I know that L1 caching can be disabled for all memory operations at compile time by passing the flag -Xptxas -dlcm=cg to nvcc.
However, I want to bypass the cache only for reads of one specific global variable, so that all other memory reads still go through the L1 cache.
From what I found searching the web, a possible solution is to use inline PTX assembly.
As mentioned above, you can use inline PTX. Here is an example:
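A minimal sketch of such a helper, assuming a double-precision load and 64-bit addressing. The .cg cache operator caches the load in L2 only, bypassing L1. The names ld_gbl_cg, copy_uncached and run_copy_uncached are illustrative, and the kernel and host check are only there to make the sketch complete.

```cuda
#include <cuda_runtime.h>

// Load one double from global memory with the .cg cache operator
// (cached in L2 only, bypassing L1). Assumes 64-bit addressing ("l").
// "volatile" keeps the compiler from deleting or moving the load.
__device__ __forceinline__ double ld_gbl_cg(const double *addr)
{
    double return_value;
    asm volatile("ld.global.cg.f64 %0, [%1];" : "=d"(return_value) : "l"(addr));
    return return_value;
}

// Hypothetical kernel: only the reads of `in` bypass L1;
// every other memory access keeps the default caching behaviour.
__global__ void copy_uncached(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ld_gbl_cg(in + i);
}

// Host-side check: returns true if the data round-trips unchanged.
bool run_copy_uncached()
{
    const int n = 256;
    double h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 0.5 * i;

    double *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(double)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(double)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    copy_uncached<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    // cudaMemcpy synchronizes with the kernel and reports any launch error.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i]) return false;
    return true;
}
```

Inside a kernel, you call ld_gbl_cg(ptr) wherever you would otherwise read *ptr for the variable that should skip L1.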
You can easily adapt this to other types by swapping .f64 for .f32 (float) or .s32 (int), and changing the output constraint on return_value from "=d" to "=f" (float) or "=r" (int). Note that the constraint on the input operand (addr), "l", denotes 64-bit addressing; if you are using 32-bit addressing, it should be "r".
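The substitutions described above can be sketched as follows for float and int, again assuming 64-bit addressing. The function names ld_gbl_cg_f32, ld_gbl_cg_s32, add_uncached and run_add_uncached are hypothetical and chosen only for illustration.

```cuda
#include <cuda_runtime.h>

// float variant: .f32 type suffix, "=f" output constraint.
__device__ __forceinline__ float ld_gbl_cg_f32(const float *addr)
{
    float return_value;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(return_value) : "l"(addr));
    return return_value;
}

// int variant: .s32 type suffix, "=r" output constraint.
// With 32-bit addressing, the "l" on addr would become "r" as well.
__device__ __forceinline__ int ld_gbl_cg_s32(const int *addr)
{
    int return_value;
    asm volatile("ld.global.cg.s32 %0, [%1];" : "=r"(return_value) : "l"(addr));
    return return_value;
}

// Hypothetical kernel combining both loads.
__global__ void add_uncached(const float *f, const int *k, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ld_gbl_cg_f32(f + i) + (float)ld_gbl_cg_s32(k + i);
}

// Host-side check: returns true if the device results match the host sums.
bool run_add_uncached()
{
    const int n = 256;
    float h_f[n], h_out[n];
    int h_k[n];
    for (int i = 0; i < n; ++i) { h_f[i] = 0.25f * i; h_k[i] = i; }

    float *d_f = nullptr, *d_out = nullptr;
    int *d_k = nullptr;
    bool ok = cudaMalloc(&d_f, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_k, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_f, h_f, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_k, h_k, n * sizeof(int), cudaMemcpyHostToDevice);
        add_uncached<<<(n + 127) / 128, 128>>>(d_f, d_k, d_out, n);
        ok = cudaMemcpy(h_out, d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_f);   // cudaFree on a null pointer is a no-op
    cudaFree(d_k);
    cudaFree(d_out);
    if (!ok) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_f[i] + (float)h_k[i]) return false;
    return true;
}
```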