Starting from GK110 (Tesla Kepler), “const restrict” annotation on kernel argument has an extra GPU-specific meaning: accesses to that argument should go through the texture cache. As an example, we use GPU bilinear interpolation, which is a compute-bound problem. Replacing of “RGBApixel* pixels” by “const RGBApixel* restrict pixels” in global and device functions instructs the compiler to emit LDG instructions for pixels loading instead of generic LD:

alt text

$ cuobjdump -sass no_restrict | grep LD
        /*01d8*/                   LD.E R13, [R8+0x4];
        /*0210*/                   LD.E R14, [R8];
        /*0248*/                   LD.E R8, [R4+0x4];
        /*0290*/                   LD.E R11, [R4];
$ cuobjdump -sass restrict | grep LD
        /*01a0*/                   LDG.E R7, [R16];
        /*01b0*/                   LDG.E R9, [R8];
        /*01e0*/                   LDG.E R8, [R16];
        /*01f0*/                   LDG.E R5, [R14];

If restrict pointer does not produce LDG, then the keyword is not applicable or is used incorrectly.

Code version with “const restrict” demonstrates extra speedup of 8%:

$ ./no_restrict hst_lagoon_detail.bmp
Image: BMP 4778 x 4856
GPU kernel time = 0.013719 sec
$ ./restrict hst_lagoon_detail.bmp
Image: BMP 4778 x 4856
GPU kernel time = 0.012686 sec

Download our demo code here.