Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs