Neither more nor less: optimizing thread-level parallelism for GPGPUs | Publication