Toward Scalable Matrix Multiply on Multithreaded Architectures | Publication