Programming matrix algorithms-by-blocks for thread-level parallelism