Performance and Programmability of MPI+X Integration with CUDA, HIP, SYCL, OpenACC, and OpenMP Offloading for Supercomputing: A Case Study on Dense Matrix–Vector Multiplication