Establishing vocabulary tests as a benchmark for evaluating large language models