Limitations of Automatic Relevance Assessments with Large Language Models for Fair and Reliable Retrieval Evaluation