Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias | Publicación