A comprehensive study on contrastive pre-training and fine tuning of vision and text transformers for video memorability prediction