A closer look at referring expressions for video object segmentation | Publication