Multimodal transformers for image and audio polyphonic music transcription | Publication