Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking