Cristina España-Bonet is currently the machine translation team lead at the German Research Center for Artificial Intelligence (DFKI GmbH). With a background in physics (degree), artificial intelligence (master's) and cosmology (PhD), she has worked on natural language processing at Universitat Politècnica de Catalunya (Barcelona), Saarland University and DFKI (Saarbrücken). Cristina is especially interested in interlingual and multilingual approaches and in how these can be used to improve performance in low-resource settings and for low-resource languages.
Under this very generic title, I will summarize some of our group's work on embeddings, machine translation and evaluation for languages from Sub-Saharan Africa. I will start by defining a low-resource setting, and we will see how, in this case, the (language-dependent) curation of the data is crucial for some tasks. Afterwards, I will focus on neural machine translation (NMT) and compare several approaches for when only a limited amount of parallel data is available. Using one of these models, self-supervised NMT, as an example, we will discuss how such models are evaluated and see that, in low-resource settings, not only training but also evaluation is a challenge.