Research Talks ➤ 033 · 20 Oct 2022

Multilingual Embedding Alignment

Kasun Wickramasinghe
Slides · Video

Embeddings are a basic ingredient in many natural language processing tasks. In multilingual settings, one challenge is that the embedding spaces of different languages are not aligned, even though the spaces have been shown to share a similar geometric arrangement, as Mikolov et al. show in [2], especially when the training processes are similar. Alignment is relevant to two kinds of embedding models: models trained separately on monolingual data, and multilingual models trained on parallel multilingual data. For multilingual models, the training process itself usually encourages alignment implicitly. For monolingual models, however, alignment has to be performed explicitly after the models are trained. Even though multilingual embedding models are becoming popular, monolingual embedding alignment is still vital, especially for low-resource languages, when pre-training or fine-tuning a multilingual model is too time- and resource-consuming, or when well-trained monolingual models are already available.
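A common way to align two separately trained monolingual spaces, in the spirit of Mikolov et al. [2], is to learn a linear map from a small bilingual dictionary of paired word vectors. The sketch below (a minimal illustration, not the exact method of the talk) uses the closed-form orthogonal Procrustes solution: the function name `learn_alignment` and the toy data are assumptions for illustration.

```python
import numpy as np

def learn_alignment(X, Y):
    """Learn an orthogonal matrix W minimizing ||X W - Y||_F.

    X, Y: (n, d) arrays of paired word vectors (source and target
    language) from a seed bilingual dictionary. The closed-form
    Procrustes solution is W = U V^T, where U S V^T = svd(X^T Y).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy example: the "target" space is an orthogonal rotation of the
# "source" space, mimicking the similar geometric arrangement that
# Mikolov et al. observed between languages.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # 50 word pairs, 4-dim vectors
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # hidden true rotation
Y = X @ Q

W = learn_alignment(X, Y)
# After alignment, source vectors mapped through W land on their
# target-language counterparts.
print(np.allclose(X @ W, Y))
```

Constraining W to be orthogonal preserves distances and angles within the source space, which tends to make the mapping more robust than an unconstrained least-squares fit when the seed dictionary is small.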
