Aligned Embeddings or Multilingual Embeddings? A Comprehensive Study in the Word Embedding Paradigm
Multilingual processing is increasingly common in practical applications, driven by the use of code-mixed language and the need to process multilingual documents in a language-agnostic manner. Multilingual models remove the burden of aligning monolingual embeddings, but is that convenience alone reason enough to abandon monolingual model alignment? Are multilingual models better than aligned monolingual models in every respect, or should we retain aligned monolingual embeddings and avoid the computation-heavy multilingual models? We evaluate how well traditional embedding-alignment techniques and recent multilingual models perform on bilingual lexicon induction (BLI) tasks for both high-resource and low-resource languages. We further investigate the impact of the language families to which the language pairs belong. We find that aligned monolingual models outperform multilingual models in some settings and vice versa, so both types of embeddings remain valuable. In addition, we propose a novel stem-based BLI technique for evaluating two aligned embedding spaces, which can be more effective than word-based BLI for highly inflected languages.
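To make the stem-based evaluation concrete, the sketch below contrasts word-based and stem-based BLI accuracy over two aligned embedding spaces. This is a hypothetical illustration, not the paper's implementation: the inputs src_emb, tgt_emb, and gold are assumed to be word-to-vector dictionaries and a gold translation dictionary, and NLTK's SnowballStemmer stands in for whichever stemmer is actually used.

# Minimal sketch (not the authors' implementation) of word-based vs.
# stem-based BLI accuracy over two aligned embedding spaces.
# Assumes src_emb / tgt_emb map words to L2-normalised numpy vectors
# in a shared space, and gold maps each source word to its reference
# target-language translation.
import numpy as np
from nltk.stem.snowball import SnowballStemmer  # assumed stemmer choice

def bli_accuracy(src_emb, tgt_emb, gold, stem_lang=None):
    # With stem_lang=None this is ordinary word-based BLI;
    # passing a language (e.g. "spanish") switches to stem-based BLI.
    stem = SnowballStemmer(stem_lang).stem if stem_lang else (lambda w: w)
    tgt_words = list(tgt_emb)
    tgt_matrix = np.stack([tgt_emb[w] for w in tgt_words])  # |V_tgt| x d
    hits = 0
    for src_word, ref in gold.items():
        # Nearest neighbour in the shared space by cosine similarity
        # (vectors are pre-normalised, so a dot product suffices).
        pred = tgt_words[int(np.argmax(tgt_matrix @ src_emb[src_word]))]
        # Word-based BLI requires an exact surface-form match; stem-based
        # BLI also credits inflected variants of the reference word.
        hits += stem(pred) == stem(ref)
    return hits / len(gold)

Under this sketch, a retrieved translation that differs from the reference only by inflection (for example, a plural or case-marked form) counts as correct in stem-based BLI but not in word-based BLI, which is why the stem-based variant can be more informative for inflected languages.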