Who We Are

Nisansa de Silva’s Research Group, based at the Department of Computer Science & Engineering, University of Moratuwa (UoM), conducts research in natural language processing (NLP). It focuses on three areas: low-resource languages such as Sinhala and Tamil, data quality in multilingual corpora, and applied machine learning. The group was first established in 2011 and went on semi-hiatus while Dr. de Silva was at the University of Oregon (2014–2020). It was formally re-established in 2021 when he returned to UoM.

Faculty and students explore multiple aspects of language technology, including lexical semantics, sentiment and subjectivity analysis, graph-based NLP, and multilingual text processing. Their work has appeared at major international venues such as ACL, EACL, WMT, and LoResMT, with ongoing collaborations bridging local linguistic challenges and global research frontiers.

The group has received research funding from Google and the University of Moratuwa. It supports the wider Sinhala and Tamil NLP communities by releasing open-source tools, surveys, and models.

Research Areas at a Glance

In recent years, the group has produced notable work on:

  • Sinhala NLP for low-resource settings – developing transliteration systems, improving neural machine translation with encoder-aware knowledge distillation, and maintaining a living survey of Sinhala NLP resources.
  • Data quality and openness in NLP research – auditing web-mined parallel corpora for Sinhala–English–Tamil, highlighting the critical role of quality filtering, and analyzing openness trends in global NLP research.
  • Efficient pipelines for constrained translation tasks – designing statistical filtering approaches that enable competitive machine translation performance with limited resources.