
Low-Resource Adaptation for NLP

Principal Investigator: Nisansa de Silva

We propose to advance natural language processing for low-resource settings by leveraging data augmentation, transfer learning, and model adaptation techniques to overcome the scarcity of annotated data and linguistic resources.

Low-resource natural language processing (NLP) remains a key challenge in enabling equitable access to language technologies across diverse linguistic communities. The lack of high-quality annotated data and robust resources often limits the applicability of modern NLP techniques to many languages and tasks. This project seeks to address these challenges by systematically developing methods for data creation, augmentation, and model adaptation.
We explore strategies such as data augmentation, denoising, and debiasing of parallel corpora to improve training material. Cross-lingual lexicon induction, embedding alignment, and resource development are employed to establish strong linguistic foundations. On the modeling side, we focus on knowledge distillation, adapter-based fine-tuning, and the efficient adaptation of large pre-trained models so that they perform well in low-resource contexts.
Additionally, we emphasize building reproducible datasets and benchmarks, supporting multi-domain learning, and comparatively evaluating approaches across tasks such as sentiment analysis, summarization, classification, and generation. We also investigate zero-shot and few-shot settings, extending the benefits of large-scale multilingual encoders and large language models to under-represented languages.
Through this integrated approach, the project contributes towards reducing the performance gap between resource-rich and low-resource NLP, ensuring broader accessibility and inclusivity in language technologies.

Objectives:

  • Investigate data augmentation and denoising strategies to enhance the quality of training data for low-resource NLP tasks.
  • Develop cross-lingual lexicon induction, embedding alignment, and resource creation methods to bridge gaps between resource-rich and low-resource languages.
  • Explore multi-domain and multi-task approaches that improve model generalization in low-resource contexts.
  • Apply transfer learning, knowledge distillation, and adapter-based fine-tuning to adapt large pre-trained models efficiently.
  • Build datasets, benchmarks, and evaluation frameworks to support reproducibility and scalability in low-resource NLP research.
  • Examine zero-shot and few-shot methods to extend model capabilities to languages or tasks with minimal supervision.
  • Advance sentiment analysis, summarization, and generation methods that can perform effectively despite limited resources.


Keywords: Natural Language Processing | Machine Learning / Deep Learning | Sinhala | Big Data | Ontologies | LLM | Low-resource Languages | Sentiment Analysis | Aspect-based Sentiment Analysis | Social Media | Multilingual | Multi-document summarization | Mobile App Review Analysis | Software Evolution | Word Embeddings | Word Vectorization | Fine-tuning | Word Embedding Alignment | BLI | Alignment Dictionaries | WordNet | Language-agnostic Processing | Textual Reviews | Multilingual Embedding | Inflected Languages | Measure Alignment | Corpus | Aspect Extraction | DeBERTa | Text Classification | GPT | InstructABSA | Text Generation




Publications

Dissertations

Theses: MSc Major Component Research

Theses: MSc Minor Component Research

Journal Papers

Conference Papers

Workshop Papers

Extended Abstracts

White Papers

Preprints

Team

External Collaborators: G G N Sandamali | K L K Sudheera | Yudhanjaya Wijeratne | Surangika Ranathunga | Mokanarangan Thayaparan | Rishemjit Kaur | C D Athuraliya | Chinthana Wimalasuriya | Gihan Dias | Shehan Perera | Stephen Cranefield | Bastin Tony Roy Savarimuthu


Faculty

Nisansa de Silva

Senior Lecturer
University of Moratuwa

MSc Students

Charitha Rathnayake

Lecturer on Contract
University of Moratuwa

Nevidu Jayatilleke

Research Assistant (Assistant Lecturer Grade)
Informatics Institute of Technology

Vishal Thenuwara

Software Engineer
Amused Group

Yomal De Mel

Manager Finance
MAS Active

Undergraduates

Imalsha Puranegedara

Student
University of Moratuwa

Kavindu Warnakulasuriya

Student
University of Moratuwa

Navindu De Silva

Student
University of Moratuwa

Nisal Ranathunga

Student
University of Moratuwa

Prabhash Dissanayake

Student
University of Moratuwa

Rashad Sirajudeen

Student
University of Moratuwa

Samith Karunathilake

Software Engineer
WSO2

Themira Chathumina

Student
University of Moratuwa

Alumni-PhD Students

Aloka Fernando

Researcher / Visiting Lecturer
Informatics Institute of Technology

Alumni-MSc Students

Kasun Wickramasinghe

AI Research Engineer
Analog Inference

Kushan Hewapathirana

Machine Learning Engineer
ConscientAI

Pubudu Cooray

Lead Software Engineer
Insighture

Sadeep Gunathilaka

Software Engineer
Inexis Consulting

Velayuthan Menan

AI Research Engineer
University of Moratuwa

Alumni-Undergraduates

Amanda Malkith

Software Engineer
Cut+Dry

Anushka Mahesh

Senior Fullstack Engineer
Healthcare Clarity

Aravinda Kankanamge

Software Engineer Fellow
Lanka Software Foundation

Buddhika Gunathilaka

Software Engineer
Harlem Next

Dilith Jayakody

Graduate Student
Dalhousie University

Dimuthu Upeksha

Director of Engineering
Folia

Dineth Jayakody

Ph.D. Student
Old Dominion University

Dulanga Sashika

Senior Consultant
Visa

Eranda Karannagoda

Software Engineer
Huubap PTE Ltd

Gihan Weeraprameshwara

Ph.D. Student
Michigan State University

Indeewari Wijesiri

Associate Technical Lead
WSO2

Jayaprabath Fernando

R&D Engineer
Syntax Genie (Pvt) Ltd

Koshila Isuranda

Software Engineer
Emojot

Lahiru Lasandun

Senior Technical Lead
SenzMate

Madhuranga Lakjeewa

Software Engineer
Automic Group

Maduranga Siriwardena

Associate Director / Architect
WSO2

Malaka Gallage

Senior Full Stack Developer
Hitachi Energy

Nadeeshaan Gunasinghe

Expert Software Engineer
Zühlke Group

Sachintha Rajith

Co-founder and Chief Technology Officer
Emojot

Thilina Chathuranga

Principal Software Engineer
Fleetwise New Zealand

Thilina Premasiri

Senior Technical Artist
Skybox Labs

Vihanga Jayawickrama

Lecturer (on Contract)
University of Moratuwa

Grants

2022-08 to 2023-08
Multi-domain Neural Machine Translation (NMT) System for Sinhala, Tamil, and English
$35,000 - Google/2022
We propose to create a multi-domain Neural Machine Translation (NMT) system for Sinhala, Tamil, and English, the official languages of Sri Lanka.