Professional Experience

  • 2020–Present

    Senior Lecturer

    Department of Computer Science & Engineering, University of Moratuwa,
    Sri Lanka

  • 2020–2021

    Research Fellow

    LIRNEasia,
    Sri Lanka

  • 2014–2020

    Graduate Research/Teaching Fellow

    Department of Computer and Information Science, University of Oregon,
    USA

  • 2018

    Givens Associate

    Argonne National Laboratory,
    USA

  • 2011–2020

    Lecturer

    Department of Computer Science & Engineering, University of Moratuwa,
    Sri Lanka

  • 2013–2014

    Researcher

    LIRNEasia,
    Sri Lanka

  • 2013–2014

    Visiting Lecturer

    Northshore College of Business and Technology,
    Sri Lanka

Education

  • Ph.D. 2020

    Ph.D. in Computer & Information Science

    University of Oregon, USA

  • MS 2016

    MS in Computer & Information Science

    University of Oregon, USA

  • BSc 2011

    B.Sc. Engineering (Hons) in Computer Science & Engineering

    University of Moratuwa, Sri Lanka

Featured Research

Learning Sentence Embeddings in the Legal Domain with Low Resource Settings


S. Jayasinghe, L. Rambukkanage, A. Silva, N. de Silva, S. Perera, and M. Perera

Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation, 2022, pp. 494–502.

As Natural Language Processing (NLP) evolves rapidly, it is increasingly used to analyze large, domain-specific text corpora. Applying NLP in a domain with uncommon vocabulary and distinctive semantics requires techniques designed specifically for that domain. The legal domain is one such area, with its own vocabulary and semantic interpretations. In this paper, we develop sentence embeddings specifically for the legal domain to address these needs, following two approaches. First, because a large corpus of raw court case documents is available, we train an Auto-Encoder model that reconstructs its input sentence in a self-supervised manner. Both word embeddings pretrained on general corpora and word embeddings trained specifically on legal corpora are incorporated within the Auto-Encoder. Second, we design a multitask model with noise discrimination and Semantic Textual Similarity tasks. We expect these embeddings and the insights gained to help vectorize legal domain corpora, enabling further application of Machine Learning in the legal domain.
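The self-supervised setup described in this abstract works because training signals can be derived from raw, unlabeled text. The Python sketch here illustrates that idea under stated assumptions: whitespace tokenization, the function names `reconstruction_pairs` and `noise_discrimination_examples`, and the random token-replacement noise are all hypothetical choices for illustration, not the paper's actual preprocessing or corruption scheme. It builds reconstruction pairs for an Auto-Encoder (the target equals the input) and labeled original/corrupted examples for a noise discrimination task.

```python
import random
from typing import List, Tuple

# A (tokens, label) pair; label 1 = original sentence, 0 = corrupted copy.
Example = Tuple[List[str], int]


def reconstruction_pairs(sentences: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: Auto-Encoder training pairs whose target is the
    input itself, so raw sentences are sufficient (self-supervision)."""
    return [(s.split(), s.split()) for s in sentences if s.split()]


def noise_discrimination_examples(
    sentences: List[str], replace_prob: float = 0.3, seed: int = 0
) -> List[Example]:
    """Hypothetical noising scheme for a noise discrimination task.

    Each non-empty sentence yields (original tokens, 1) and
    (corrupted tokens, 0); the corrupted copy has some tokens replaced
    by other words drawn from the corpus vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for s in sentences for tok in s.split()})
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words to build noise")
    examples: List[Example] = []
    for s in sentences:
        tokens = s.split()
        if not tokens:
            continue  # skip empty lines from the raw documents
        examples.append((tokens, 1))
        # Pick positions to corrupt; force at least one so the noisy copy
        # is guaranteed to differ from the original sentence.
        positions = [i for i in range(len(tokens)) if rng.random() < replace_prob]
        if not positions:
            positions = [rng.randrange(len(tokens))]
        noisy = list(tokens)
        for i in positions:
            noisy[i] = rng.choice([w for w in vocab if w != tokens[i]])
        examples.append((noisy, 0))
    return examples


if __name__ == "__main__":
    # Toy sentences for illustration only; not drawn from the paper's corpus.
    corpus = [
        "the appellant filed a petition",
        "the court dismissed the appeal",
    ]
    for tokens, label in noise_discrimination_examples(corpus):
        print(label, " ".join(tokens))
```

With this two-sentence toy corpus, `noise_discrimination_examples` returns four examples whose labels alternate 1, 0, 1, 0. Random token replacement is only one way to create negatives; shuffling word order or dropping tokens would fit the same interface.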