Sinmin - Sinhala Corpus Project
Dimuthu Upeksha, Chamila Wijayarathna, Maduranga Siriwardena, Lahiru Lasandun
Advised by: Daya Chinthana Wimalasuriya, Gihan Dias, Nisansa de Silva
University of Moratuwa
Today, the corpus-based approach is regarded as the state-of-the-art methodology for studying languages, both prominent and lesser-known. It mines new knowledge about a language by answering two main questions:
- What particular patterns are associated with lexical or grammatical features of the language?
- How do these patterns differ within varieties and registers?
SinMin is a corpus for the Sinhala language that:
- Is continuously updated
- Is dynamic (scalable)
- Covers a wide range of language (structured and unstructured)
- Provides a better interface for users to interact with the corpus
Keywords: Natural Language Processing | Sinhala | Big Data