Sinhala Text Classification: Observations from the Perspective of a Resource Poor Language 
Sinhala, despite its history spanning several millennia, remains a resource-poor language. The objective of this study was to explore whether the text classification process for a resource-poor language can be enhanced by means of data and tools from a resource-rich language. However, it was found that when the feature space is based on an n-gram model, Sinhala, being a highly inflected language, naturally performs better than English, which is a weakly inflected language. This result held true even when Sinhala used only basic lexical-level models while English used advanced semantic-level models.
Keywords: Natural Language Processing | Machine Learning / Deep Learning | Ontologies | Big Data | Sinhala | Text Classification | Resource Poor Language | Resource Rich Language | Low-resource Languages | High-resource Languages
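The comparison in the abstract hinges on an n-gram-based feature space. As a rough illustration only, and not the paper's actual experimental setup, the sketch below builds a word n-gram TF-IDF representation and a linear classifier with scikit-learn; the documents, labels, n-gram range, and tokenisation choice are hypothetical placeholders.

# Minimal sketch of n-gram text classification (assumed setup, not the paper's).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy Sinhala snippets with made-up category labels, for illustration only.
docs = [
    "ක්‍රිකට් තරගය",    # "cricket match"  -> sports
    "රුපියල ආර්ථිකය",   # "rupee economy"  -> business
    "නව චිත්‍රපටය",      # "new film"       -> entertainment
    "පාපන්දු තරගය",     # "football match" -> sports
]
labels = ["sports", "business", "entertainment", "sports"]

# Word unigram + bigram TF-IDF features; whitespace tokenisation keeps
# Sinhala diacritics and joiners intact, so each inflected surface form
# becomes its own lexical feature.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")),
    ("clf", LinearSVC()),
])
model.fit(docs, labels)

print(model.predict(["ක්‍රිකට් තරගය අද"]))  # likely a sports-like label

Because the feature space here is purely lexical, a morphologically rich language multiplies the number of distinct n-gram features per lemma, which is the property the abstract points to when comparing Sinhala with English.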