
Social Media Text Analysis

Principal Investigator: Nisansa de Silva

We propose to advance social media text analysis by combining linguistic study, preprocessing, and deep learning models to better capture sentiment, structure, and meaning in informal, large-scale user-generated text.

Social media platforms produce vast amounts of user-generated text that is often informal, noisy, and linguistically diverse. Analyzing this text presents unique challenges for natural language processing, including spelling variations, non-standard grammar, and rapid shifts in language use. This project seeks to address these challenges by developing methods and resources for effective social media text analysis.
Our research investigates the linguistic properties of social media text and applies normalization techniques to reduce noise and improve downstream model performance. We explore sentiment analysis and reaction prediction methods, from baseline classifiers to deep learning approaches, to capture user opinions at a fine-grained level. We use embedding-based techniques to represent short, informal text effectively, with attention to the needs of low-resource settings.
We also emphasize dataset creation and large-scale corpus studies, including temporal analyses of user-generated content, to provide insights into evolving linguistic and sentiment patterns. By integrating linguistic analysis, preprocessing, and modern modeling approaches, this work contributes to more robust, accurate, and scalable systems for social media text analysis.

Objectives:

  • Investigate linguistic properties and stylistic patterns in social media text to better understand its unique characteristics.
  • Develop preprocessing and normalization techniques tailored for noisy, user-generated content.
  • Explore sentiment analysis and reaction prediction models to capture opinions and attitudes expressed in social media.
  • Design and evaluate embedding-based approaches for representing short, informal, and multilingual text.
  • Build datasets and benchmarks derived from large-scale social media corpora to facilitate reproducible research.
  • Examine temporal and large-scale patterns in social media data to support longitudinal linguistic and sentiment studies.


Keywords: Natural Language Processing | Sinhala | Big Data | Machine Learning / Deep Learning




Publications

Journal Papers

Conference Papers

White Papers

Preprints

Team

External Collaborators: Yudhanjaya Wijeratne | Upali Kohomban | Danaja Maldeniya | Chamilka Wijeratne


Faculty

Nisansa de Silva

Senior Lecturer
University of Moratuwa

MSc Students

Yomal De Mel

Manager Finance
MAS Active

Alumni-Undergraduates

Chiran Gamage

Technical Lead
GTN Tech

Eranga Mapa

Lead Engineer
Bit Game Labs

Gihan Weeraprameshwara

Ph.D. Student
Michigan State University

Lasitha Wattaladeniya

Staff Software Engineer
Capital One

Samith Dassanayake

Engineering Manager
Klarna

Vihanga Jayawickrama

Lecturer (on Contract)
University of Moratuwa