
Automatic Generation of Research Paper Abstracts using Deep-Hybrid Models

Dushan Kumarasinghe
Advised by: Nisansa de Silva
University of Moratuwa

Condensing important information into a summary is crucial for readers navigating lengthy documents. In a research paper, the abstract serves as this concise overview of the study. This thesis aims to improve research paper summarization by introducing a novel section-wise relevance matrix. To address the token size limitations of Large Language Models (LLMs) such as GPT-Neo, we developed a two-stage approach: extractive summarization first condenses lengthy texts into key sentences, and abstractive summarization then generates coherent and concise summaries from these extracts. The combined approach leverages section-wise involvement ratios, with particular attention to the abstract section, to improve the accuracy and quality of the generated summaries. We introduced a pioneering dataset of research papers organized into sections, which plays a crucial role in this summarization process. Experimental results demonstrated that our method produces high-quality summaries while staying within token limits, which makes it a promising option for summarizing long documents in low-resource, cost-sensitive environments. However, when section-wise segmentation is unclear, the accuracy of the summaries suffers. This research underscores the need for further refinement and offers a promising framework for enhancing summarization techniques, benefiting researchers, educators, and information seekers alike.
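The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the word-frequency scoring is a simple stand-in for the extractive model, the ratio values are made up for illustration, and `abstractive_fn` is a hypothetical placeholder for a call to an LLM such as GPT-Neo. The sketch only shows how section-wise ratios could split a fixed sentence budget before the condensed text is handed to the abstractive stage.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def allocate_budget(ratios, total_sentences):
    """Split a sentence budget across sections in proportion to their ratios.

    Rounding (and the minimum of one sentence per section) means the parts
    may not sum exactly to total_sentences.
    """
    total = sum(ratios.values())
    return {
        name: max(1, round(total_sentences * ratio / total))
        for name, ratio in ratios.items()
    }


def extract_top_sentences(text, k):
    """Keep the k sentences with the highest average word frequency.

    This scoring is an assumption made for illustration, not the
    extractive method used in the thesis.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )[:k]
    # Restore document order so the extract stays readable.
    return [sentences[i] for i in sorted(ranked)]


def condense(sections, ratios, total_sentences):
    """Stage 1: extract key sentences from each section within its budget."""
    budget = allocate_budget(ratios, total_sentences)
    kept = []
    for name, text in sections.items():
        if name in budget:
            kept.extend(extract_top_sentences(text, budget[name]))
    return " ".join(kept)


def summarize(sections, ratios, total_sentences, abstractive_fn):
    """Stage 2: pass the condensed text to an abstractive model.

    abstractive_fn is a hypothetical callable (str -> str) standing in
    for the LLM call.
    """
    return abstractive_fn(condense(sections, ratios, total_sentences))
```

Because the extract is capped at a fixed number of sentences, the input to the abstractive stage stays short regardless of the length of the paper, which is the point of running extraction first.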

Keywords: Natural Language Processing | Machine Learning / Deep Learning | Text Summarization | Text Generation | LLM