Towards Multi-document Summarisation in Low-resource Settings (Defence)
The field of multi-document summarisation (MDS) has emerged as a critical area of research within Natural Language Processing (NLP), driven by the increasing need to process large volumes of unstructured textual data. This thesis explores methodologies to address the challenges of MDS, particularly in low-resource and multilingual settings. Key contributions include the development of the Multilingual Dataset for Multi-document Summarisation (M2DS), the benchmarking of state-of-the-art models across diverse datasets, and the introduction of adapter-based techniques for MDS models, with a focus on opinion summarisation and domain adaptation in specialised applications such as medical and academic paper summarisation. The study evaluates models such as PRIMERA, PEGASUS, and LED on datasets spanning news, academic literature, and customer reviews, highlighting the importance of domain-specific adaptation and novel pretraining strategies. Fine-tuning PRIMERA for medical research demonstrated substantial improvements in summarisation quality, while adapter-based approaches enhanced performance on sentiment-rich datasets. Despite significant progress, challenges such as dataset diversity, scalability, and cross-domain generalisation remain. Future directions include expanding multilingual datasets, optimising parameter-efficient models, and integrating sentiment-aware evaluation metrics. This thesis advances MDS towards inclusive and effective summarisation solutions applicable across multiple domains and languages.