Unsupervised Mixed-Language Multi-document Summarization
Multi-document summarization (MDS) is a challenging task that involves producing a concise and coherent summary from information scattered across multiple documents. Traditional MDS systems often assume monolingual inputs, limiting their effectiveness in real-world scenarios where relevant information may span multiple languages. This study presents an extension to the GLIMMER model for unsupervised multi-document summarization, enhancing its capabilities to process mixed-language document clusters. Specifically, we augment the sentence similarity graph with cross-lingual lexical edges, allowing the model to capture semantic relationships between sentences written in different languages. This is achieved without the need for supervised training or language-specific fine-tuning, leveraging external lexical resources such as multilingual embeddings and dictionaries. We evaluate our approach on the MLMD-news dataset, which comprises document clusters in English, German, French, and Spanish. Experimental results demonstrate that our extended model generates coherent and informative summaries that effectively synthesize content across languages, paving the way for practical applications of unsupervised MDS in multilingual settings.
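To illustrate the idea of cross-lingual lexical edges, the following is a minimal sketch, not the paper's actual implementation: it links English and German sentences whose dictionary-translatable words overlap. The toy `EN_DE` dictionary, the `cross_lingual_overlap` score, and the `threshold` parameter are all illustrative assumptions; in practice the edges would be derived from external resources such as multilingual embeddings or bilingual dictionaries, as the abstract states.

```python
# Toy bilingual dictionary (assumption: a real system would use an external
# lexical resource such as a bilingual dictionary or multilingual embeddings).
EN_DE = {
    "economy": "wirtschaft",
    "growth": "wachstum",
    "slowed": "verlangsamte",
}

def normalise(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,;:!?\"'")

def cross_lingual_overlap(sent_en, sent_de, dictionary):
    """Fraction of translatable English words whose translation
    appears in the German sentence (a simple lexical-overlap score)."""
    en_tokens = [normalise(t) for t in sent_en.split()]
    de_tokens = {normalise(t) for t in sent_de.split()}
    translatable = [t for t in en_tokens if t in dictionary]
    if not translatable:
        return 0.0
    hits = sum(1 for t in translatable if dictionary[t] in de_tokens)
    return hits / len(translatable)

def add_cross_lingual_edges(en_sents, de_sents, dictionary, threshold=0.5):
    """Return (i, j, weight) edges connecting English sentence i with
    German sentence j when their lexical overlap exceeds the threshold.
    These edges would augment a monolingual sentence similarity graph."""
    edges = []
    for i, se in enumerate(en_sents):
        for j, sd in enumerate(de_sents):
            w = cross_lingual_overlap(se, sd, dictionary)
            if w >= threshold:
                edges.append((i, j, w))
    return edges

en = ["The economy growth slowed sharply."]
de = ["Das Wachstum der Wirtschaft verlangsamte sich."]
print(add_cross_lingual_edges(en, de, EN_DE))  # → [(0, 0, 1.0)]
```

A graph-based summariser can then treat these cross-lingual edges like ordinary similarity edges, so sentences conveying the same fact in different languages end up in the same cluster.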