Multi-Domain Neural Machine Translation with Knowledge Distillation for Low-Resource Languages (Defence)
Multi-domain adaptation in Neural Machine Translation (NMT) is crucial for ensuring high-quality translations across diverse domains. Traditional fine-tuning approaches, while effective, become impractical as the number of domains increases, leading to high computational costs, space complexity, and catastrophic forgetting. Knowledge Distillation (KD) offers a scalable alternative by training a compact model using distilled data from a larger teacher model. However, we hypothesize that sequence-level KD primarily distills the decoder while neglecting encoder knowledge transfer, resulting in suboptimal adaptation and generalization, particularly in low-resource language settings where both data and computational resources are constrained.

To address this, we propose an improved sequence-level distillation framework enhanced with encoder alignment using a cosine similarity-based loss. Our approach ensures that the student model captures both encoder and decoder knowledge, mitigating the limitations of conventional KD. We evaluate our method on multi-domain German–English translation under simulated low-resource conditions and further extend the evaluation to a bona fide low-resource language, demonstrating the method's robustness across diverse data conditions.

Results demonstrate that our proposed encoder-aligned student model can outperform even its larger teacher models, achieving strong generalization across domains. Additionally, our method enables efficient domain adaptation when fine-tuned on new domains, surpassing existing KD-based approaches. These findings establish encoder alignment as a crucial component for effective knowledge transfer in multi-domain NMT, with significant implications for scalable and resource-efficient domain adaptation.
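To make the combined objective concrete, the sketch below shows one way the training loss could be assembled: ordinary sequence-level KD (cross-entropy against teacher-generated targets) plus a cosine-similarity term that pulls the student's encoder states toward the frozen teacher's. This is a minimal PyTorch-style illustration under stated assumptions, not the thesis implementation; the model interface, the weighting factor alpha, and the assumption that teacher and student encoders share a hidden dimension are all illustrative.

    # Hedged sketch: sequence-level KD plus an encoder-alignment term.
    # Module names and the model interface are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def kd_with_encoder_alignment(student, teacher, src_tokens, distilled_tgt,
                                  pad_id, alpha=0.5):
        """Sequence-level KD loss combined with a cosine-based encoder alignment loss.

        Assumed interface: each model exposes .encoder(src) -> hidden states of
        shape (batch, src_len, dim) and a forward pass model(src, tgt_in) that
        returns decoder logits over the target vocabulary. distilled_tgt holds
        target sequences generated by the teacher (sequence-level KD data).
        """
        # Sequence-level KD: cross-entropy against teacher-generated targets.
        logits = student(src_tokens, distilled_tgt[:, :-1])      # (B, T-1, V)
        ce = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            distilled_tgt[:, 1:].reshape(-1),
            ignore_index=pad_id,
        )

        # Encoder alignment: push student encoder states toward the frozen teacher's.
        with torch.no_grad():
            h_teacher = teacher.encoder(src_tokens)              # (B, S, D)
        h_student = student.encoder(src_tokens)                  # (B, S, D)
        cos = F.cosine_similarity(h_student, h_teacher, dim=-1)  # (B, S)
        align = (1.0 - cos).mean()                               # 0 when perfectly aligned

        return ce + alpha * align

In this sketch, alpha trades off decoder-side distillation against encoder alignment; if the student and teacher encoders had different hidden sizes, a learned projection on the student states would be needed before computing the cosine term.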