Automated Vulnerability Repair for Machine Learning Systems
Machine learning (ML) systems have become integral to applications ranging from autonomous vehicles to healthcare diagnostics. However, with the increasing adoption of ML frameworks such as PyTorch and TensorFlow, concern has grown about the security vulnerabilities these systems may harbor. If not addressed, these vulnerabilities can propagate across dependent systems, leading to significant risks. This project aims to address these challenges by developing a comprehensive ML system security backport dataset and an automated patch backporting tool. The proposed solution leverages a Retrieval-Augmented Generation (RAG) architecture and Large Language Models (LLMs) to automate the backporting of security patches from the mainline version to the stable versions of ML systems. By incorporating domain-specific knowledge, the model is designed to improve the accuracy and efficiency of vulnerability mitigation. The expected outcomes of this project are a dataset that fills current gaps in ML system vulnerability resources and a scalable, automated solution that helps developers address security vulnerabilities promptly. This work has the potential to significantly improve the security and reliability of ML frameworks, supporting safer deployment of these technologies in critical applications.