Robust and resilient decentralized training of ML models
Researchers: Cheng Fang, Waheed U. Bajwa
This project concerns the design and implementation of distributed/decentralized machine learning algorithms in the presence of adversarial agents.
Designing reliable and efficient machine learning algorithms has drawn increasing attention. Because of design constraints (e.g., multi-agent and Internet-of-Things systems) or computational and privacy considerations (e.g., large-scale machine learning on smartphone data), many machine learning applications involve loss functions that are distributed across multiple computing devices or machines. In these applications, learning usually has to be carried out in a decentralized fashion, without relying on a central server that is directly connected to every node. Meeting these constraints comes at a price in real-world decentralized systems: the links between nodes are vulnerable to failures caused by poor communication channels, man-in-the-middle attacks, etc., and such failures can bias the nodes away from learning the intended models. The main contribution of this project is to make decentralized optimization robust to such failures in the network.
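To make the idea of robustness to corrupted messages concrete, the following is a minimal sketch of one common screening approach, a coordinate-wise trimmed mean followed by a local gradient step. It is an illustration under stated assumptions and is not presented as this project's algorithm: the function names `trimmed_mean` and `screened_update`, the trimming parameter `b` (an assumed upper bound on the number of corrupted messages a node receives), and the toy numbers are all hypothetical.

```python
def trimmed_mean(values, b):
    """Average `values` after dropping the b smallest and b largest entries."""
    if len(values) <= 2 * b:
        raise ValueError("need more than 2*b values to trim b from each end")
    kept = sorted(values)[b:len(values) - b]
    return sum(kept) / len(kept)


def screened_update(own, received, gradient, step_size, b):
    """One local update at a node (hypothetical helper, for illustration).

    The node's own iterate and the iterates received from its neighbors are
    combined with a coordinate-wise trimmed mean, and a gradient step on the
    node's local loss is then applied to the screened result.
    """
    candidates = [own] + received
    screened = [
        trimmed_mean([vector[k] for vector in candidates], b)
        for k in range(len(own))
    ]
    return [s - step_size * g for s, g in zip(screened, gradient)]


# Toy data (assumed): three neighbors report values close to (1, 2),
# while one message has been corrupted on its way to this node.
own = [1.0, 2.0]
received = [[1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [100.0, -100.0]]

# Plain averaging, shown for comparison, is dominated by the corrupted message.
plain_mean = [sum(vector[k] for vector in [own] + received) / 5 for k in range(2)]

# A zero gradient isolates the effect of the screening step alone.
robust = screened_update(own, received, gradient=[0.0, 0.0], step_size=0.1, b=1)
```

In this toy setting the plain average is pulled to roughly (20.8, -18.4) by the single corrupted message, while the screened estimate stays near (1.03, 1.97). The trade-off is that trimming also discards some honest values, so each node needs more than 2b candidate values for the screening step to be defined.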