Abstract: With the rapid arrival of the Internet of Everything era, massive volumes of data are generated at the network edge, which exposes traditional cloud-based distributed training to problems such as heavy network load, high energy consumption, and privacy and security risks. Edge computing sinks computing resources to the edge, forming a collaborative "cloud–edge–end" computing system that can meet basic requirements for real-time processing, intelligence, security, and privacy protection. Building on edge computing capabilities, edge intelligence effectively promotes intelligent processing at the edge and has become a popular research topic. Our survey finds that edge collaborative intelligence is currently in a stage of rapid development: deep learning models are being combined with edge computing, and many edge collaborative processing solutions have emerged, such as distributed training in edge computing scenarios, federated learning, and distributed collaborative inference based on techniques such as model partitioning and early exit. Combining shallow broad learning systems with virtualization technology also allows edge intelligence to be deployed quickly, which considerably improves service quality and user experience and makes services more intelligent. As a key link of edge intelligence, edge intelligence collaborative training aims to assist or implement the distributed training of machine learning models on the edge side. However, in an edge computing scenario, distributed training must coordinate a large number of edge nodes, and many challenges remain. Therefore, by thoroughly surveying the existing research on edge intelligence collaborative training, this paper focuses on the challenges and solutions of collaborative training in edge scenarios characterized by device heterogeneity, limited device resources, and unstable network environments. The paper introduces and summarizes the overall architecture and the core modules of edge intelligence collaborative training. The overall architecture concerns the interaction framework among edge devices; depending on whether a central server is present, it can be divided into two categories: the parameter-server centralized architecture and the fully decentralized parallel architecture. The core modules concern how a large number of edge devices collaboratively update the parameters of a neural network model; depending on the role of parallel computing in model training, the approaches are divided into data parallelism and model parallelism. Finally, the remaining challenges and future prospects of edge collaborative training are analyzed and summarized.
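The synchronous, parameter-server-centered workflow described above — the server broadcasts the global model, edge clients train on their private data shards (data parallelism), and the server aggregates the returned updates by sample-weighted averaging, as in FedAvg [41] — can be illustrated by the following minimal Python sketch. The linear model, client count, and hyperparameters are illustrative stand-ins rather than parts of any surveyed system; real deployments train neural networks on physically separate devices.

```python
import numpy as np

# Minimal sketch of synchronous, parameter-server-style collaborative training
# with FedAvg-like weighted averaging [41]. Clients and the server are simulated
# in a single process, and the "model" is a linear regressor trained with local
# gradient steps.

def local_update(w, X, y, lr=0.05, steps=5):
    """A few local gradient steps on one edge device's private data shard."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One synchronous round: broadcast, local training, weighted aggregation."""
    n_total = sum(len(y) for _, y in clients)
    local_models = [local_update(w_global, X, y) for X, y in clients]
    return sum(len(y) / n_total * w for w, (_, y) in zip(local_models, clients))

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []                                     # each client holds a private data shard
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=50)))

w_global = np.zeros(2)
for _ in range(20):                              # 20 communication rounds
    w_global = fedavg_round(w_global, clients)
print("learned weights:", w_global)              # approaches w_true = [2, -1]
```

Weighting each client's update by its sample count keeps the aggregate consistent with training on the pooled data when shards differ in size.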
Key words:
- cloud computing
- edge intelligence
- collaborative training
- edge computing
- machine learning
- distributed training
Table 1. Related works on the parameter-server centralized architecture

| Communication mechanism | Optimization level | Research question | Optimization objective | Reference |
| --- | --- | --- | --- | --- |
| Synchronous | Equipment level | Limited resources | Improve local model quality | [41] |
| | Communication level | Limited resources | Reduce traffic | [56] |
| | Equipment level | Heterogeneous equipment | Shorten communication time | [58–59] |
| | Equipment level | Comprehensive consideration | Architecture flexibility | [60] |
| | Communication level | Unstable environment | Architecture robustness | [56] |
| Asynchronous | Equipment level | Stale gradients | Architecture flexibility | [62–64] |
| | Communication level | Comprehensive consideration | Trade-off optimization | [65] |
| | Equipment level | Dynamic clients | Time-consumption optimization | [66] |
| | Overall architecture | Heterogeneous equipment | Architecture robustness | [67] |
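A recurring idea in Table 1 is screening clients so that resource-limited or slow devices do not stall a synchronous round (e.g., [58–59]). The sketch below shows deadline-based, resource-aware client selection; the cost model, deadline, and client attributes are hypothetical values used only for illustration.

```python
import numpy as np

# Minimal sketch of resource-aware client selection for a synchronous round:
# the server estimates each client's round time from its compute speed and
# uplink bandwidth and only schedules clients expected to finish before the
# deadline. All constants below are illustrative assumptions.

rng = np.random.default_rng(4)

clients = [
    {"id": i,
     "samples": int(rng.integers(200, 2000)),     # local data size
     "flops_per_s": rng.uniform(1e9, 2e10),       # device compute speed
     "uplink_mbps": rng.uniform(1.0, 50.0)}       # network bandwidth
    for i in range(20)
]

MODEL_MBITS = 40.0          # size of the model update to upload
FLOPS_PER_SAMPLE = 2e7      # assumed training cost per sample per epoch
DEADLINE_S = 30.0           # synchronous round deadline in seconds

def estimated_round_time(c, epochs=5):
    compute = epochs * c["samples"] * FLOPS_PER_SAMPLE / c["flops_per_s"]
    upload = MODEL_MBITS / c["uplink_mbps"]
    return compute + upload

selected = [c["id"] for c in clients if estimated_round_time(c) <= DEADLINE_S]
print("clients scheduled this round:", selected)
```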
Table 2. Related works on decentralized parallel stochastic gradient descent (D-PSGD)
| Research question | Approach | Optimization objective | Reference |
| --- | --- | --- | --- |
| Neighbor interaction | Random selection | Reduce interaction complexity | [69–72] |
| | Cooperation by batch rotation | Improve model consistency | [73] |
| | Search for similar targets | Best communication partner | [74] |
| | Single-sided trust set | Improve architecture robustness | [75] |
| | Weight-comparison selection | Best communication partner | [76] |
| Communication consumption | Asymmetric interaction | Avoid redundant communication | [77] |
| | Overlapping communication and computation | Avoid redundant communication and computation | [78] |
| | Model compression | Make full use of link resources | [79] |
| | Model partitioning | Improve communication flexibility | [80] |
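The decentralized parallel (D-PSGD-style) pattern summarized in Table 2 removes the central server: every node keeps its own model replica, performs local gradient steps, and periodically averages parameters with selected neighbors. The sketch below uses random pairwise gossip, one of the neighbor-selection strategies listed above; the node count, pairing rule, and linear model are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of decentralized parallel SGD: each node takes a local gradient
# step on its private shard and then averages parameters with one randomly
# chosen peer (gossip), with no central parameter server.

rng = np.random.default_rng(1)
n_nodes, dim = 6, 3
w_true = rng.normal(size=dim)

data = []
for _ in range(n_nodes):
    X = rng.normal(size=(40, dim))
    data.append((X, X @ w_true + 0.1 * rng.normal(size=40)))
params = [rng.normal(size=dim) for _ in range(n_nodes)]

def local_step(w, X, y, lr=0.05):
    """One gradient step on a node's private least-squares objective."""
    return w - lr * 2 * X.T @ (X @ w - y) / len(y)

for _ in range(200):
    # 1) local computation on every node
    params = [local_step(w, X, y) for w, (X, y) in zip(params, data)]
    # 2) gossip: each node averages its parameters with one random peer
    for i in range(n_nodes):
        j = int(rng.integers(n_nodes))
        params[i] = params[j] = (params[i] + params[j]) / 2

consensus = np.mean(params, axis=0)
print("max disagreement across nodes:", max(np.linalg.norm(w - consensus) for w in params))
print("distance to ground truth:", np.linalg.norm(consensus - w_true))
```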
Table 3. Related works on data parallelism
| Parameter update method | Main problem | Solution | Reference |
| --- | --- | --- | --- |
| Synchronous update | Client delay | Client filtering | [58–59] |
| | | Client selection | [83] |
| | | Hybrid update | [84] |
| | | Partial model update | [85] |
| Asynchronous update | Staleness effect | Convergence analysis | [61, 86] |
| | | Penalize stale gradients | [90, 92] |
| | | Adjust the learning rate | [62, 91] |
| | | Use momentum | [94] |
| | | Adjust hyperparameters | [95] |
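For the asynchronous-update rows of Table 3, a common remedy for the staleness effect is to discount outdated gradients, for example by scaling the learning rate with the update's staleness [62, 90–91]. The sketch below applies a 1/(staleness + 1) discount; the discount rule and the way delayed clients are simulated are illustrative rather than taken from any single surveyed paper.

```python
import numpy as np

# Minimal sketch of staleness-aware asynchronous updates at a parameter server:
# clients compute gradients on possibly outdated model snapshots, and the server
# scales the learning rate by 1/(staleness + 1) when applying each update.

rng = np.random.default_rng(2)
dim = 4
w_true = rng.normal(size=dim)
w = np.zeros(dim)
history = [w.copy()]          # past server versions, used to simulate delayed clients
base_lr = 0.1

def gradient(w_snapshot):
    """Least-squares gradient computed on a fresh mini-batch."""
    X = rng.normal(size=(32, dim))
    y = X @ w_true + 0.05 * rng.normal(size=32)
    return 2 * X.T @ (X @ w_snapshot - y) / len(y)

for _ in range(400):
    staleness = int(rng.integers(0, 6))                 # how outdated the client's copy is
    snapshot = history[max(0, len(history) - 1 - staleness)]
    g = gradient(snapshot)
    w = w - base_lr / (staleness + 1) * g               # discount stale gradients
    history.append(w.copy())

print("distance to ground truth:", np.linalg.norm(w - w_true))
```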
Table 4. Related works on the non-independent and identically distributed (non-IID) data problem
Table 5. Related works on model parallelism
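Model parallelism, in contrast to data parallelism, partitions a single model across devices so that each device holds only part of the parameters and exchanges activations and gradients with its neighbors during training [110–115]. The sketch below splits a two-layer network across two simulated devices; the Device class, layer sizes, and training data are illustrative and much simpler than the pipeline and operator partitioning used in the surveyed systems.

```python
import numpy as np

# Minimal sketch of model parallelism: a two-layer network is split by layer
# across two simulated devices, so each device stores and updates only its own
# weights; activations flow forward and gradients flow backward across the
# device boundary.

rng = np.random.default_rng(3)

class Device:
    """Holds one tanh layer and applies its own parameter updates."""
    def __init__(self, fan_in, fan_out, lr=0.05):
        self.W = rng.normal(scale=0.5, size=(fan_in, fan_out))
        self.lr = lr

    def forward(self, x):
        self.x = x                                  # cache the input for the backward pass
        self.out = np.tanh(x @ self.W)
        return self.out

    def backward(self, grad_out):
        grad_pre = grad_out * (1 - self.out ** 2)   # back-propagate through tanh
        grad_in = grad_pre @ self.W.T               # gradient sent to the upstream device
        self.W -= self.lr * self.x.T @ grad_pre     # local parameter update
        return grad_in

device_a = Device(4, 8)                    # first layer lives on device A
device_b = Device(8, 1)                    # second layer lives on device B

X = rng.normal(size=(256, 4))
y = np.sin(X.sum(axis=1, keepdims=True))

for _ in range(500):
    h = device_a.forward(X)                # activations cross A -> B
    pred = device_b.forward(h)
    grad = 2 * (pred - y) / len(y)         # gradient of the mean squared error
    grad_h = device_b.backward(grad)       # gradients cross B -> A
    device_a.backward(grad_h)

print("final MSE:", float(np.mean((device_b.forward(device_a.forward(X)) - y) ** 2)))
```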
References
[1] Zhang X Z, Wang Y F, Lu S D, et al. OpenEI: An open framework for edge intelligence // 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). Dallas, 2019: 1840
[2] Wang R, Qi J P, Chen L, et al. Overview of edge intelligence oriented collaborative reasoning. J Comput Res Dev, http://kns.cnki.net/kcms/detail/11.1777.tp.20220426.1612.006.html
[3] Zhou Z, Chen X, Li E, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proc IEEE, 2019, 107(8): 1738 doi: 10.1109/JPROC.2019.2918951
[4] Li K L, Liu C B. Edge intelligence: State-of-the-art and expectations. Big Data Res, 2019, 5(3): 69
[5] Tan H S, Guo D, Ke Z C, et al. Development and challenges of cloud edge collaborative intelligent edge computing. CCCF, 2020(1): 16
[6] Zhang X Z, Lu S D, Shi W S. Research on collaborative computing technology in edge intelligence. AI-View, 2019, 6(5): 55
[7] Wang X F. Intelligent edge computing: From internet of everything to internet of everything empowered. Frontiers, 2020(9): 6
[8] Fang A D, Cui L, Zhang Z W, et al. A parallel computing framework for cloud services // 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). Dalian, 2020: 832
[9] Lanka S, Aung Win T, Eshan S. A review on edge computing and 5G in IoT: Architecture & applications // 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2021: 532
[10] Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast. (2019-06-18) [2022-09-26]. https://www.businesswire.com/news/home/20190618005012
[11] Zwolenski M, Weatherill L. The digital universe rich data and the increasing value of the internet of things. J Telecommun Digital Economy, 2014, 2(3): 47.1
[12] Jin H, Jia L, Zhou Z. Boosting edge intelligence with collaborative cross-edge analytics. IEEE Internet Things J, 2021, 8(4): 2444 doi: 10.1109/JIOT.2020.3034891
[13] Jiang X L, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it. Proc IEEE, 2019, 107(2): 280 doi: 10.1109/JPROC.2018.2863960
[14] Xiao Y H, Jia Y Z, Liu C C, et al. Edge computing security: State of the art and challenges. Proc IEEE, 2019, 107(8): 1608 doi: 10.1109/JPROC.2019.2918437
[15] Huang T, Liu J, Wang S, et al. Survey of the future network technology and trend. J Commun, 2021, 42(1): 130
[16] Jennings A, Copenhagen van R, Rusmin T. Aspects of Network Edge Intelligence. Maluku Technical Report, 2001
[17] Song C H, Zeng P, Yu H B. Industrial Internet intelligent manufacturing edge computing: State-of-the-art and challenges. ZTE Technol J, 2019, 25(3): 50
[18] Risteska Stojkoska B L, Trivodaliev K V. A review of Internet of Things for smart home: Challenges and solutions. J Clean Prod, 2017, 140: 1454 doi: 10.1016/j.jclepro.2016.10.006
[19] Varghese B, Wang N, Barbhuiya S, et al. Challenges and opportunities in edge computing // 2016 IEEE International Conference on Smart Cloud (SmartCloud). New York, 2016: 20
[20] Shi W S, Zhang X Z, Wang Y F, et al. Edge computing: State-of-the-art and future directions. J Comput Res Dev, 2019, 56(1): 69
[21] Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices // 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). Atlanta, 2017: 328
[22] Wang X F, Han Y W, Wang C Y, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Netw, 2019, 33(5): 156 doi: 10.1109/MNET.2019.1800286
[23] Kang Y P, Hauswald J, Gao C, et al. Neurosurgeon. SIGOPS Oper Syst Rev, 2017, 51(2): 615 doi: 10.1145/3093315.3037698
[24] Li E, Zhou Z, Chen X. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy // Proceedings of the 2018 Workshop on Mobile Edge Communications. Budapest, 2018: 31
[25] Li Y K, Zhang T, Chen J L. Broad Siamese network for edge computing applications. Acta Autom Sin, 2020, 46(10): 2060
[26] Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers // 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). Athens, 2018: 800
[27] Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology. World Wide Web, 2020, 23(2): 1341 doi: 10.1007/s11280-019-00692-y
[28] Zaharia M, Xin R S, Wendell P, et al. Apache spark. Commun ACM, 2016, 59(11): 56 doi: 10.1145/2934664
[29] Abadi M, Barham P, Chen J M, et al. TensorFlow: A system for large-scale machine learning. ArXiv Preprint (2016-05-31) [2022-09-26]. https://arxiv.org/abs/1605.08695
[30] Chen T Q, Li M, Li Y T, et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. ArXiv Preprint (2015-12-03) [2022-09-26]. https://arxiv.org/abs/1512.01274
[31] Jin A L, Xu W C, Guo S, et al. PS: A simple yet effective framework for fast training on parameter server. IEEE Trans Parallel Distributed Syst, 2022, 33(12): 4625 doi: 10.1109/TPDS.2022.3200518
[32] Padmanandam K, Lingutla L. Practice of applied edge analytics in intelligent learning framework // 2020 21st International Arab Conference on Information Technology (ACIT). Giza, 2021: 1
[33] Ross P, Luckow A. EdgeInsight: Characterizing and modeling the performance of machine learning inference on the edge and cloud // 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, 2020: 1897
[34] Shi W S, Sun H, Cao J, et al. Edge computing: An emerging computing model for the Internet of everything era. J Comput Res Dev, 2017, 54(5): 907
[35] Srivastava A, Nguyen D, Aggarwal S, et al. Performance and memory trade-offs of deep learning object detection in fast streaming high-definition images // 2018 IEEE International Conference on Big Data (Big Data). Seattle, 2018: 3915
[36] Sindhu C, Vyas D V, Pradyoth K. Sentiment analysis based product rating using textual reviews // 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2017: 727
[37] Hosein P, Rahaman I, Nichols K, et al. Recommendations for long-term profit optimization // Proceedings of ImpactRS@RecSys. Copenhagen, 2019
[38] Sharma R, Biookaghazadeh S, Li B X, et al. Are existing knowledge transfer techniques effective for deep learning with edge devices? // 2018 IEEE International Conference on Edge Computing (EDGE). San Francisco, 2018: 42
[39] Bonawitz K, Eichner H, Grieskamp W, et al. Towards federated learning at scale: System design // Proceedings of Machine Learning and Systems. Palo Alto, 2019, 1: 374
[40] Kairouz P, McMahan H B, Avent B, et al. Advances and open problems in federated learning. FNT Machine Learning, 2021, 14(1-2): 1
[41] McMahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data. ArXiv Preprint (2017-02-28) [2022-09-26]. https://arxiv.org/abs/1602.05629
[42] Zhu J M, Zhang Q N, Gao S, et al. Privacy preserving and trustworthy federated learning model based on blockchain. Chin J Comput, 2021, 44(12): 2464
[43] Wei S Y, Tong Y X, Zhou Z M, et al. Efficient and Fair Data Valuation for Horizontal Federated Learning. Berlin: Springer, 2020
[44] Khan A, Thij M, Wilbik A. Communication-efficient vertical federated learning. Algorithms, 2022, 15(8): 273 doi: 10.3390/a15080273
[45] Chen Y Q, Qin X, Wang J D, et al. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intell Syst, 2020, 35(4): 83 doi: 10.1109/MIS.2020.2988604
[46] Yang J, Zheng J, Zhang Z, et al. Security of federated learning for cloud-edge intelligence collaborative computing. Int J Intell Syst, 2022, 37(11): 9290 doi: 10.1002/int.22992
[47] Zhang X J, Gu H L, Fan L X, et al. No free lunch theorem for security and utility in federated learning. ArXiv Preprint (2022-09-05) [2022-09-26]. https://arxiv.org/abs/2203.05816
[48] Deng S G, Zhao H L, Fang W J, et al. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J, 2020, 7(8): 7457 doi: 10.1109/JIOT.2020.2984887
[49] Feng C, Han P C, Zhang X, et al. Computation offloading in mobile edge computing networks: A survey. J Netw Comput Appl, 2022, 202: 103366 doi: 10.1016/j.jnca.2022.103366
[50] Qiao D W, Guo S T, He J, et al. Edge intelligence: Research progress and challenges. Radio Commun Technol, 2022, 48(1): 34
[51] Fortino G, Zhou M C, Hassan M M, et al. Pushing artificial intelligence to the edge: Emerging trends, issues and challenges. Eng Appl Artif Intell, 2021, 103: 104298 doi: 10.1016/j.engappai.2021.104298
[52] Qiu X C, Fernández-Marqués J, Gusmão P, et al. ZeroFL: Efficient on-device training for federated learning with local sparsity. ArXiv Preprint (2022-08-04) [2022-09-26]. https://arxiv.org/abs/2208.02507
[53] Long S Q, Long W F, Li Z T, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments. IEEE Trans Parallel Distributed Syst, 2021, 32(7): 1629 doi: 10.1109/TPDS.2020.3041029
[54] Zhu H R, Yuan G J, Yao C J, et al. Survey on network of distributed deep learning training. J Comput Res Dev, 2021, 58(1): 98 doi: 10.7544/issn1000-1239.2021.20190881
[55] Rafique Z, Khalid H M, Muyeen S M. Communication systems in distributed generation: A bibliographical review and frameworks. IEEE Access, 2020, 8: 207226 doi: 10.1109/ACCESS.2020.3037196
[56] Hsieh K, Harlap A, Vijaykumar N, et al. Gaia: Geo-distributed machine learning approaching LAN speeds // Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. New York, 2017: 629
[57] Konečný J, McMahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency. ArXiv Preprint (2017-10-30) [2022-09-26]. https://arxiv.org/abs/1610.05492
[58] Chen J M, Pan X H, Monga R, et al. Revisiting distributed synchronous SGD. ArXiv Preprint (2017-03-21) [2022-09-26]. https://arxiv.org/abs/1604.00981
[59] Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge // ICC 2019–2019 IEEE International Conference on Communications (ICC). Shanghai, 2019: 1
[60] Wang S Q, Tuor T, Salonidis T, et al. When edge meets learning: Adaptive control for resource-constrained distributed machine learning // IEEE INFOCOM 2018-IEEE Conference on Computer Communications. Honolulu, 2018: 63
[61] Lian X R, Huang Y J, Li Y C, et al. Asynchronous parallel stochastic gradient for nonconvex optimization // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 2737
[62] Zhang W, Gupta S, Lian X R, et al. Staleness-aware async-SGD for distributed deep learning. ArXiv Preprint (2014-04-05) [2022-09-26]. https://arxiv.org/abs/1511.05950
[63] Lu X F, Liao Y Y, Lio P, et al. Privacy-preserving asynchronous federated learning mechanism for edge network computing. IEEE Access, 2020, 8: 48970 doi: 10.1109/ACCESS.2020.2978082
[64] Chen Y J, Ning Y, Slawski M, et al. Asynchronous online federated learning for edge devices with non-IID data // 2020 IEEE International Conference on Big Data (Big Data). Atlanta, 2021: 15
[65] Dutta S, Wang J Y, Joshi G. Slow and stale gradients can win the race. IEEE J Sel Areas Inf Theory, 2021, 2(3): 1012 doi: 10.1109/JSAIT.2021.3103770
[66] Lu Y L, Huang X H, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles. IEEE Trans Veh Technol, 2020, 69(4): 4298 doi: 10.1109/TVT.2020.2973651
[67] Wu W T, He L G, Lin W W, et al. SAFA: A semi-asynchronous protocol for fast federated learning with low overhead. IEEE Trans Comput, 2021, 70(5): 655 doi: 10.1109/TC.2020.2994391
[68] Luehr N. Fast multi-GPU collectives with NCCL. NVIDIA Developer (2016-04-07) [2022-09-26]. https://developer.nvidia.com/blog/fast-multi-gpu-collectives-nccl
[69] Lian X R, Zhang W, Zhang C, et al. Asynchronous decentralized parallel stochastic gradient descent. ArXiv Preprint (2018-09-25) [2022-09-26]. https://arxiv.org/abs/1710.06952
[70] Lalitha A, Kilinc O C, Javidi T, et al. Peer-to-peer federated learning on graphs. ArXiv Preprint (2019-01-31) [2022-09-26]. https://arxiv.org/abs/1901.11173
[71] Blot M, Picard D, Cord M, et al. Gossip training for deep learning. ArXiv Preprint (2016-11-29) [2022-09-26]. https://arxiv.org/abs/1611.09726
[72] Jin P H, Yuan Q C, Iandola F, et al. How to scale distributed deep learning? ArXiv Preprint (2016-11-14) [2022-09-26]. https://arxiv.org/abs/1611.04581
[73] Daily J, Vishnu A, Siegel C, et al. GossipGraD: Scalable deep learning using gossip communication based asynchronous gradient descent. ArXiv Preprint (2018-03-15) [2022-09-26]. https://arxiv.org/abs/1803.05880
[74] Vanhaesebrouck P, Bellet A, Tommasi M. Decentralized collaborative learning of personalized models over networks // Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Florida, 2017: 509
[75] He C Y, Tan C H, Tang H L, et al. Central server free federated learning over single-sided trust social networks. ArXiv Preprint (2020-08-01) [2022-09-26]. https://arxiv.org/abs/1910.04956
[76] Colin I, Bellet A, Salmon J, et al. Gossip dual averaging for decentralized optimization of pairwise functions. ArXiv Preprint (2016-06-08) [2022-09-26]. https://arxiv.org/abs/1606.02421
[77] Nedić A, Olshevsky A. Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control, 2016, 61(12): 3936 doi: 10.1109/TAC.2016.2529285
[78] Assran M, Loizou N, Ballas N, et al. Stochastic gradient push for distributed deep learning // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 344
[79] Koloskova A, Stich S, Jaggi M. Decentralized stochastic optimization and gossip algorithms with compressed communication // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 3478
[80] Hu C H, Jiang J Y, Wang Z. Decentralized federated learning: A segmented gossip approach. ArXiv Preprint (2019-08-21) [2022-09-26]. https://arxiv.org/abs/1908.07782
[81] Ruder S. An overview of gradient descent optimization algorithms. ArXiv Preprint (2017-06-15) [2022-09-26]. https://arxiv.org/abs/1609.04747
[82] Chahal K S, Grover M S, Dey K, et al. A hitchhiker's guide on distributed training of deep neural networks. J Parallel Distributed Comput, 2020, 137: 65 doi: 10.1016/j.jpdc.2019.10.004
[83] Chai Z, Ali A, Zawad S, et al. TiFL: A tier-based federated learning system // Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing. Stockholm, 2020: 125
[84] Li X Y, Qu Z, Tang B, et al. Stragglers are not disaster: A hybrid federated learning algorithm with delayed gradients. ArXiv Preprint (2021-02-12) [2022-09-26]. https://arxiv.org/abs/2102.06329
[85] Xu Z R, Yang Z, Xiong J J, et al. ELFISH: Resource-aware federated learning on heterogeneous edge devices. ArXiv Preprint (2021-03-01) [2022-09-26]. https://arxiv.org/abs/1912.01684
[86] Agarwal A, Duchi J C. Distributed delayed stochastic optimization // Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, 2011: 873
[87] Sahu A N, Dutta A, Tiwari A, et al. On the convergence analysis of asynchronous SGD for solving consistent linear systems. ArXiv Preprint (2020-04-05) [2022-09-26]. https://arxiv.org/abs/2004.02163
[88] Dean J, Corrado G S, Monga R, et al. Large scale distributed deep networks // Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, 2012: 1223
[89] Zhang S X, Choromanska A, LeCun Y. Deep learning with elastic averaging SGD // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 685
[90] Xie C, Koyejo S, Gupta I. Asynchronous federated optimization. ArXiv Preprint (2020-12-05) [2022-09-26]. https://arxiv.org/abs/1903.03934
[91] Odena A. Faster asynchronous SGD. ArXiv Preprint (2016-01-15) [2022-09-26]. https://arxiv.org/abs/1601.04033
[92] Chan W, Lane I. Distributed asynchronous optimization of convolutional neural networks // Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association. Singapore, 2014: 1073
[93] Sutskever I, Martens J, Dahl G, et al. On the importance of initialization and momentum in deep learning // Proceedings of the 30th International Conference on Machine Learning. Atlanta, 2013: 1139
[94] Hakimi I, Barkai S, Gabel M, et al. Taming momentum in a distributed asynchronous environment. ArXiv Preprint (2020-10-14) [2022-09-26]. https://arxiv.org/abs/1907.11612
[95] Chen M, Mao B C, Ma T Y. FedSA: A staleness-aware asynchronous federated learning algorithm with non-IID data. Future Gener Comput Syst, 2021, 120: 1 doi: 10.1016/j.future.2021.02.012
[96] Li X, Huang K X, Yang W H, et al. On the convergence of FedAvg on non-IID data. ArXiv Preprint (2020-06-25) [2022-09-26]. https://arxiv.org/abs/1907.02189
[97] Khaled A, Mishchenko K, Richtárik P. First analysis of local GD on heterogeneous data. ArXiv Preprint (2020-03-18) [2022-09-26]. https://arxiv.org/abs/1909.04715
[98] Hsu T M H, Qi H, Brown M. Measuring the effects of non-identical data distribution for federated visual classification. ArXiv Preprint (2019-09-13) [2022-09-26]. https://arxiv.org/abs/1909.06335
[99] Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for on-device federated learning. ArXiv Preprint (2021-04-09) [2022-09-26]. https://arxiv.org/abs/1910.06378
[100] Li T, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks. ArXiv Preprint (2020-04-21) [2022-09-26]. https://arxiv.org/abs/1812.06127
[101] Wang J Y, Liu Q H, Liang H, et al. Tackling the objective inconsistency problem in heterogeneous federated optimization. ArXiv Preprint (2020-07-15) [2022-09-26]. https://arxiv.org/abs/2007.07481
[102] Hsu T M H, Qi H, Brown M. Federated visual classification with real-world data distribution. ArXiv Preprint (2020-07-17) [2022-09-26]. https://arxiv.org/abs/2003.08082
[103] Zhao Y, Li M, Lai L Z, et al. Federated learning with non-IID data. ArXiv Preprint (2022-07-21) [2022-09-26]. https://arxiv.org/abs/1806.00582
[104] Yoshida N, Nishio T, Morikura M, et al. Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data // ICC 2020–2020 IEEE International Conference on Communications (ICC). Dublin, 2020: 1
[105] Shoham N, Avidor T, Keren A, et al. Overcoming forgetting in federated learning on non-IID data. ArXiv Preprint (2019-10-17) [2022-09-26]. https://arxiv.org/abs/1910.07796
[106] Huang Y T, Chu L Y, Zhou Z R, et al. Personalized cross-silo federated learning on non-IID data. Proc AAAI Conf Artif Intell, 2021, 35(9): 7865
[107] Wu Q, He K W, Chen X. Personalized federated learning for intelligent IoT applications: A cloud-edge based framework. IEEE Open J Comput Soc, 2020, 1: 35 doi: 10.1109/OJCS.2020.2993259
[108] Günther S, Ruthotto L, Schroder J B, et al. Layer-parallel training of deep residual neural networks. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1812.04352
[109] Mayer R, Jacobsen H-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput Surv, 2020, 53(1): 1
[110] Jia Z H, Zaharia M, Aiken A. Beyond data and model parallelism for deep neural networks. ArXiv Preprint (2018-07-14) [2022-09-26]. https://arxiv.org/abs/1807.05358
[111] Harlap A, Narayanan D, Phanishayee A, et al. PipeDream: Fast and efficient pipeline parallel DNN training. ArXiv Preprint (2018-06-08) [2022-09-26]. https://arxiv.org/abs/1806.03377
[112] Chen C C, Yang C L, Cheng H Y. Efficient and robust parallel DNN training through model parallelism on multi-GPU platform. ArXiv Preprint (2019-10-28) [2022-09-26]. https://arxiv.org/abs/1809.02839
[113] Huang Y P, Cheng Y L, Bapna A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1811.06965
[114] Mirhoseini A, Pham H, Le Q V, et al. Device placement optimization with reinforcement learning // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 2430
[115] Shoeybi M, Patwary M, Puri R, et al. Megatron-LM: Training multi-billion parameter language models using model parallelism. ArXiv Preprint (2020-03-13) [2022-09-26]. https://arxiv.org/abs/1909.08053
[116] Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. ArXiv Preprint (2019-03-04) [2022-09-26]. https://arxiv.org/abs/1803.03635
[117] Wang Z D, Liu X X, Huang L, et al. QSFM: Model pruning based on quantified similarity between feature maps for AI on edge. IEEE Internet Things J, 2022, 9(23): 24506 doi: 10.1109/JIOT.2022.3190873
[118] Wang J, Zhang J G, Bao W D, et al. Not just privacy: Improving performance of private deep learning in mobile cloud // Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, 2018: 2407
[119] Zhang L F, Tan Z H, Song J B, et al. Scan: A scalable neural networks framework towards compact and efficient models // 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, 2019: 32
[120] Gou J P, Yu B S, Maybank S J, et al. Knowledge distillation: A survey. ArXiv Preprint (2021-03-20) [2022-09-26]. https://arxiv.org/abs/2006.05525
[121] Phuong M, Lampert C H. Towards understanding knowledge distillation. ArXiv Preprint (2021-03-27) [2022-09-26]. https://arxiv.org/abs/2105.13093
[122] Anil R, Pereyra G, Passos A, et al. Large scale distributed neural network training through online distillation. ArXiv Preprint (2020-08-20) [2022-09-26]. https://arxiv.org/abs/1804.03235
[123] Jeong E, Oh S, Kim H, et al. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data. ArXiv Preprint (2018-11-28) [2022-09-26]. https://arxiv.org/abs/1811.11479
[124] Shen T, Zhang J, Jia X K, et al. Federated mutual learning. ArXiv Preprint (2020-09-17) [2022-09-26]. https://arxiv.org/abs/2006.16765
[125] Sattler F, Marban A, Rischke R, et al. Communication-efficient federated distillation. ArXiv Preprint (2020-12-01) [2022-09-26]. https://arxiv.org/abs/2012.00632
[126] Ahn J H, Simeone O, Kang J. Wireless federated distillation for distributed edge learning with heterogeneous data // 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). Istanbul, 2019: 1