Abstract: With the rapid arrival of the Internet of Everything era, massive volumes of data are generated at the network edge, which exposes traditional cloud-based distributed training to problems such as heavy network load, high energy consumption, and privacy and security risks. Edge computing sinks computing resources to the edge, forming a collaborative "cloud–edge–end" computing system that can meet basic requirements for real-time processing, intelligence, security, and privacy protection. Building on edge computing capabilities, edge intelligence effectively promotes intelligent processing at the edge and has become a popular research topic. Our survey finds that edge collaborative intelligence is currently in a stage of rapid development: deep learning models are being combined with edge computing, and many edge collaborative processing solutions have emerged, such as distributed training in edge computing scenarios, federated learning, and distributed collaborative inference based on techniques such as model partitioning and early exit. Combining shallow broad learning systems with virtualization technology also allows edge intelligence to be deployed quickly, which considerably improves service quality and user experience and makes services more intelligent. As a key link of edge intelligence, edge intelligence collaborative training aims to assist or implement the distributed training of machine learning models on the edge side. However, in an edge computing scenario, distributed training must coordinate a large number of edge nodes, and many challenges remain. Therefore, by thoroughly surveying the existing research on edge intelligence collaborative training, this paper focuses on the challenges and solutions of collaborative training in edge scenarios characterized by device heterogeneity, limited device resources, and unstable network environments. The paper introduces and summarizes the overall architecture and the core modules of edge intelligence collaborative training. The overall architecture concerns the interaction framework among edge devices; depending on whether a central server is present, it can be divided into two categories: the parameter-server centralized architecture and the fully decentralized parallel architecture. The core modules concern how a large number of edge devices collaboratively update the parameters of a neural network model; depending on the role of parallel computing in model training, the approaches are divided into data parallelism and model parallelism. Finally, the remaining challenges and future prospects of edge collaborative training are analyzed and summarized.
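The synchronous, parameter-server-centered workflow described above — the server broadcasts the global model, edge clients train on their private data shards (data parallelism), and the server aggregates the returned updates by sample-weighted averaging, as in FedAvg [41] — can be illustrated by the following minimal Python sketch. The linear model, client count, and hyperparameters are illustrative stand-ins rather than parts of any surveyed system; real deployments train neural networks on physically separate devices.

```python
import numpy as np

# Minimal sketch of synchronous, parameter-server-style collaborative training
# with FedAvg-like weighted averaging [41]. Clients and the server are simulated
# in a single process, and the "model" is a linear regressor trained with local
# gradient steps.

def local_update(w, X, y, lr=0.05, steps=5):
    """A few local gradient steps on one edge device's private data shard."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One synchronous round: broadcast, local training, weighted aggregation."""
    n_total = sum(len(y) for _, y in clients)
    local_models = [local_update(w_global, X, y) for X, y in clients]
    return sum(len(y) / n_total * w for w, (_, y) in zip(local_models, clients))

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []                                     # each client holds a private data shard
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=50)))

w_global = np.zeros(2)
for _ in range(20):                              # 20 communication rounds
    w_global = fedavg_round(w_global, clients)
print("learned weights:", w_global)              # approaches w_true = [2, -1]
```

Weighting each client's update by its sample count keeps the aggregate consistent with training on the pooled data when shards differ in size.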
Key words:
- cloud computing
- edge intelligence
- collaborative training
- edge computing
- machine learning
- distributed training
Table 1. Related works on the parameter-server centralized architecture

| Communication mechanism | Optimization level | Research question | Optimization objective | Reference |
| --- | --- | --- | --- | --- |
| Synchronous | Equipment level | Limited resources | Improve local model quality | [41] |
| | Communication level | Limited resources | Reduce traffic | [56] |
| | Equipment level | Heterogeneous equipment | Shorten communication time | [58–59] |
| | Equipment level | Comprehensive consideration | Architecture flexibility | [60] |
| | Communication level | Unstable environment | Architecture robustness | [56] |
| Asynchronous | Equipment level | Stale gradients | Architecture flexibility | [62–64] |
| | Communication level | Comprehensive consideration | Trade-off optimization | [65] |
| | Equipment level | Dynamic clients | Time-consumption optimization | [66] |
| | Overall architecture | Heterogeneous equipment | Architecture robustness | [67] |
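A recurring idea in Table 1 is screening clients so that resource-limited or slow devices do not stall a synchronous round (e.g., [58–59]). The sketch below shows deadline-based, resource-aware client selection; the cost model, deadline, and client attributes are hypothetical values used only for illustration.

```python
import numpy as np

# Minimal sketch of resource-aware client selection for a synchronous round:
# the server estimates each client's round time from its compute speed and
# uplink bandwidth and only schedules clients expected to finish before the
# deadline. All constants below are illustrative assumptions.

rng = np.random.default_rng(4)

clients = [
    {"id": i,
     "samples": int(rng.integers(200, 2000)),     # local data size
     "flops_per_s": rng.uniform(1e9, 2e10),       # device compute speed
     "uplink_mbps": rng.uniform(1.0, 50.0)}       # network bandwidth
    for i in range(20)
]

MODEL_MBITS = 40.0          # size of the model update to upload
FLOPS_PER_SAMPLE = 2e7      # assumed training cost per sample per epoch
DEADLINE_S = 30.0           # synchronous round deadline in seconds

def estimated_round_time(c, epochs=5):
    compute = epochs * c["samples"] * FLOPS_PER_SAMPLE / c["flops_per_s"]
    upload = MODEL_MBITS / c["uplink_mbps"]
    return compute + upload

selected = [c["id"] for c in clients if estimated_round_time(c) <= DEADLINE_S]
print("clients scheduled this round:", selected)
```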
Table 2. Related works on decentralized parallel stochastic gradient descent (D-PSGD)
| Research question | Approach | Optimization objective | Reference |
| --- | --- | --- | --- |
| Neighbor interaction | Random selection | Reduce interaction complexity | [69–72] |
| | Cooperation by batch rotation | Improve model consistency | [73] |
| | Search for similar targets | Best communication partner | [74] |
| | Single-sided trust set | Improve architecture robustness | [75] |
| | Weight-comparison selection | Best communication partner | [76] |
| Communication consumption | Asymmetric interaction | Avoid redundant communication | [77] |
| | Overlapping communication and computation | Avoid redundant communication and computation | [78] |
| | Model compression | Make full use of link resources | [79] |
| | Model partitioning | Improve communication flexibility | [80] |
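The decentralized parallel (D-PSGD-style) pattern summarized in Table 2 removes the central server: every node keeps its own model replica, performs local gradient steps, and periodically averages parameters with selected neighbors. The sketch below uses random pairwise gossip, one of the neighbor-selection strategies listed above; the node count, pairing rule, and linear model are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of decentralized parallel SGD: each node takes a local gradient
# step on its private shard and then averages parameters with one randomly
# chosen peer (gossip), with no central parameter server.

rng = np.random.default_rng(1)
n_nodes, dim = 6, 3
w_true = rng.normal(size=dim)

data = []
for _ in range(n_nodes):
    X = rng.normal(size=(40, dim))
    data.append((X, X @ w_true + 0.1 * rng.normal(size=40)))
params = [rng.normal(size=dim) for _ in range(n_nodes)]

def local_step(w, X, y, lr=0.05):
    """One gradient step on a node's private least-squares objective."""
    return w - lr * 2 * X.T @ (X @ w - y) / len(y)

for _ in range(200):
    # 1) local computation on every node
    params = [local_step(w, X, y) for w, (X, y) in zip(params, data)]
    # 2) gossip: each node averages its parameters with one random peer
    for i in range(n_nodes):
        j = int(rng.integers(n_nodes))
        params[i] = params[j] = (params[i] + params[j]) / 2

consensus = np.mean(params, axis=0)
print("max disagreement across nodes:", max(np.linalg.norm(w - consensus) for w in params))
print("distance to ground truth:", np.linalg.norm(consensus - w_true))
```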
Table 3. Related works on data parallelism
| Parameter update method | Main problem | Solution | Reference |
| --- | --- | --- | --- |
| Synchronous update | Client delay | Client filtering | [58–59] |
| | | Client selection | [83] |
| | | Hybrid update | [84] |
| | | Partial model update | [85] |
| Asynchronous update | Staleness effect | Convergence analysis | [61, 86] |
| | | Penalize stale gradients | [90, 92] |
| | | Adjust the learning rate | [62, 91] |
| | | Use momentum | [94] |
| | | Adjust hyperparameters | [95] |
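For the asynchronous-update rows of Table 3, a common remedy for the staleness effect is to discount outdated gradients, for example by scaling the learning rate with the update's staleness [62, 90–91]. The sketch below applies a 1/(staleness + 1) discount; the discount rule and the way delayed clients are simulated are illustrative rather than taken from any single surveyed paper.

```python
import numpy as np

# Minimal sketch of staleness-aware asynchronous updates at a parameter server:
# clients compute gradients on possibly outdated model snapshots, and the server
# scales the learning rate by 1/(staleness + 1) when applying each update.

rng = np.random.default_rng(2)
dim = 4
w_true = rng.normal(size=dim)
w = np.zeros(dim)
history = [w.copy()]          # past server versions, used to simulate delayed clients
base_lr = 0.1

def gradient(w_snapshot):
    """Least-squares gradient computed on a fresh mini-batch."""
    X = rng.normal(size=(32, dim))
    y = X @ w_true + 0.05 * rng.normal(size=32)
    return 2 * X.T @ (X @ w_snapshot - y) / len(y)

for _ in range(400):
    staleness = int(rng.integers(0, 6))                 # how outdated the client's copy is
    snapshot = history[max(0, len(history) - 1 - staleness)]
    g = gradient(snapshot)
    w = w - base_lr / (staleness + 1) * g               # discount stale gradients
    history.append(w.copy())

print("distance to ground truth:", np.linalg.norm(w - w_true))
```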
Table 4. Related works on the non-independent and identically distributed (non-IID) data problem
Table 5. Related works on model parallelism
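Model parallelism, in contrast to data parallelism, partitions a single model across devices so that each device holds only part of the parameters and exchanges activations and gradients with its neighbors during training [110–115]. The sketch below splits a two-layer network across two simulated devices; the Device class, layer sizes, and training data are illustrative and much simpler than the pipeline and operator partitioning used in the surveyed systems.

```python
import numpy as np

# Minimal sketch of model parallelism: a two-layer network is split by layer
# across two simulated devices, so each device stores and updates only its own
# weights; activations flow forward and gradients flow backward across the
# device boundary.

rng = np.random.default_rng(3)

class Device:
    """Holds one tanh layer and applies its own parameter updates."""
    def __init__(self, fan_in, fan_out, lr=0.05):
        self.W = rng.normal(scale=0.5, size=(fan_in, fan_out))
        self.lr = lr

    def forward(self, x):
        self.x = x                                  # cache the input for the backward pass
        self.out = np.tanh(x @ self.W)
        return self.out

    def backward(self, grad_out):
        grad_pre = grad_out * (1 - self.out ** 2)   # back-propagate through tanh
        grad_in = grad_pre @ self.W.T               # gradient sent to the upstream device
        self.W -= self.lr * self.x.T @ grad_pre     # local parameter update
        return grad_in

device_a = Device(4, 8)                    # first layer lives on device A
device_b = Device(8, 1)                    # second layer lives on device B

X = rng.normal(size=(256, 4))
y = np.sin(X.sum(axis=1, keepdims=True))

for _ in range(500):
    h = device_a.forward(X)                # activations cross A -> B
    pred = device_b.forward(h)
    grad = 2 * (pred - y) / len(y)         # gradient of the mean squared error
    grad_h = device_b.backward(grad)       # gradients cross B -> A
    device_a.backward(grad_h)

print("final MSE:", float(np.mean((device_b.forward(device_a.forward(X)) - y) ** 2)))
```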
References
[1] Zhang X Z, Wang Y F, Lu S D, et al. OpenEI: An open framework for edge intelligence // 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). Dallas, 2019: 1840
[2] Wang R, Qi J P, Chen L, et al. Overview of edge intelligence oriented collaborative reasoning. J Comput Res Dev, http://kns.cnki.net/kcms/detail/11.1777.tp.20220426.1612.006.html
[3] Zhou Z, Chen X, Li E, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proc IEEE, 2019, 107(8): 1738 doi: 10.1109/JPROC.2019.2918951
[4] Li K L, Liu C B. Edge intelligence: State-of-the-art and expectations. Big Data Res, 2019, 5(3): 69
[5] Tan H S, Guo D, Ke Z C, et al. Development and challenges of cloud edge collaborative intelligent edge computing. CCCF, 2020(1): 16
[6] Zhang X Z, Lu S D, Shi W S. Research on collaborative computing technology in edge intelligence. AI-View, 2019, 6(5): 55
[7] Wang X F. Intelligent edge computing: From internet of everything to internet of everything empowered. Frontiers, 2020(9): 6
[8] Fang A D, Cui L, Zhang Z W, et al. A parallel computing framework for cloud services // 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). Dalian, 2020: 832
[9] Lanka S, Aung Win T, Eshan S. A review on edge computing and 5G in IoT: Architecture & applications // 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2021: 532
[10] Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast. (2019-06-18) [2022-09-26]. https://www.businesswire.com/news/home/20190618005012
[11] Zwolenski M, Weatherill L. The digital universe rich data and the increasing value of the internet of things. J Telecommun Digital Economy, 2014, 2(3): 47.1
[12] Jin H, Jia L, Zhou Z. Boosting edge intelligence with collaborative cross-edge analytics. IEEE Internet Things J, 2021, 8(4): 2444 doi: 10.1109/JIOT.2020.3034891
[13] Jiang X L, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it. Proc IEEE, 2019, 107(2): 280 doi: 10.1109/JPROC.2018.2863960
[14] Xiao Y H, Jia Y Z, Liu C C, et al. Edge computing security: State of the art and challenges. Proc IEEE, 2019, 107(8): 1608 doi: 10.1109/JPROC.2019.2918437
[15] Huang T, Liu J, Wang S, et al. Survey of the future network technology and trend. J Commun, 2021, 42(1): 130
[16] Jennings A, Copenhagen van R, Rusmin T. Aspects of Network Edge Intelligence. Maluku Technical Report, 2001
[17] Song C H, Zeng P, Yu H B. Industrial Internet intelligent manufacturing edge computing: State-of-the-art and challenges. ZTE Technol J, 2019, 25(3): 50
[18] Risteska Stojkoska B L, Trivodaliev K V. A review of Internet of Things for smart home: Challenges and solutions. J Clean Prod, 2017, 140: 1454 doi: 10.1016/j.jclepro.2016.10.006
[19] Varghese B, Wang N, Barbhuiya S, et al. Challenges and opportunities in edge computing // 2016 IEEE International Conference on Smart Cloud (SmartCloud). New York, 2016: 20
[20] Shi W S, Zhang X Z, Wang Y F, et al. Edge computing: State-of-the-art and future directions. J Comput Res Dev, 2019, 56(1): 69
[21] Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices // 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). Atlanta, 2017: 328
[22] Wang X F, Han Y W, Wang C Y, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Netw, 2019, 33(5): 156 doi: 10.1109/MNET.2019.1800286
[23] Kang Y P, Hauswald J, Gao C, et al. Neurosurgeon. SIGOPS Oper Syst Rev, 2017, 51(2): 615 doi: 10.1145/3093315.3037698
[24] Li E, Zhou Z, Chen X. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy // Proceedings of the 2018 Workshop on Mobile Edge Communications. Budapest, 2018: 31
[25] Li Y K, Zhang T, Chen J L. Broad Siamese network for edge computing applications. Acta Autom Sin, 2020, 46(10): 2060
[26] Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers // 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). Athens, 2018: 800
[27] Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology. World Wide Web, 2020, 23(2): 1341 doi: 10.1007/s11280-019-00692-y
[28] Zaharia M, Xin R S, Wendell P, et al. Apache spark. Commun ACM, 2016, 59(11): 56 doi: 10.1145/2934664
[29] Abadi M, Barham P, Chen J M, et al. TensorFlow: A system for large-scale machine learning. ArXiv Preprint (2016-05-31) [2022-09-26]. https://arxiv.org/abs/1605.08695
[30] Chen T Q, Li M, Li Y T, et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. ArXiv Preprint (2015-12-03) [2022-09-26]. https://arxiv.org/abs/1512.01274
[31] Jin A L, Xu W C, Guo S, et al. PS: A simple yet effective framework for fast training on parameter server. IEEE Trans Parallel Distributed Syst, 2022, 33(12): 4625 doi: 10.1109/TPDS.2022.3200518
[32] Padmanandam K, Lingutla L. Practice of applied edge analytics in intelligent learning framework // 2020 21st International Arab Conference on Information Technology (ACIT). Giza, 2021: 1
[33] Ross P, Luckow A. EdgeInsight: Characterizing and modeling the performance of machine learning inference on the edge and cloud // 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, 2020: 1897
[34] Shi W S, Sun H, Cao J, et al. Edge computing: An emerging computing model for the Internet of everything era. J Comput Res Dev, 2017, 54(5): 907
[35] Srivastava A, Nguyen D, Aggarwal S, et al. Performance and memory trade-offs of deep learning object detection in fast streaming high-definition images // 2018 IEEE International Conference on Big Data (Big Data). Seattle, 2018: 3915
[36] Sindhu C, Vyas D V, Pradyoth K. Sentiment analysis based product rating using textual reviews // 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA). Coimbatore, 2017: 727
[37] Hosein P, Rahaman I, Nichols K, et al. Recommendations for long-term profit optimization // Proceedings of ImpactRS@RecSys. Copenhagen, 2019
[38] Sharma R, Biookaghazadeh S, Li B X, et al. Are existing knowledge transfer techniques effective for deep learning with edge devices? // 2018 IEEE International Conference on Edge Computing (EDGE). San Francisco, 2018: 42
[39] Bonawitz K, Eichner H, Grieskamp W, et al. Towards federated learning at scale: System design // Proceedings of Machine Learning and Systems. Palo Alto, 2019, 1: 374
[40] Kairouz P, McMahan H B, Avent B, et al. Advances and open problems in federated learning. FNT Machine Learning, 2021, 14(1-2): 1
[41] McMahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data. ArXiv Preprint (2017-02-28) [2022-09-26]. https://arxiv.org/abs/1602.05629
[42] Zhu J M, Zhang Q N, Gao S, et al. Privacy preserving and trustworthy federated learning model based on blockchain. Chin J Comput, 2021, 44(12): 2464
[43] Wei S Y, Tong Y X, Zhou Z M, et al. Efficient and Fair Data Valuation for Horizontal Federated Learning. Berlin: Springer, 2020
[44] Khan A, Thij M, Wilbik A. Communication-efficient vertical federated learning. Algorithms, 2022, 15(8): 273 doi: 10.3390/a15080273
[45] Chen Y Q, Qin X, Wang J D, et al. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intell Syst, 2020, 35(4): 83 doi: 10.1109/MIS.2020.2988604
[46] Yang J, Zheng J, Zhang Z, et al. Security of federated learning for cloud-edge intelligence collaborative computing. Int J Intell Syst, 2022, 37(11): 9290 doi: 10.1002/int.22992
[47] Zhang X J, Gu H L, Fan L X, et al. No free lunch theorem for security and utility in federated learning. ArXiv Preprint (2022-09-05) [2022-09-26]. https://arxiv.org/abs/2203.05816
[48] Deng S G, Zhao H L, Fang W J, et al. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J, 2020, 7(8): 7457 doi: 10.1109/JIOT.2020.2984887
[49] Feng C, Han P C, Zhang X, et al. Computation offloading in mobile edge computing networks: A survey. J Netw Comput Appl, 2022, 202: 103366 doi: 10.1016/j.jnca.2022.103366
[50] Qiao D W, Guo S T, He J, et al. Edge intelligence: Research progress and challenges. Radio Commun Technol, 2022, 48(1): 34
[51] Fortino G, Zhou M C, Hassan M M, et al. Pushing artificial intelligence to the edge: Emerging trends, issues and challenges. Eng Appl Artif Intell, 2021, 103: 104298 doi: 10.1016/j.engappai.2021.104298
[52] Qiu X C, Fernández-Marqués J, Gusmão P, et al. ZeroFL: Efficient on-device training for federated learning with local sparsity. ArXiv Preprint (2022-08-04) [2022-09-26]. https://arxiv.org/abs/2208.02507
[53] Long S Q, Long W F, Li Z T, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments. IEEE Trans Parallel Distributed Syst, 2021, 32(7): 1629 doi: 10.1109/TPDS.2020.3041029
[54] Zhu H R, Yuan G J, Yao C J, et al. Survey on network of distributed deep learning training. J Comput Res Dev, 2021, 58(1): 98 doi: 10.7544/issn1000-1239.2021.20190881
[55] Rafique Z, Khalid H M, Muyeen S M. Communication systems in distributed generation: A bibliographical review and frameworks. IEEE Access, 2020, 8: 207226 doi: 10.1109/ACCESS.2020.3037196
[56] Hsieh K, Harlap A, Vijaykumar N, et al. Gaia: Geo-distributed machine learning approaching LAN speeds // Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. New York, 2017: 629
[57] Konečný J, McMahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency. ArXiv Preprint (2017-10-30) [2022-09-26]. https://arxiv.org/abs/1610.05492
[58] Chen J M, Pan X H, Monga R, et al. Revisiting distributed synchronous SGD. ArXiv Preprint (2017-03-21) [2022-09-26]. https://arxiv.org/abs/1604.00981
[59] Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge // ICC 2019–2019 IEEE International Conference on Communications (ICC). Shanghai, 2019: 1
[60] Wang S Q, Tuor T, Salonidis T, et al. When edge meets learning: Adaptive control for resource-constrained distributed machine learning // IEEE INFOCOM 2018-IEEE Conference on Computer Communications. Honolulu, 2018: 63
[61] Lian X R, Huang Y J, Li Y C, et al. Asynchronous parallel stochastic gradient for nonconvex optimization // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 2737
[62] Zhang W, Gupta S, Lian X R, et al. Staleness-aware async-SGD for distributed deep learning. ArXiv Preprint (2014-04-05) [2022-09-26]. https://arxiv.org/abs/1511.05950
[63] Lu X F, Liao Y Y, Lio P, et al. Privacy-preserving asynchronous federated learning mechanism for edge network computing. IEEE Access, 2020, 8: 48970 doi: 10.1109/ACCESS.2020.2978082
[64] Chen Y J, Ning Y, Slawski M, et al. Asynchronous online federated learning for edge devices with non-IID data // 2020 IEEE International Conference on Big Data (Big Data). Atlanta, 2021: 15
[65] Dutta S, Wang J Y, Joshi G. Slow and stale gradients can win the race. IEEE J Sel Areas Inf Theory, 2021, 2(3): 1012 doi: 10.1109/JSAIT.2021.3103770
[66] Lu Y L, Huang X H, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles. IEEE Trans Veh Technol, 2020, 69(4): 4298 doi: 10.1109/TVT.2020.2973651
[67] Wu W T, He L G, Lin W W, et al. SAFA: A semi-asynchronous protocol for fast federated learning with low overhead. IEEE Trans Comput, 2021, 70(5): 655 doi: 10.1109/TC.2020.2994391
[68] Luehr N. Fast multi-GPU collectives with NCCL. NVIDIA Developer (2016-04-07) [2022-09-26]. https://developer.nvidia.com/blog/fast-multi-gpu-collectives-nccl
[69] Lian X R, Zhang W, Zhang C, et al. Asynchronous decentralized parallel stochastic gradient descent. ArXiv Preprint (2018-09-25) [2022-09-26]. https://arxiv.org/abs/1710.06952
[70] Lalitha A, Kilinc O C, Javidi T, et al. Peer-to-peer federated learning on graphs. ArXiv Preprint (2019-01-31) [2022-09-26]. https://arxiv.org/abs/1901.11173
[71] Blot M, Picard D, Cord M, et al. Gossip training for deep learning. ArXiv Preprint (2016-11-29) [2022-09-26]. https://arxiv.org/abs/1611.09726
[72] Jin P H, Yuan Q C, Iandola F, et al. How to scale distributed deep learning? ArXiv Preprint (2016-11-14) [2022-09-26]. https://arxiv.org/abs/1611.04581
[73] Daily J, Vishnu A, Siegel C, et al. GossipGraD: Scalable deep learning using gossip communication based asynchronous gradient descent. ArXiv Preprint (2018-03-15) [2022-09-26]. https://arxiv.org/abs/1803.05880
[74] Vanhaesebrouck P, Bellet A, Tommasi M. Decentralized collaborative learning of personalized models over networks // Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Florida, 2017: 509
[75] He C Y, Tan C H, Tang H L, et al. Central server free federated learning over single-sided trust social networks. ArXiv Preprint (2020-08-01) [2022-09-26]. https://arxiv.org/abs/1910.04956
[76] Colin I, Bellet A, Salmon J, et al. Gossip dual averaging for decentralized optimization of pairwise functions. ArXiv Preprint (2016-06-08) [2022-09-26]. https://arxiv.org/abs/1606.02421
[77] Nedić A, Olshevsky A. Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control, 2016, 61(12): 3936 doi: 10.1109/TAC.2016.2529285
[78] Assran M, Loizou N, Ballas N, et al. Stochastic gradient push for distributed deep learning // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 344
[79] Koloskova A, Stich S, Jaggi M. Decentralized stochastic optimization and gossip algorithms with compressed communication // Proceedings of the 36th International Conference on Machine Learning. California, 2019: 3478
[80] Hu C H, Jiang J Y, Wang Z. Decentralized federated learning: A segmented gossip approach. ArXiv Preprint (2019-08-21) [2022-09-26]. https://arxiv.org/abs/1908.07782
[81] Ruder S. An overview of gradient descent optimization algorithms. ArXiv Preprint (2017-06-15) [2022-09-26]. https://arxiv.org/abs/1609.04747
[82] Chahal K S, Grover M S, Dey K, et al. A hitchhiker's guide on distributed training of deep neural networks. J Parallel Distributed Comput, 2020, 137: 65 doi: 10.1016/j.jpdc.2019.10.004
[83] Chai Z, Ali A, Zawad S, et al. TiFL: A tier-based federated learning system // Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing. Stockholm, 2020: 125
[84] Li X Y, Qu Z, Tang B, et al. Stragglers are not disaster: A hybrid federated learning algorithm with delayed gradients. ArXiv Preprint (2021-02-12) [2022-09-26]. https://arxiv.org/abs/2102.06329
[85] Xu Z R, Yang Z, Xiong J J, et al. ELFISH: Resource-aware federated learning on heterogeneous edge devices. ArXiv Preprint (2021-03-01) [2022-09-26]. https://arxiv.org/abs/1912.01684
[86] Agarwal A, Duchi J C. Distributed delayed stochastic optimization // Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, 2011: 873
[87] Sahu A N, Dutta A, Tiwari A, et al. On the convergence analysis of asynchronous SGD for solving consistent linear systems. ArXiv Preprint (2020-04-05) [2022-09-26]. https://arxiv.org/abs/2004.02163
[88] Dean J, Corrado G S, Monga R, et al. Large scale distributed deep networks // Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, 2012: 1223
[89] Zhang S X, Choromanska A, LeCun Y. Deep learning with elastic averaging SGD // Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, 2015: 685
[90] Xie C, Koyejo S, Gupta I. Asynchronous federated optimization. ArXiv Preprint (2020-12-05) [2022-09-26]. https://arxiv.org/abs/1903.03934
[91] Odena A. Faster asynchronous SGD. ArXiv Preprint (2016-01-15) [2022-09-26]. https://arxiv.org/abs/1601.04033
[92] Chan W, Lane I. Distributed asynchronous optimization of convolutional neural networks // Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association. Singapore, 2014: 1073
[93] Sutskever I, Martens J, Dahl G, et al. On the importance of initialization and momentum in deep learning // Proceedings of the 30th International Conference on Machine Learning. Atlanta, 2013: 1139
[94] Hakimi I, Barkai S, Gabel M, et al. Taming momentum in a distributed asynchronous environment. ArXiv Preprint (2020-10-14) [2022-09-26]. https://arxiv.org/abs/1907.11612
[95] Chen M, Mao B C, Ma T Y. FedSA: A staleness-aware asynchronous federated learning algorithm with non-IID data. Future Gener Comput Syst, 2021, 120: 1 doi: 10.1016/j.future.2021.02.012
[96] Li X, Huang K X, Yang W H, et al. On the convergence of FedAvg on non-IID data. ArXiv Preprint (2020-06-25) [2022-09-26]. https://arxiv.org/abs/1907.02189
[97] Khaled A, Mishchenko K, Richtárik P. First analysis of local GD on heterogeneous data. ArXiv Preprint (2020-03-18) [2022-09-26]. https://arxiv.org/abs/1909.04715
[98] Hsu T M H, Qi H, Brown M. Measuring the effects of non-identical data distribution for federated visual classification. ArXiv Preprint (2019-09-13) [2022-09-26]. https://arxiv.org/abs/1909.06335
[99] Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for on-device federated learning. ArXiv Preprint (2021-04-09) [2022-09-26]. https://arxiv.org/abs/1910.06378
[100] Li T, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks. ArXiv Preprint (2020-04-21) [2022-09-26]. https://arxiv.org/abs/1812.06127
[101] Wang J Y, Liu Q H, Liang H, et al. Tackling the objective inconsistency problem in heterogeneous federated optimization. ArXiv Preprint (2020-07-15) [2022-09-26]. https://arxiv.org/abs/2007.07481
[102] Hsu T M H, Qi H, Brown M. Federated visual classification with real-world data distribution. ArXiv Preprint (2020-07-17) [2022-09-26]. https://arxiv.org/abs/2003.08082
[103] Zhao Y, Li M, Lai L Z, et al. Federated learning with non-IID data. ArXiv Preprint (2022-07-21) [2022-09-26]. https://arxiv.org/abs/1806.00582
[104] Yoshida N, Nishio T, Morikura M, et al. Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data // ICC 2020–2020 IEEE International Conference on Communications (ICC). Dublin, 2020: 1
[105] Shoham N, Avidor T, Keren A, et al. Overcoming forgetting in federated learning on non-IID data. ArXiv Preprint (2019-10-17) [2022-09-26]. https://arxiv.org/abs/1910.07796
[106] Huang Y T, Chu L Y, Zhou Z R, et al. Personalized cross-silo federated learning on non-IID data. Proc AAAI Conf Artif Intell, 2021, 35(9): 7865
[107] Wu Q, He K W, Chen X. Personalized federated learning for intelligent IoT applications: A cloud-edge based framework. IEEE Open J Comput Soc, 2020, 1: 35 doi: 10.1109/OJCS.2020.2993259
[108] Günther S, Ruthotto L, Schroder J B, et al. Layer-parallel training of deep residual neural networks. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1812.04352
[109] Mayer R, Jacobsen H-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput Surv, 2020, 53(1): 1
[110] Jia Z H, Zaharia M, Aiken A. Beyond data and model parallelism for deep neural networks. ArXiv Preprint (2018-07-14) [2022-09-26]. https://arxiv.org/abs/1807.05358
[111] Harlap A, Narayanan D, Phanishayee A, et al. PipeDream: Fast and efficient pipeline parallel DNN training. ArXiv Preprint (2018-06-08) [2022-09-26]. https://arxiv.org/abs/1806.03377
[112] Chen C C, Yang C L, Cheng H Y. Efficient and robust parallel DNN training through model parallelism on multi-GPU platform. ArXiv Preprint (2019-10-28) [2022-09-26]. https://arxiv.org/abs/1809.02839
[113] Huang Y P, Cheng Y L, Bapna A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism. ArXiv Preprint (2019-07-25) [2022-09-26]. https://arxiv.org/abs/1811.06965
[114] Mirhoseini A, Pham H, Le Q V, et al. Device placement optimization with reinforcement learning // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 2430
[115] Shoeybi M, Patwary M, Puri R, et al. Megatron-LM: Training multi-billion parameter language models using model parallelism. ArXiv Preprint (2020-03-13) [2022-09-26]. https://arxiv.org/abs/1909.08053
[116] Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. ArXiv Preprint (2019-03-04) [2022-09-26]. https://arxiv.org/abs/1803.03635
[117] Wang Z D, Liu X X, Huang L, et al. QSFM: Model pruning based on quantified similarity between feature maps for AI on edge. IEEE Internet Things J, 2022, 9(23): 24506 doi: 10.1109/JIOT.2022.3190873
[118] Wang J, Zhang J G, Bao W D, et al. Not just privacy: Improving performance of private deep learning in mobile cloud // Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, 2018: 2407
[119] Zhang L F, Tan Z H, Song J B, et al. Scan: A scalable neural networks framework towards compact and efficient models // 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, 2019: 32
[120] Gou J P, Yu B S, Maybank S J, et al. Knowledge distillation: A survey. ArXiv Preprint (2021-03-20) [2022-09-26]. https://arxiv.org/abs/2006.05525
[121] Phuong M, Lampert C H. Towards understanding knowledge distillation. ArXiv Preprint (2021-03-27) [2022-09-26]. https://arxiv.org/abs/2105.13093
[122] Anil R, Pereyra G, Passos A, et al. Large scale distributed neural network training through online distillation. ArXiv Preprint (2020-08-20) [2022-09-26]. https://arxiv.org/abs/1804.03235
[123] Jeong E, Oh S, Kim H, et al. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data. ArXiv Preprint (2018-11-28) [2022-09-26]. https://arxiv.org/abs/1811.11479
[124] Shen T, Zhang J, Jia X K, et al. Federated mutual learning. ArXiv Preprint (2020-09-17) [2022-09-26]. https://arxiv.org/abs/2006.16765
[125] Sattler F, Marban A, Rischke R, et al. Communication-efficient federated distillation. ArXiv Preprint (2020-12-01) [2022-09-26]. https://arxiv.org/abs/2012.00632
[126] Ahn J H, Simeone O, Kang J. Wireless federated distillation for distributed edge learning with heterogeneous data // 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). Istanbul, 2019: 1