Summary: Because no matching high-performance server chipset exists for the Loongson central processing unit (CPU), an architecture for screening candidate chipsets for the Loongson CPU was designed and developed, and a method for adapting the Loongson CPU to a chipset was implemented. The architecture places a field-programmable gate array (FPGA) in series between the Loongson CPU and the several chipsets to be adapted. On this basis, a method was designed and implemented for connecting the physical signal lines between the CPU and the chipset that require processing, a debugging method was designed for coordinating their power-up and power-down sequences, and a method was designed and implemented to work around differences in their signal protocols. With this architecture and these methods, several chipsets can be screened at the same time, which avoids the earlier need to design a separate motherboard for each adaptation and saves the cost of repeated motherboard development. A high-performance server chipset that can be adapted to the Loongson CPU was found; its specifications and performance exceed those of the chipset currently used with the Loongson CPU, which opens up applications in the server domain.

Abstract: The CPU is a core component among integrated circuits. Although domestically developed CPUs with independent intellectual property rights have advanced rapidly, few high-performance chipsets are available to match them, especially in the server domain. Systems built from these CPUs and low-performance chipsets therefore do not reach adequate performance. The Loongson CPU faces the same problem. To find better chipsets for it, an architecture and a set of methods were designed and implemented for adapting different types of chipsets. In this architecture, a field-programmable gate array (FPGA) is connected between the CPU and the candidate chipsets. The FPGA is divided into three domains: an HT (HyperTransport) bus domain, a processing domain for important but temporarily indeterminate signals, and a CPLD (complex programmable logic device) function domain. During adaptation, the HT bus signals, the temporarily indeterminate signals, and the power signals of the CPU and chipsets are routed into the corresponding domains of the FPGA, and the FPGA is programmed to try all possible signal combinations. The FPGA coordinates the power sequence between the CPU and the chipsets into the correct order. The signal integrity difference between them is avoided by amending their signals in the FPGA until they reach the correct state. The experimental results show that this architecture and these methods allow more chipsets to be adapted on a single motherboard at the same time than before. This avoids researching and developing a separate motherboard for every candidate chipset and greatly reduces costs.

High-performance server chipsets that properly match the Loongson CPU can be found, with better specifications and higher performance than the chipsets currently used with it. A prototype system composed of the Loongson CPU and five types of chipsets was designed and implemented. Using the above architecture and methods, an optimal server chipset combination, SR5690 + SP5100, was found, and the matching principles and correct settings for the signal connection and power sequence were derived. A Loongson 3B4000 two-way SMP motherboard with the SR5690 + SP5100 chipsets was also produced. On this motherboard, computing performance was evaluated with SPEC CPU 2006, storage performance with IOzone, and network performance with Netperf. Compared with the current Loongson 3B4000 server with a 7A1000 chipset, the test results show that performance on the three items is improved by approximately 10%. The combination of the Loongson CPU and this type of server chipset provides wider applications in the server market and promotes the development of the Loongson CPU ecosystem.
Key words: Loongson / chipsets / adaptation / server / field-programmable gate array

Table 1. Hyper transport bus link signals

| Signal | Width | Description |
| --- | --- | --- |
| CAD | 2, 4, 8, or 16 | Command, addresses, and data (CAD). Carries HyperTransport™ requests, responses, addresses, and data. CAD width can be different in each direction. |
| CTL | 1, 2, or 4 | Differentiates control and data. Each byte of CAD has a control (CTL) signal in the Gen3 protocol. One CTL signal is used for an entire link in the Gen1 protocol. |
| CLK | 1, 2, or 4 | Clocks (CLK) for the CAD and CTL signals. Each byte of CAD and its respective CTL signal has a separate clock signal. |

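The width rules in Table 1 can be illustrated with a short calculation. The following is a minimal sketch, not taken from the paper: the function name `ht_link_signals` is hypothetical, and it assumes that a 2-bit or 4-bit CAD link still occupies one byte lane and therefore uses a single CTL and CLK signal.

```python
import math

# CAD widths listed in Table 1
VALID_CAD_WIDTHS = (2, 4, 8, 16)


def ht_link_signals(cad_width: int, gen: int) -> dict:
    """Return per-direction CAD/CTL/CLK signal counts for one HT link.

    Illustrative only: follows the descriptions in Table 1, with the
    assumption that sub-byte links count as one byte lane.
    """
    if cad_width not in VALID_CAD_WIDTHS:
        raise ValueError(f"unsupported CAD width: {cad_width}")
    if gen not in (1, 3):
        raise ValueError("only Gen1 and Gen3 are described in Table 1")

    # One byte lane per 8 CAD bits, rounded up (assumption for 2/4-bit links)
    byte_lanes = math.ceil(cad_width / 8)

    # Gen3: one CTL per byte of CAD; Gen1: one CTL for the entire link
    ctl = byte_lanes if gen == 3 else 1
    # One clock per byte of CAD and its CTL signal
    clk = byte_lanes
    return {"CAD": cad_width, "CTL": ctl, "CLK": clk}


if __name__ == "__main__":
    for width in VALID_CAD_WIDTHS:
        print(width, ht_link_signals(width, gen=3))
```

For the HT3.0 × 16 link listed for both chipsets in Table 4, this sketch gives two CTL and two CLK signals per direction under the Gen3 rule.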
Table 2. Reset/Initialization signals of the HT bus

| Signal | Width | Description |
| --- | --- | --- |
| PWROK | 1 | Power and clocks are stable |
| RESET# | 1 | Reset the HyperTransport™ chain |

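The two signals in Table 2 imply an ordering that matters when the power sequence of a CPU and a chipset is being coordinated: PWROK should report stable power and clocks before RESET# is released. The sketch below checks that ordering in a captured trace. The trace format (time, signal name, level), the function name, and the `min_gap` parameter are all assumptions for illustration; the paper does not give timing margins, so none are hard-coded here.

```python
def reset_released_after_pwrok(trace, min_gap=0.0):
    """Check that RESET# is released only after PWROK has gone high.

    trace: iterable of (time, signal, level) tuples sorted by time.
    RESET# is active-low, so level 1 means "released".
    min_gap: hypothetical minimum time PWROK must be high beforehand.
    """
    pwrok_time = None
    for time, signal, level in trace:
        if signal == "PWROK":
            # Remember when PWROK went high; forget it if it drops again
            pwrok_time = time if level == 1 else None
        elif signal == "RESET#" and level == 1:
            # First release of RESET#: PWROK must already be stable
            return pwrok_time is not None and (time - pwrok_time) >= min_gap
    # RESET# was never released in this trace
    return False


if __name__ == "__main__":
    good = [(0.0, "RESET#", 0), (1.0, "PWROK", 1), (2.5, "RESET#", 1)]
    bad = [(0.0, "RESET#", 0), (1.0, "RESET#", 1), (2.0, "PWROK", 1)]
    print(reset_released_after_pwrok(good, min_gap=1.0))
    print(reset_released_after_pwrok(bad))
```

A check of this kind could be run on logic-analyzer captures from each candidate chipset, since the same rule applies regardless of which chipset sits behind the FPGA.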
Table 3. Power management signals

| Signal | Width | Description |
| --- | --- | --- |
| LDTSTOP# | 1 | Enables and disables links during system state transitions |
| LDTREQ# | 1 | Indicates link is active or requested by a device |

Table 4. Comparison of different chipset specifications

| Item | Features of 7A1000 | Features of SR5690 + SP5100 |
| --- | --- | --- |
| HT bus | HT3.0 × 16 | HT3.0 × 16 |
| PCIE | 32 lanes | 42 lanes |
| SATA | 3 × SATA2.0 | 6 × SATA2.0 |
| USB ports | 6 × USB2.0 | 14 × USB2.0 |
| RAS | No | Yes |
| IOMMU | No | Yes |

Table 5. Analysis of SPEC CPU2006 performance

| Server | int_speed_base | int_rate_base | fp_speed_base | fp_rate_base |
| --- | --- | --- | --- | --- |
| 7A1000 server | 12.30 | 78.07 | 12.02 | 74.90 |
| SR5690 + SP5100 server | 13.02 | 83.60 | 12.80 | 82.60 |
| Performance improvement/% | 6 | 7 | 6 | 10 |

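The improvement row in Table 5 follows from the two score rows as a relative gain over the 7A1000 baseline. The sketch below reproduces that arithmetic; the function and variable names are illustrative, and the scores are copied from Table 5.

```python
def improvement_percent(baseline: float, candidate: float) -> float:
    """Relative gain of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (7A1000 server, SR5690 + SP5100 server) scores from Table 5
spec_scores = {
    "int_speed_base": (12.30, 13.02),
    "int_rate_base": (78.07, 83.60),
    "fp_speed_base": (12.02, 12.80),
    "fp_rate_base": (74.90, 82.60),
}

if __name__ == "__main__":
    for metric, (baseline, candidate) in spec_scores.items():
        gain = improvement_percent(baseline, candidate)
        print(f"{metric}: {gain:.1f}% (rounded: {round(gain)}%)")
```

Rounded to whole percentages, the computed gains agree with the values reported in the table.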
Table 6. Analysis of IOzone performance (each speed is the average of three results)

| Server | 512-byte read speed/(MB·s⁻¹) | 1 MB read speed/(MB·s⁻¹) | 512-byte write speed/(MB·s⁻¹) | 1 MB write speed/(MB·s⁻¹) |
| --- | --- | --- | --- | --- |
| 7A1000 server | 38.56 | 696.31 | 1.25 | 306.76 |
| SR5690 + SP5100 server | 43.19 | 800.76 | 1.53 | 383.45 |
| Performance improvement/% | 12 | 15 | 22 | 25 |

Table 7. Analysis of Netperf performance (each value is the average of three results)

| Server | TCP throughput/(MB·s⁻¹) | TCP transfer rate/(times·s⁻¹) | UDP throughput/(MB·s⁻¹) | UDP transfer rate/(times·s⁻¹) |
| --- | --- | --- | --- | --- |
| 7A1000 server | 850.51 | 8738.91 | 852.64 | 8999.10 |
| SR5690 + SP5100 server | 935.56 | 9787.58 | 946.43 | 9989.00 |
| Performance improvement/% | 10 | 12 | 11 | 11 |