Network security situation prediction method via step adaptation and similarity-based correction

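  • Summary: With the rapid development of information technology and the growing reach of Internet applications, the network security situation is becoming increasingly severe. Frequent network attacks seriously threaten national security and economic interests, so accurate prediction of the network security situation is both important and urgent. However, because traditional models use a fixed input length and the data are nonstationary, existing prediction methods lack accuracy. This paper therefore proposes a network security situation prediction method based on step adaptation and similarity-based correction. First, a step-adaptive strategy is proposed to determine the initial input of the prediction model: variational mode decomposition is introduced to extract a set of modal components from the original situation dataset; for the periodic components in this set, the fast Fourier transform is used to determine the number of periods, which serves as the input length of the corresponding prediction model; for the nonperiodic components, a decreasing Lempel–Ziv complexity criterion adaptively determines the input length of the corresponding prediction model. Second, for each modal component, a corresponding support vector machine submodel is built from the training dataset. Third, given the initial input length, a cosine variance similarity criterion is used to select from the training dataset the data subsets that have the same length as the initial input of the test set and a similar trend. Then, using the support vector machine submodels above, initial predictions are obtained for these similar data subsets, and the similar subsets together with their initial predictions are used as the final prediction model input, thereby correcting the initial support vector machine submodels. Finally, experiments on the standard network security dataset NSL-KDD show that, in single-step prediction, the proposed method achieves a mean squared error (MSE) of 1.75×10⁻⁴, a mean absolute error (MAE) of 1.07×10⁻², and a coefficient of determination (R²) of 0.984, a prediction accuracy significantly better than that of traditional shallow learning, deep learning, and support vector machine methods. In four-step prediction the correction mechanism has a more pronounced effect: compared with the uncorrected model, MAE and MSE decrease by 29.00% and 53.69%, respectively, and R² increases by 5.03%. To further verify the generalization of the method, data from the National Computer Network Emergency Response Technical Team were used for validation, and the results show that the proposed method gives the best prediction performance.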
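The periodic-component part of the step-adaptive strategy can be illustrated with a minimal sketch. The abstract does not state exactly how the FFT result is mapped to the input length, so this example assumes that the dominant period (in samples) of a modal component is used as the input length. The function names `fft` and `dominant_period`, the power-of-two length restriction, and the synthetic sine component are all assumptions made for illustration, not the authors' implementation.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_period(component):
    """Return the period (in samples) of the strongest non-DC frequency bin."""
    n = len(component)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two >= 4")
    # Remove the mean so the DC term does not dominate the spectrum.
    mean = sum(component) / n
    spectrum = fft([v - mean for v in component])
    # For a real-valued signal only bins 1..n/2 carry distinct information.
    magnitudes = [abs(spectrum[k]) for k in range(1, n // 2 + 1)]
    peak_bin = 1 + magnitudes.index(max(magnitudes))
    return round(n / peak_bin)


# Hypothetical periodic mode: a sine wave with an 8-sample period.
mode = [math.sin(2 * math.pi * t / 8) for t in range(64)]
input_length = dominant_period(mode)  # assumed input length for this mode
print(input_length)  # 8
```

In practice a library FFT would replace the hand-written one, and the component would come from the variational mode decomposition rather than a synthetic sine.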

     

    Abstract: The rapid development of information technology and the growing penetration of Internet applications have made the cybersecurity landscape increasingly severe. Frequent cyberattacks seriously threaten national security and economic interests, making accurate prediction of the cybersecurity situation an important and urgent research task. Existing prediction methods suffer from insufficient accuracy owing to the fixed input length of traditional models and the nonstationary nature of the data. To address this issue, a cybersecurity situation prediction method using step adaptation and similarity-based correction is proposed. First, variational mode decomposition is introduced to extract the main modal components. Second, for the periodic modal components, the fast Fourier transform is used to determine the number of periods, which sets the input length of the prediction model; for the nonperiodic modal components, the decreasing Lempel–Ziv complexity criterion is used to determine the input length adaptively. Third, for each modal component, a support vector machine submodel is constructed using the training dataset. Finally, based on the cosine variance similarity index, subsets similar to the test-set input are searched for in the training dataset. Using the above submodels, initial prediction results are obtained for the similar data subsets, and the similar data subsets together with their initial prediction results are used as the final inputs of the support vector machine prediction model. Experiments conducted on the standard cybersecurity dataset NSL-KDD demonstrate the following. First, when the initial input of the prediction model is determined by the proposed step-adaptive strategy, the coefficient of determination (R²) remains higher than with the other input lengths. Second, the proposed similarity-based correction mechanism has a more pronounced effect in multistep prediction: in four-step prediction, the mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE) decrease by 29.00%, 53.69%, and 36.53%, respectively, relative to the uncorrected model, while R² increases by 5.03%. Third, regarding the overall performance of the proposed method, single-step prediction yields an MSE of 1.75×10⁻⁴, MAE of 1.07×10⁻², MAPE of 5.61×10⁻², and R² of 0.984, outperforming backpropagation (BP), long short-term memory (LSTM), and temporal convolutional networks (TCN). Two-step prediction yields an MSE of 2.22×10⁻⁴, MAE of 0.122, MAPE of 6.59×10⁻², and R² of 0.979. Three-step prediction yields an MSE of 3.41×10⁻⁴, MAE of 0.149, MAPE of 8.44×10⁻², and R² of 0.968. Four-step prediction yields an MSE of 4.14×10⁻⁴, MAE of 0.164, MAPE of 9.33×10⁻², and R² of 0.961. The prediction accuracy of the proposed method is thus significantly superior to that of traditional shallow learning, deep learning, and original support vector machine methods. To further verify the generalization ability of the proposed method, data from the National Computer Network Emergency Response Technical Team were used for validation, and the results confirm that the method achieves the best prediction performance. Overall, the proposed method addresses the insufficient prediction accuracy caused by the fixed input length of traditional models and by data nonstationarity.
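For readers comparing the reported figures, the four evaluation metrics can be computed as in the following minimal sketch. It uses the standard textbook definitions; the function name `regression_metrics` and the sample values are hypothetical. MAPE is returned as a fraction rather than a percentage, on the assumption that this matches reported values such as 5.61×10⁻².

```python
def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, MAPE (as a fraction), and R^2 for two sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; assumes y_true is not constant (ss_tot > 0).
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mse": mse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical situation values and predictions, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 4) for name, value in metrics.items()})
```

A relative improvement such as the reported 29.00% drop in MAE would then be computed as (before − after) / before for the uncorrected and corrected models.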

     
