An Interpretable Prediction Model for Hand-Foot-and-Mouth Disease Incidence Based on Improved LSTM and XGBoost

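  • Summary: To address the low accuracy and poor interpretability of existing hand-foot-and-mouth disease (HFMD) incidence prediction models, this study combines multiple meteorological factors and proposes an interpretable prediction model, GWO-LSTM-GA-XGBoost, built on Long Short-Term Memory (LSTM), eXtreme Gradient Boosting (XGBoost), the Grey Wolf Optimizer (GWO), the Genetic Algorithm (GA), and SHapley Additive exPlanations (SHAP). First, GWO adaptively tunes the key parameters of the LSTM. Second, the global search capability of GA is used to optimize the XGBoost parameters, compensating for XGBoost's slow convergence. Third, the improved LSTM and XGBoost models are fused with the reciprocal-error method to improve prediction accuracy. Finally, SHAP is used to attribute feature importance and analyze the model's interpretability. Comparative experiments on daily HFMD case counts and meteorological monitoring data from a city in southern China for 2014-2019 show that, compared with existing machine learning prediction models, the proposed model achieves higher prediction accuracy, predicts HFMD case counts accurately, and efficiently identifies latent features associated with HFMD.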

     

    Abstract: As global warming intensifies, climate change affects the occurrence, transmission, and variation of infectious diseases, and the health impact of weather-related infectious diseases has become a major public concern. Timely prevention and intervention against hand-foot-and-mouth disease (HFMD) require accurate and reliable forecasts of daily HFMD cases. To address the low accuracy and poor interpretability of existing HFMD incidence prediction models, this study proposes an interpretable prediction model, GWO-LSTM-GA-XGBoost, which combines multiple meteorological factors with Long Short-Term Memory (LSTM), eXtreme Gradient Boosting (XGBoost), the Grey Wolf Optimizer (GWO), the Genetic Algorithm (GA), and SHapley Additive exPlanations (SHAP). First, missing values in the data are imputed, and the key meteorological factors influencing HFMD incidence are identified through grey relational analysis. Next, a model is constructed to capture the relationship between HFMD incidence, meteorological conditions, and temporal factors. GWO is used to adaptively optimize the key parameters of the LSTM. The global search capability of GA is then used to optimize the parameters of XGBoost, compensating for its slow convergence. The improved LSTM and XGBoost models are then fused using the reciprocal-error method to improve prediction accuracy. Finally, SHAP is used to attribute feature importance and analyze the model's interpretability. Comparative experiments on HFMD incidence prediction were conducted using daily HFMD case counts and meteorological monitoring data from a city in southern China between 2014 and 2019. The results show that, compared with other machine learning prediction models, the proposed model achieves higher prediction accuracy, forecasts HFMD incidence accurately, and efficiently identifies latent features associated with HFMD. Notably, temperature emerges as the most critical factor influencing HFMD incidence.
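
The abstract does not spell out the reciprocal-error fusion step, so the following is only a minimal sketch of one common form of that method, under stated assumptions: each model's weight is proportional to the reciprocal of its validation error, and mean absolute error is the error measure. The function names and all numbers are hypothetical illustrations, not the paper's implementation or data.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def reciprocal_error_weights(err_a: float, err_b: float) -> Tuple[float, float]:
    """Return weights proportional to 1/err_a and 1/err_b, normalized to sum to 1.

    Algebraically, w_a = err_b / (err_a + err_b) and w_b = err_a / (err_a + err_b),
    so the model with the smaller error receives the larger weight.
    """
    if err_a < 0 or err_b < 0:
        raise ValueError("errors must be non-negative")
    total = err_a + err_b
    if total == 0:
        # Both models are perfect on the validation data: weight them equally.
        return 0.5, 0.5
    return err_b / total, err_a / total


def fuse(pred_a: Sequence[float], pred_b: Sequence[float],
         w_a: float, w_b: float) -> List[float]:
    """Weighted combination of two models' predictions."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Made-up validation data (NOT from the study), for illustration only.
y_val = [10, 12, 15, 11]
lstm_val = [11, 13, 14, 12]   # hypothetical tuned-LSTM predictions
xgb_val = [13, 15, 12, 14]    # hypothetical tuned-XGBoost predictions

w_lstm, w_xgb = reciprocal_error_weights(
    mean_absolute_error(y_val, lstm_val),
    mean_absolute_error(y_val, xgb_val),
)

# Apply the weights to new (also made-up) predictions from both models.
fused = fuse([20, 8], [24, 4], w_lstm, w_xgb)
print(w_lstm, w_xgb, fused)
```

In this toy case the LSTM stand-in has one third of the XGBoost stand-in's validation error, so it receives three times the weight. Whether the paper computes the errors with MAE, MSE, or another measure is not stated in the abstract.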

     
