Citation: 刘娅汐, 李旭龙, 霍佳皓, 皇甫伟. UAV spatio-temporal crowdsourcing resource allocation based on deep reinforcement learning[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2024.06.01.001

UAV spatio-temporal crowdsourcing resource allocation based on deep reinforcement learning

Abstract: In industrial Internet of Things (IoT) energy management, allocating spatio-temporal crowdsourcing resources to unmanned aerial vehicles (UAVs) is a significant challenge. Existing approaches optimize the Age of Information (AoI) to ensure timely and equitable data updates, but they often overlook critical operational constraints such as UAV no-fly zones and the potential for data interception by eavesdroppers, both of which degrade the freshness and integrity of the gathered data. To address these shortcomings, this paper introduces a deep reinforcement learning-based framework for UAV spatio-temporal crowdsourcing resource allocation. Our approach minimizes the average AoI across the network while also reducing the energy consumption of IoT devices, subject to the spatial constraints imposed by UAV no-fly zones and while transmitting jamming signals to counter eavesdroppers and secure the data; solving this problem yields the optimal UAV trajectories, jamming-signal transmit powers, and IoT device transmit powers. The problem is formalized as a Markov Decision Process (MDP), which provides a structured model of the decisions UAVs face in a dynamic environment. To solve this MDP, we employ the Soft Actor-Critic (SAC) algorithm, an advanced deep reinforcement learning method known for its sample efficiency and stability. SAC handles the continuous action spaces typical of UAV flight-path and power-control problems, making it well suited to our application. We test the proposed method in scenarios involving multiple UAVs, demonstrating both that the algorithm effectively manages the spatio-temporal allocation of resources and that it outperforms two state-of-the-art baselines, the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG) algorithms, in maintaining data freshness and security. Furthermore, the paper examines the selection of the optimal number of UAVs to balance the trade-offs between coverage, energy consumption, and operational efficiency. By analytically and empirically examining the impact of fleet size on system performance, we provide insights into how to configure UAV networks to achieve the best possible outcomes in terms of AoI, energy management, and security. In conclusion, our research contributes a robust and intelligent framework for UAV resource allocation, and the demonstrated efficacy of the SAC algorithm paves the way for its application in other domains where secure, efficient, and intelligent resource management is paramount.
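On our reading of the abstract, the optimization can be summarized as the following weighted objective; the symbols below are our own notation and may differ from the paper's:

```latex
% Hypothetical notation: \mathbf{q}_m(t) is UAV m's position, p_m^{jam}(t) its
% jamming power; p_k(t), A_k(t), E_k(t) are device k's transmit power, AoI,
% and energy consumption at slot t.
\min_{\{\mathbf{q}_m(t),\, p_m^{\mathrm{jam}}(t),\, p_k(t)\}}
  \frac{1}{T} \sum_{t=1}^{T} \left( \frac{w_1}{K} \sum_{k=1}^{K} A_k(t)
  + w_2 \sum_{k=1}^{K} E_k(t) \right)
\quad \text{s.t.} \quad \mathbf{q}_m(t) \notin \mathcal{Z}_{\mathrm{nfz}},
  \;\; \forall m,\, t
```

To make the MDP formulation concrete, below is a minimal, self-contained sketch of such an environment. This is not the authors' implementation: the dynamics, thresholds, and weights are illustrative assumptions (e.g., AoI resets to 1 on a secure service, a jamming-power threshold stands in for a secrecy-rate condition, and the no-fly zone is a soft penalty).

```python
# Minimal sketch (not the authors' code) of the MDP described in the abstract.
import numpy as np

class UavAoiEnv:
    """Multi-UAV AoI-minimization MDP with a no-fly zone and jamming power."""

    def __init__(self, n_uavs=2, n_devices=8, area=100.0, max_speed=5.0,
                 serve_radius=15.0, nfz_center=(50.0, 50.0), nfz_radius=10.0,
                 w_aoi=1.0, w_energy=0.1, horizon=200, seed=0):
        self.n_uavs, self.n_devices, self.area = n_uavs, n_devices, area
        self.max_speed, self.serve_radius = max_speed, serve_radius
        self.nfz_center, self.nfz_radius = np.asarray(nfz_center), nfz_radius
        self.w_aoi, self.w_energy, self.horizon = w_aoi, w_energy, horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.uav_pos = self.rng.uniform(0.0, self.area, size=(self.n_uavs, 2))
        self.dev_pos = self.rng.uniform(0.0, self.area, size=(self.n_devices, 2))
        self.aoi = np.ones(self.n_devices)          # age of each device's data
        return self._obs()

    def _obs(self):
        # State: normalized UAV positions and per-device AoI.
        return np.concatenate([self.uav_pos.ravel() / self.area,
                               self.aoi / self.horizon])

    def step(self, action):
        # Continuous action per UAV: [vx, vy, jamming power], each in [-1, 1].
        a = np.asarray(action, dtype=float).reshape(self.n_uavs, 3)
        self.uav_pos = np.clip(self.uav_pos + a[:, :2] * self.max_speed,
                               0.0, self.area)
        jam = (a[:, 2] + 1.0) / 2.0                 # map jamming power to [0, 1]

        # Soft no-fly-zone constraint: penalize depth of intrusion into the disc.
        d_nfz = np.linalg.norm(self.uav_pos - self.nfz_center, axis=1)
        nfz_penalty = np.maximum(0.0, self.nfz_radius - d_nfz).sum()

        # A device is *securely* served when some UAV is within range AND jams
        # strongly enough -- a crude stand-in for a secrecy-rate condition.
        dists = np.linalg.norm(
            self.uav_pos[:, None, :] - self.dev_pos[None, :, :], axis=2)
        served = ((dists < self.serve_radius) & (jam[:, None] > 0.5)).any(axis=0)

        self.aoi = np.where(served, 1.0, self.aoi + 1.0)   # AoI resets on service
        energy = 0.05 * served.sum() + 0.02 * jam.sum()    # toy energy model
        reward = -(self.w_aoi * self.aoi.mean()
                   + self.w_energy * energy) - nfz_penalty
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon, {}
```

Because the action vector is continuous (UAV velocities and jamming powers here; device transmit powers could be appended analogously), an environment of this shape plugs directly into any off-the-shelf SAC implementation, which is what makes SAC a natural fit for the problem the abstract describes.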
