Abstract:
In industrial Internet of Things (IoT) energy management, allocating spatio-temporal crowdsourcing resources to Unmanned Aerial Vehicles (UAVs) is a significant challenge. Prior approaches have focused on optimizing the Age of Information (AoI) to ensure timely and equitable data updates, but they typically overlook operational constraints such as UAV no-fly zones and the risk of data interception by eavesdroppers, both of which degrade the freshness and integrity of the collected data. To address these shortcomings, this paper introduces a deep reinforcement learning framework for UAV spatio-temporal crowdsourcing resource allocation that minimizes the network-wide average AoI while reducing the energy consumption of IoT devices. The framework incorporates the spatial constraints imposed by no-fly zones and actively transmits jamming signals to counter eavesdroppers, thereby securing the data. We formalize the problem as a Markov Decision Process (MDP), which models the sequential decisions the UAVs face in a dynamic environment, and solve it with the Soft Actor-Critic (SAC) algorithm, a sample-efficient and stable deep reinforcement learning method well suited to the continuous action spaces that arise in UAV flight-path and power control. Experiments in multi-UAV scenarios show that our method effectively manages the spatio-temporal allocation of resources and outperforms the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG) baselines in maintaining data freshness and security. We further examine, both analytically and empirically, how the UAV fleet size affects system performance, providing guidance on configuring UAV networks to balance coverage, energy consumption, and operational efficiency and to achieve the best outcomes in AoI, energy management, and security. Overall, this work contributes a robust and intelligent framework for UAV resource allocation, and the demonstrated efficacy of SAC suggests its applicability to other domains where secure, efficient, and intelligent resource management is paramount.
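
To make the MDP formulation concrete, the following toy sketch in Python illustrates one plausible reading of the setup described above: the state holds UAV positions and per-device AoI, each UAV's continuous action is a flight step plus a jamming power level, and the reward penalizes average AoI, jamming energy, and no-fly-zone violations while rewarding jamming near an eavesdropper. This is an illustrative assumption, not the authors' implementation; every class name, dimension, range, and coefficient (e.g., UavAoiEnv, the 0.2 collection radius, the penalty weights, the eavesdropper location) is hypothetical.

# Illustrative sketch only: a toy MDP for secure UAV-assisted data collection.
# All names, shapes, and constants are assumptions for exposition.
import numpy as np

class UavAoiEnv:
    def __init__(self, n_uavs=2, n_devices=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_uavs, self.n_devices = n_uavs, n_devices
        self.nfz_center, self.nfz_radius = np.array([0.5, 0.5]), 0.15  # one circular no-fly zone
        self.eve_pos = np.array([0.8, 0.2])                            # assumed eavesdropper location
        self.reset()

    def reset(self):
        self.uav_pos = self.rng.uniform(0, 1, size=(self.n_uavs, 2))   # UAV positions in a unit area
        self.dev_pos = self.rng.uniform(0, 1, size=(self.n_devices, 2))
        self.aoi = np.zeros(self.n_devices)                            # per-device Age of Information
        return self._state()

    def _state(self):
        # State: flattened UAV positions concatenated with all device AoI values.
        return np.concatenate([self.uav_pos.ravel(), self.aoi])

    def step(self, action):
        # Action per UAV: [dx, dy, jamming_power], each in [-1, 1]
        # (a SAC actor would produce these via a tanh-squashed Gaussian).
        act = action.reshape(self.n_uavs, 3)
        self.uav_pos = np.clip(self.uav_pos + 0.05 * act[:, :2], 0, 1)
        self.aoi += 1.0                                                # every device's age grows each slot
        # A device's AoI resets when some UAV is within collection range.
        dists = np.linalg.norm(self.dev_pos[:, None] - self.uav_pos[None], axis=-1)
        self.aoi[dists.min(axis=1) < 0.2] = 0.0
        jam = 0.5 * (act[:, 2] + 1)                                    # map jamming power to [0, 1]
        # Jamming is more effective (secrecy bonus) when the UAV is near the eavesdropper.
        secrecy = (jam * np.exp(-np.linalg.norm(self.uav_pos - self.eve_pos, axis=1))).sum()
        nfz_violation = np.maximum(
            0.0, self.nfz_radius - np.linalg.norm(self.uav_pos - self.nfz_center, axis=1)).sum()
        # Reward: fresh data, low jamming energy, secure links, and no no-fly-zone entry.
        reward = -self.aoi.mean() - 0.1 * jam.sum() + 0.2 * secrecy - 10.0 * nfz_violation
        return self._state(), reward, False, {}

# A random tanh-squashed Gaussian policy stands in for a trained SAC actor.
env = UavAoiEnv()
state = env.reset()
rng = np.random.default_rng(1)
for _ in range(100):
    action = np.tanh(rng.normal(size=env.n_uavs * 3))
    state, reward, done, _ = env.step(action)

In the full problem, the reward would additionally reflect device transmit energy and the secrecy rate of the eavesdropper channel, and the random policy above would be replaced by a SAC agent trained on this environment.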