DRL-Driven Dynamic Resource Allocation for Coexisting Multi-Modal Semantic Communication and URLLC Networks

Abstract: In next-generation wireless networks, multi-modal semantic communication and ultra-reliable low-latency communication (URLLC) services that share a downlink orthogonal frequency-division multiple access (OFDMA) system compete for spectrum. This paper studies how to maximize the quality of experience (QoE) of semantic users while meeting the strict latency constraints of URLLC, and proposes an importance-aware dynamic resource puncturing scheme based on deep reinforcement learning (DRL). First, because traditional bit-level resource allocation does not adapt to the characteristics of semantic tasks, an attribution analysis method based on Integrated Gradients is introduced to build a fine-grained mapping from semantic features to physical resources. The method quantifies the contribution of each semantic feature to the accuracy of the downstream intelligent task and yields a semantic feature importance weight matrix. Second, a coexistence model for semantic communication and URLLC is established, and the resulting non-convex combinatorial optimization problem is decoupled into two subproblems: channel allocation and dynamic puncturing. In the channel allocation stage, a water-filling algorithm performs pre-allocation according to the rate requirements of the semantic users. In the puncturing stage, the problem is modeled as a Markov decision process (MDP), and an agent based on the proximal policy optimization (PPO) algorithm is designed. Given the real-time semantic feature importance, the remaining number of allowed punctures, and the URLLC queue state, the agent dynamically decides where URLLC packets are transmitted, seeking a policy that balances protecting critical semantic information against guaranteeing URLLC latency. Simulation results show that, compared with random puncturing and a greedy strategy, the proposed algorithm accurately identifies and avoids high-weight semantic resource blocks. Across different URLLC traffic intensities and semantic compression ratios, it maintains the lowest semantic interruption percentage and a high average total reward, achieving efficient dynamic resource allocation between heterogeneous services.
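
The attribution step named in the abstract can be illustrated with a minimal sketch of Integrated Gradients. This is not the paper's implementation: the logistic "task model", the feature values, the all-zero baseline, and the helper names (`model`, `model_grad`, `integrated_gradients`, `importance_weights`) are all assumptions chosen for illustration, and the normalized vector at the end only stands in for one row of an importance weight matrix.

```python
import math

def model(x, w):
    """Toy differentiable task model (assumed): logistic score of a weighted sum."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w):
    """Analytic gradient of the toy model with respect to the input features."""
    s = model(x, w)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - b_i) * average over alpha in (0, 1) of
           dF/dx_i evaluated at b + alpha * (x - b).
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(point, w)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

def importance_weights(attributions):
    """Normalize absolute attributions so they sum to 1 (hypothetical weighting)."""
    mags = [abs(a) for a in attributions]
    total = sum(mags)
    if total == 0:
        return [1.0 / len(mags)] * len(mags)
    return [m / total for m in mags]

# Illustrative values only; none of these numbers come from the paper.
features = [0.8, -0.3, 0.5, 0.1]
weights = [1.5, 0.2, -0.7, 0.05]
baseline = [0.0, 0.0, 0.0, 0.0]

attr = integrated_gradients(features, baseline, weights)
imp = importance_weights(attr)
print("attributions:", attr)
print("importance weights:", imp)
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline. In a puncturing setting, resource blocks carrying features with small weights would be the natural candidates for URLLC traffic, although how the paper maps features to resource blocks is not specified in the abstract.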

     
