Abstract:
As 6G research advances, ITU-R M.2160 has explicitly identified immersive communication as one of the core usage scenarios for IMT-2030, and 3GPP has continued to refine XR-oriented protocol mechanisms in Releases 18 and 19. Together, these developments indicate that the service paradigm of future mobile communication systems is evolving from single-media transmission toward multi-entity collaborative interaction. To address the fragmentation of existing studies and the lack of a unified analytical framework, this paper proposes a deeply integrated end-edge-cloud-network-intelligence system architecture and develops a cross-domain collaborative framework spanning four categories of resources: communication, computing, rendering, and inference. The paper defines the concept and research scope of immersive agent communications and, on this basis, compares the end-edge-cloud-network-intelligence architecture with existing standard architectures, focusing on three key mechanisms: multimodal data alignment, partitioning of decoupled rendering tasks with computation offloading, and intent-driven distributed orchestration of intelligent agents. It further establishes a preliminary general model for cross-domain joint resource optimization and identifies open challenges in task-aware evaluation, real-time decision stability, and intrinsic security. In summary, the key to immersive agent communications lies not in localized improvements to individual link performance but in the systematic organization of the entire service workflow and the deep integration of cross-domain resources.