Abstract:
To address multi-scene visual perception under complex and varying illumination, this paper proposes a scene-adaptive, granularity-progressive multi-modal image fusion method. A scene encoder is designed to embed scene information into the fusion network and guide it to generate fusion images of different styles according to that information. A feature extraction module based on state-space equations improves the feature representation ability of the network and achieves global feature perception with linear complexity. A granularity-progressive fusion module is designed for globally refined fusion of multi-modal features, and a cross-modal coordinate attention mechanism fine-tunes the multi-modal features by serializing them. In addition, prior knowledge is used to generate enhanced images as labels, and homologous and heterogeneous losses are constructed for different environments to achieve scene-adaptive multi-modal image fusion. Experiments on the three public datasets MSRS, TNO, and LLVIP show that, compared with 10 state-of-the-art algorithms, the proposed method achieves better visual quality and quantitative metrics.
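To make the cross-modal coordinate attention idea concrete, the following is a minimal sketch of one possible fusion step, assuming a PyTorch-style implementation; the module structure, channel sizes, and the specific way the infrared and visible features interact are illustrative assumptions, not the paper's released design.

```python
# Illustrative sketch only: cross-modal coordinate attention applied to an
# infrared/visible feature pair before fusion. All names are hypothetical.
import torch
import torch.nn as nn


class CrossModalCoordAttention(nn.Module):
    """Reweights the two modalities along the H and W axes, then fuses them."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # Shared bottleneck over the concatenated directional descriptors.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        b, c, h, w = ir.shape
        x = torch.cat([ir, vis], dim=1)                # (B, 2C, H, W)
        # Directional pooling: keep one spatial axis, squeeze the other.
        pooled_h = x.mean(dim=3, keepdim=True)         # (B, 2C, H, 1)
        pooled_w = x.mean(dim=2, keepdim=True)         # (B, 2C, 1, W)
        # Encode both directions with the shared bottleneck.
        feat = self.bottleneck(
            torch.cat([pooled_h, pooled_w.transpose(2, 3)], dim=2)
        )                                              # (B, hidden, H+W, 1)
        feat_h, feat_w = torch.split(feat, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(feat_h))                   # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(feat_w.transpose(2, 3)))   # (B, C, 1, W)
        # Coordinate-wise weighting of each modality, then additive fusion.
        gate = a_h * a_w                               # broadcasts to (B, C, H, W)
        return ir * gate + vis * (1.0 - gate)


if __name__ == "__main__":
    ir_feat = torch.randn(1, 64, 32, 32)
    vis_feat = torch.randn(1, 64, 32, 32)
    fused = CrossModalCoordAttention(64)(ir_feat, vis_feat)
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the directional pooling plays the role of "serializing" the multi-modal features along the height and width axes, and the learned gate decides, per coordinate, how much each modality contributes to the fused result.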