In this paper, a multi-modal reinforcement learning neural network model is proposed to solve the quadruped robot motion planning problem. Learning-based motion planning has made great progress in robotics applications, but most approaches still rely on domain randomization to train an agent without exteroceptive sensing that eventually generalizes to challenging terrain. The main idea of this paper is that proprioceptive information from encoders and inertial measurement units only provides contact measurements for immediate reaction, whereas an agent equipped with exteroceptive sensors such as LiDAR and cameras can learn to traverse environments with obstacles and uneven terrain autonomously, planning by predicting environmental changes many steps in advance. Therefore, this paper combines the quadruped robot's proprioceptive information with on-board exteroceptive sensor information: each individual modality is encoded to extract effective features, the attention mechanism of a Transformer layer then fuses the multi-modal information to support the robot's action decisions, and an Actor-Critic reinforcement learning architecture performs trial-and-error training in a simulation environment, finally providing a safe motion strategy for the quadruped robot. Specifically, in the model design, we use a fully connected neural network to encode the proprioceptive state vector, a ConvNet to encode the image information, and a PointNet to encode the point cloud information. We aim to minimize information loss while improving the training and inference speed of the model, so as to better meet the real-time and decision-quality requirements of the task.
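The encoder-plus-fusion design described above can be sketched as follows. This is a minimal illustrative PyTorch implementation, not the paper's actual network: all layer sizes, the action dimension, and the use of mean-pooling over fused tokens are assumptions, and the Critic head is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiModalPolicy(nn.Module):
    """Illustrative sketch: per-modality encoders + Transformer fusion + actor head."""

    def __init__(self, proprio_dim=48, d_model=64, action_dim=12):
        super().__init__()
        # Fully connected encoder for the proprioceptive state (joint encoders, IMU).
        self.proprio_enc = nn.Sequential(
            nn.Linear(proprio_dim, 128), nn.ReLU(), nn.Linear(128, d_model))
        # Small ConvNet encoder for the (here single-channel) image input.
        self.img_enc = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d_model))
        # PointNet-style encoder: shared per-point MLP followed by max-pooling.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, d_model))
        # Transformer layer fuses the three modality tokens via self-attention.
        self.fusion = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        # Actor head maps the fused representation to an action vector.
        self.actor = nn.Linear(d_model, action_dim)

    def forward(self, proprio, image, points):
        t1 = self.proprio_enc(proprio)                  # (B, d_model)
        t2 = self.img_enc(image)                        # (B, d_model)
        t3 = self.point_mlp(points).max(dim=1).values   # (B, d_model)
        tokens = torch.stack([t1, t2, t3], dim=1)       # (B, 3, d_model)
        fused = self.fusion(tokens).mean(dim=1)         # (B, d_model)
        return self.actor(fused)

policy = MultiModalPolicy()
action = policy(torch.randn(2, 48),          # proprioceptive state batch
                torch.randn(2, 1, 64, 64),   # image batch
                torch.randn(2, 256, 3))      # point cloud batch (256 points)
print(tuple(action.shape))  # (2, 12)
```

Because each modality is reduced to a single token before fusion, a failed sensor can in principle be handled by masking its token in the attention layer, which is consistent with the robustness claim below.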
The proposed method is evaluated through multiple sets of ablation experiments in a challenging simulation environment containing various obstacles and uneven terrain, and we observe that it effectively improves the obstacle avoidance success rate. In addition, the proposed algorithm remains reliable when one or more sensor modalities fail or the environment is unknown.