LIU Jian-wei, LIU Jun-wen, LUO Xiong-lin. Research progress in attention mechanism in deep learning[J]. Chinese Journal of Engineering, 2021, 43(11): 1499-1511. DOI: 10.13374/j.issn2095-9389.2021.01.30.005

Research progress in attention mechanism in deep learning

  • Abstract: This paper presents a comprehensive and systematic overview of mainstream attention mechanism models. The attention mechanism simulates human visual selectivity: its core purpose is to select, from cluttered input, the information most relevant and critical to the current task while filtering out noise; in other words, it is an efficient mechanism for information selection and focus. The paper first briefly introduces and defines the prototype of the attention mechanism, then classifies attention structures along several dimensions, discusses the interpretability of attention and summarizes its applications across domains, and finally points out future research directions and open challenges.

     

    Abstract: There are two challenges with the traditional encoder–decoder framework. First, the encoder must compress all the necessary information of a source sentence into a fixed-length vector. Second, it cannot model the alignment between the source and target sentences, which is an essential aspect of structured output tasks such as machine translation. To address these issues, the attention mechanism was introduced into the encoder–decoder model, allowing the model to jointly learn to align and translate in a neural machine translation task. The core idea of this mechanism is to induce attention weights over the source sentence that prioritize the positions containing the information relevant to generating the next output token. This mechanism has since become an essential component of neural networks and has been studied for diverse applications. The present survey provides a systematic and comprehensive overview of developments in attention modeling. The intuition behind attention modeling is best explained by analogy with human visual selectivity, which selects the information most relevant and critical to the current task from a cluttered scene while ignoring irrelevant information, in a manner that assists perception. The attention mechanism is an efficient information-selection mechanism that has been widely used in deep learning in recent years and has played a pivotal role in natural language processing, speech recognition, and computer vision. This survey first briefly introduces the origin of the attention mechanism and defines a standard, uniform parametric model for encoder–decoder neural machine translation. Next, various techniques are grouped into coherent categories by type of alignment score, number of sequences, abstraction level, position, and representation. A visual explanation of the attention mechanism is then provided, and the roles of attention in multiple application areas are summarized. Finally, the survey identifies future directions and challenges for the attention mechanism.
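The weighting scheme the abstract describes can be sketched as follows: alignment scores are computed between the current decoder state and each encoder position, normalized into attention weights, and used to form a weighted sum (context vector) of the encoder states. This is a minimal generic sketch using dot-product scoring, one of the several scoring functions the survey categorizes; the function names, toy dimensions, and random data below are illustrative assumptions, not the paper's specific parametric model.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_product_attention(query, keys, values):
    """Compute attention weights over source positions and the context vector.

    query:  (d,)  current decoder hidden state
    keys:   (T, d) encoder hidden states used for scoring
    values: (T, d) encoder hidden states to be aggregated
    """
    scores = keys @ query        # (T,) alignment scores, one per source position
    weights = softmax(scores)    # attention distribution: non-negative, sums to 1
    context = weights @ values   # weighted sum of encoder states
    return weights, context

# Toy example: 4 source positions, hidden dimension 3
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))      # encoder hidden states
s = rng.normal(size=3)           # current decoder state
w, c = dot_product_attention(s, H, H)
```

Replacing the dot product with an additive (MLP-based) score, or the decoder state with another query source, yields the other alignment variants grouped in the survey's taxonomy.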

     
