Named Entity Recognition of Chinese Electronic Medical Records Based on Multi-feature Embedding and Attention Mechanism
Abstract: Chinese electronic medical records (EMRs) contain many nested entities, and their sentences are short and grammatically complex. To recognize the medical entities in these records effectively, this paper proposes a named entity recognition algorithm that combines multi-feature embedding with an attention mechanism. The input representation layer fuses features at three granularities: characters, words, and glyphs. An attention mechanism is added to the hidden layer of a bidirectional long short-term memory (BiLSTM) network so that, when capturing features, the model focuses on the characters related to medical entities. The algorithm then assigns optimal labels to five entity types in Chinese EMRs: diseases, body parts, symptoms, drugs, and operations. In experiments on an open-source dataset and a self-built diabetes dataset, the proposed algorithm achieves precision, recall, and F1 scores all above 97%, which shows that it identifies the various entity types in Chinese EMRs more effectively.
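The abstract does not give the exact fusion or attention formulas. As a rough, non-authoritative illustration, the pure-Python sketch below makes two assumptions. First, it assumes that the character, word, and glyph vectors for each character are simply concatenated. Second, it assumes that the attention is scaled dot-product self-attention over the hidden states, where each position's output is its hidden state concatenated with an attention-weighted context vector. The function names and toy embeddings are hypothetical, and the fused vectors stand in for BiLSTM outputs; the recurrent network itself is not implemented.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_features(char_vec: Sequence[float], word_vec: Sequence[float],
                  glyph_vec: Sequence[float]) -> Vector:
    """Concatenate character-, word- and glyph-level embeddings for one character
    (assumed fusion strategy; the paper may combine them differently)."""
    return list(char_vec) + list(word_vec) + list(glyph_vec)


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_weights(hidden: List[Vector], t: int) -> Vector:
    """Weights alpha_{t,j} = softmax_j(h_t . h_j / sqrt(d)) for query position t."""
    scale = math.sqrt(len(hidden[0]))
    return softmax([dot(hidden[t], h_j) / scale for h_j in hidden])


def self_attention(hidden: List[Vector]) -> List[Vector]:
    """For each position t, compute context c_t = sum_j alpha_{t,j} h_j and
    return [h_t ; c_t], which a downstream tagging layer could score."""
    d = len(hidden[0])
    outputs = []
    for t, h_t in enumerate(hidden):
        weights = attention_weights(hidden, t)
        context = [sum(w * h_j[k] for w, h_j in zip(weights, hidden)) for k in range(d)]
        outputs.append(list(h_t) + context)
    return outputs


if __name__ == "__main__":
    # Toy, made-up embeddings for the three characters of 糖尿病 ("diabetes").
    char_emb = {"糖": [0.1, 0.3], "尿": [0.2, 0.1], "病": [0.4, 0.0]}
    # Assumption: each character reuses the embedding of the word that contains it.
    word_emb = {"糖尿病": [0.5, 0.5]}
    glyph_emb = {"糖": [0.2], "尿": [0.1], "病": [0.3]}
    sentence = ["糖", "尿", "病"]
    fused = [fuse_features(char_emb[c], word_emb["糖尿病"], glyph_emb[c]) for c in sentence]
    # The fused vectors stand in for BiLSTM hidden states in this sketch.
    out = self_attention(fused)
    print(len(out), len(out[0]))  # prints "3 10"
```

Concatenation keeps each granularity in its own slice of the vector, so the layers above can learn how much to rely on each feature type. In a real model the attention would run over trained BiLSTM outputs, and the per-position vectors would feed a sequence labeler for the five entity types.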