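Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism

GONG Dun-wei, ZHANG Yong-kai, GUO Yi-nan, WANG Bin, FAN Kuan-lu, HUO Yan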

Citation: GONG Dun-wei, ZHANG Yong-kai, GUO Yi-nan, WANG Bin, FAN Kuan-lu, HUO Yan. Named entity recognition of Chinese electronic medical records based on multifeature embedding and attention mechanism[J]. Chinese Journal of Engineering. doi: 10.13374/j.issn2095-9389.2021.01.12.006

doi: 10.13374/j.issn2095-9389.2021.01.12.006
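Funding: National Natural Science Foundation of China (61973305, 61773384); Fundamental Research Funds for the Central Universities, China University of Mining and Technology (2020ZDPY0302)
Details
    Corresponding author:

    E-mail: nanfly@126.com

  • CLC number: TP391.1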

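  • Abstract: Chinese electronic medical record text contains many nested entities, has complex sentence syntax, and tends toward short sentences. To recognize its medical entities effectively, a named entity recognition algorithm that fuses multifeature embedding with an attention mechanism is proposed. The input representation layer fuses features at three granularities: character, word, and glyph. An attention mechanism is introduced in the hidden layer of a bidirectional long short-term memory network so that, when capturing features, the algorithm pays more attention to characters related to medical entities. The algorithm thereby achieves optimal labeling of five entity types in Chinese electronic medical records: disease, body part, symptom, drug, and operation. In experiments on open-source and self-built diabetes datasets, the precision, recall, and F1 value of the proposed algorithm all exceed 97%, indicating that it recognizes the various entities in Chinese electronic medical records more effectively.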
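The two ideas in the abstract, fusing per-character features and applying attention over the recurrent hidden states, can be illustrated with a minimal pure-Python sketch. This excerpt does not give the paper's fusion rule or attention formula, so concatenation and scaled dot-product self-attention are assumptions made here for illustration, and the function names `concat_features` and `self_attention` are hypothetical. The hidden states are taken as given inputs; no BiLSTM or CRF is implemented.

```python
import math


def concat_features(char_vec, word_vec, glyph_vec):
    """Fuse three per-character feature vectors by concatenation (assumed rule)."""
    return list(char_vec) + list(word_vec) + list(glyph_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(hidden_states):
    """Scaled dot-product self-attention over a sequence of hidden vectors.

    Each output vector is a weighted average of all hidden vectors, with
    weights given by the softmax of scaled query-key dot products.
    """
    dim = len(hidden_states[0])
    outputs = []
    for query in hidden_states:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in hidden_states])
        context = [
            sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)
        ]
        outputs.append(context)
    return outputs


# Toy usage: one fused input vector and attention over two stand-in hidden states.
fused = concat_features([0.1, 0.2], [0.3], [0.4, 0.5])
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

In a full model the fused vectors would feed the recurrent layer and the attended states would feed the tag decoder; those stages are outside this sketch.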

     

  • Figure 1.  MFBAC framework

    Figure 2.  Comparison of the F1 values of different NER models

    Table 1.  Types of named entities

    Entity class    Identifier                Definition of category
    Diseases        B-diseases, I-diseases    Terms of various diseases
    Symptom         B-symptom, I-symptom      Abnormal physical manifestations
    Body            B-body, I-body            Various parts of the human body
    Drug            B-drug, I-drug            The names of various medicines
    Test            B-test, I-test            Various physical examinations
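The B-/I- identifiers in Table 1 follow the usual begin/inside tagging convention, so a tagged character sequence can be turned back into entity strings. The sketch below assumes an additional "O" tag for characters outside any entity, which Table 1 does not list, and the helper name `decode_bio` is hypothetical. An I- tag that does not continue an open entity of the same type is treated as outside.

```python
def decode_bio(chars, tags):
    """Collect (entity_text, entity_type) pairs from parallel character/tag lists."""
    entities = []
    current_chars, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_chars.append(ch)
        else:
            # "O" (assumed outside tag) or an inconsistent I- tag ends the entity.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


chars = list("患者有糖尿病")
tags = ["O", "O", "O", "B-diseases", "I-diseases", "I-diseases"]
print(decode_bio(chars, tags))  # [('糖尿病', 'diseases')]
```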

    Table 2.  Distribution of training and test datasets for medical entities

    Dataset     Training data    Test data
    Diseases    856              382
    Symptom     3845             1526
    Body        563              214
    Drug        657              289
    Test        3426             1647
    Total       9347             4058
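The counts in Table 2 can be checked and summarized with a few lines of arithmetic. The sketch below copies the per-type counts from the table, confirms the totals, and computes the test-set fraction and each type's share of the training set; the variable names are illustrative only.

```python
# Entity counts copied from Table 2: (training, test) per entity type.
counts = {
    "Diseases": (856, 382),
    "Symptom": (3845, 1526),
    "Body": (563, 214),
    "Drug": (657, 289),
    "Test": (3426, 1647),
}

train_total = sum(train for train, _ in counts.values())
test_total = sum(test for _, test in counts.values())

# Fraction of all annotated entities that fall in the test set.
test_fraction = test_total / (train_total + test_total)

# Share of each entity type within the training set.
train_share = {name: train / train_total for name, (train, _) in counts.items()}

print(train_total, test_total)   # 9347 4058
print(round(test_fraction, 3))   # 0.303
for name, share in train_share.items():
    print(f"{name}: {share:.1%}")
```

The split is roughly 70/30, and the Symptom and Test types together account for most of the training entities, so the classes are noticeably imbalanced.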

    Table 3.  NER performance with different feature embeddings

    Model                        P/%      R/%      F1/%
    Font embedding-BiLSTM-CRF    79.51    80.35    79.72
    Char embedding-BiLSTM-CRF    88.61    87.43    87.96
    Word embedding-BiLSTM-CRF    85.82    86.87    86.32
    CW embedding-BiLSTM-CRF      86.58    87.23    87.62
    CWF embedding-BiLSTM-CRF     96.24    97.25    96.94
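Precision (P), recall (R), and F1 for entity recognition are commonly computed by comparing the set of predicted entities with the set of gold entities. The sketch below assumes exact-match scoring on (start, end, type) triples; this excerpt does not say which matching or averaging scheme the paper used, so the function `precision_recall_f1` is illustrative rather than a reproduction of the paper's evaluation.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match entity-level precision, recall, and F1.

    Entities are hashable items such as (start, end, type) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(3, 6, "diseases"), (8, 10, "symptom"), (12, 14, "drug")}
predicted = {(3, 6, "diseases"), (8, 10, "body")}
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.50 R=0.33 F1=0.40
```

If scores are averaged over entity types, a reported F1 need not equal the harmonic mean of the reported P and R.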

    Table 4.  NER performance with the attention mechanism under different feature embeddings

    Model                            P/%      R/%      F1/%
    Font embedding-BiLSTM-Att-CRF    92.46    93.12    92.68
    Char embedding-BiLSTM-Att-CRF    93.41    93.56    93.49
    Word embedding-BiLSTM-Att-CRF    96.36    96.18    96.21
    CW embedding-BiLSTM-Att-CRF      96.52    96.18    96.45
    CWF embedding-BiLSTM-Att-CRF     97.21    97.83    97.54
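The effect of attention can be read off by subtracting the F1 values in Table 3 from those in Table 4 for each embedding. The short sketch below does that arithmetic on the values copied from the two tables; the dictionary names are illustrative.

```python
# F1 values (%) copied from Table 3 (no attention) and Table 4 (with attention).
f1_without = {"Font": 79.72, "Char": 87.96, "Word": 86.32, "CW": 87.62, "CWF": 96.94}
f1_with = {"Font": 92.68, "Char": 93.49, "Word": 96.21, "CW": 96.45, "CWF": 97.54}

# Gain in F1 (percentage points) from adding attention, per embedding.
gains = {name: round(f1_with[name] - f1_without[name], 2) for name in f1_without}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points")
```

The gain is largest for the Font embedding (12.96 points) and smallest for the CWF embedding (0.60 points), which already scores highly without attention.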

    Table 5.  Comparison of the performance of different NER models

    Model                   P/%      R/%      F1/%     Loading time/s    Testing time/s
    Transformer             85.46    86.32    85.68    4.33              12.6
    BiGRU-CRF               85.87    86.23    86.14    2.95              9.4
    BiLSTM-CRF              88.61    87.43    95.16    3.21              9.81
    Attention-BiLSTM-CRF    94.52    96.18    96.45    3.56              10.56
    Transformer-CRF         95.32    94.62    94.14    5.32              13.57
    MFBAC                   97.21    97.83    97.54    4.34              11.68
Publication history
  • Received:  2021-01-12
  • Published online:  2021-03-02
