Gao Jia, Cai Wenhao, Zhao Junli, Duan Fuqing. ViTAU: Facial Paralysis Recognition and Analysis Based on Vision Transformer and Facial Action Units[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2024.05.06.003


ViTAU: Facial Paralysis Recognition and Analysis Based on Vision Transformer and Facial Action Units



    Abstract: Facial Nerve Paralysis (FNP), commonly known as Bell's palsy or facial paralysis, significantly impacts patients' daily lives and mental health. Timely identification and diagnosis of facial paralysis are crucial for early treatment and recovery. With the rapid development of deep learning and computer vision technologies, automatic recognition of facial paralysis has become feasible, providing a more accurate and objective method for diagnosis. Current research focuses primarily on overall facial changes while neglecting facial details, and although different facial regions influence recognition results to different degrees, existing studies have yet to differentiate and analyze each facial area in detail. This research introduces an innovative method that combines the Vision Transformer (ViT) model and an Action Unit (AU) region detection network for the automatic recognition and regional analysis of facial paralysis. The ViT model accurately identifies facial paralysis through its self-attention mechanism, while the AU-based strategy uses feature maps extracted from the StyleGAN2 model and analyzes the affected regions with a pyramid convolutional neural network. This comprehensive approach achieved a 99.4% accuracy rate in facial paralysis recognition and an 81.36% accuracy rate in facial paralysis region recognition in experiments on the YouTube Facial Palsy (YFP) and Extended Cohn-Kanade (CK+) datasets. Experimental results demonstrate the effectiveness of the proposed automatic facial paralysis recognition method compared with the latest techniques.
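The first stage of the pipeline summarized above, ViT-style classification via self-attention over image patches, can be sketched as follows. This is a minimal, self-contained NumPy illustration of the mechanism only; the patch size, embedding dimension, random weights, single attention head, and binary logit head are all illustrative assumptions and do not reproduce the paper's trained model or its AU region branch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over token embeddings x: (n_tokens, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled pairwise affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn

def vit_classify(image, patch=16, d=32):
    """Toy ViT forward pass: patchify -> embed -> attend -> CLS logit."""
    h, w = image.shape
    # Split the image into non-overlapping patches and flatten each to a token
    patches = image.reshape(h // patch, patch, w // patch, patch)
    tokens = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    W_embed = rng.standard_normal((patch * patch, d)) * 0.02
    x = tokens @ W_embed
    # Prepend a [CLS] token (random here; learned in a real ViT)
    cls = rng.standard_normal((1, d)) * 0.02
    x = np.vstack([cls, x])
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
    out, attn = self_attention(x, Wq, Wk, Wv)
    # Binary paralysis logit read off the CLS token
    w_head = rng.standard_normal(d)
    return out[0] @ w_head, attn

logit, attn = vit_classify(rng.standard_normal((64, 64)))
print(attn.shape)  # (17, 17): 16 patches + 1 CLS token
```

In a real implementation the attention rows over the patch tokens are what lets the model weight facial regions unequally, which is the property the abstract highlights for localizing paralysis-affected areas.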

