A Method for Annotating Dialogue Value Priority Based on Zero-Shot Chain-of-Thought

Abstract: Value priority identification aims to uncover the value priorities implicit in a text and to determine whether they match specific values and value types. The task matters for user language detection, for evaluating content generated by large language models (LLMs), and for probing how well LLMs can assess human value priorities. Because no dataset for human value identification in dialogue scenarios currently exists, modeling and identifying human value priorities in dialogue remains unexplored, and constructing a high-quality dataset for dialogue value priority identification is therefore the first task. Annotating such a dataset, however, requires annotators with a degree of domain expertise, which makes the annotation barrier high. This paper therefore uses LLMs to annotate existing dialogue corpora, providing a case study in annotating a dialogue value priority identification dataset and extending the application of LLM-based data annotation. Specifically, we design a dialogue value annotation method based on Zero-Shot-CoT (zero-shot chain-of-thought) that simulates human annotation results, and we use it to construct ValueCon, a large-scale dialogue value identification dataset. Effectiveness experiments show that, compared with manually annotated datasets, the ValueCon dataset trains models more effectively and yields better model performance.
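The abstract names Zero-Shot-CoT as the basis of the annotation method but does not give the prompts. As a rough illustration only, the following Python sketch shows the generic two-stage Zero-Shot-CoT pattern (a reasoning prompt ending in "Let's think step by step.", followed by an answer-extraction prompt) applied to one dialogue utterance. The label set, the prompt wording, the function names, and the `llm` callable are all assumptions made for this sketch; none of them are taken from the paper.

```python
from typing import Callable, List, Optional

# Hypothetical label set for illustration; the abstract does not list the value categories.
VALUE_LABELS = ["security", "achievement", "benevolence", "none"]

# Standard Zero-Shot-CoT trigger phrase that elicits step-by-step reasoning.
COT_TRIGGER = "Let's think step by step."


def build_reasoning_prompt(dialogue: List[str], target_index: int, labels: List[str]) -> str:
    """Stage 1: show the dialogue, ask the question, and end with the CoT trigger."""
    history = "\n".join(f"[{i}] {turn}" for i, turn in enumerate(dialogue))
    return (
        "Dialogue:\n" + history + "\n\n"
        f"Question: Which value does utterance [{target_index}] prioritize? "
        f"Options: {', '.join(labels)}.\n"
        f"Answer: {COT_TRIGGER}"
    )


def build_extraction_prompt(reasoning_prompt: str, reasoning: str) -> str:
    """Stage 2: append the model's reasoning and ask for a single final label."""
    return f"{reasoning_prompt}\n{reasoning}\nTherefore, the answer (one option) is"


def parse_label(answer: str, labels: List[str]) -> Optional[str]:
    """Return the label mentioned earliest in the answer, or None if no label appears."""
    lowered = answer.lower()
    hits = [(lowered.find(label), label) for label in labels if label in lowered]
    return min(hits)[1] if hits else None


def annotate(
    dialogue: List[str],
    target_index: int,
    llm: Callable[[str], str],  # assumed interface: prompt text in, completion text out
    labels: List[str] = VALUE_LABELS,
) -> Optional[str]:
    """Run both Zero-Shot-CoT stages and return the parsed label for one utterance."""
    reasoning_prompt = build_reasoning_prompt(dialogue, target_index, labels)
    reasoning = llm(reasoning_prompt)
    final_answer = llm(build_extraction_prompt(reasoning_prompt, reasoning))
    return parse_label(final_answer, labels)
```

In use, `llm` would wrap whatever model client is available. Keeping it as a plain callable means the two prompting stages can be checked with a stub before any real model is attached.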

     
