Chen Peng, Bai Yong, Sun Hanxiang. Automatic Data Collection and Annotation System for Pose Estimation Dataset Designed for Grasping Detection[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.09.28.001

Automatic Data Collection and Annotation System for Pose Estimation Dataset Designed for Grasping Detection

Abstract: Robotic grasping has extensive applications in fields such as logistics sorting, automated assembly, and medical surgery. Grasping detection is an important step in robotic grasping. As the cost of 3D sensors continues to fall, depth cameras are increasingly used to capture RGB-D image data in grasping detection tasks, and pose estimation-based methods are employed to realize robotic grasping. However, most publicly available RGB-D pose estimation datasets require expensive 3D scanning devices to obtain the 3D models of target objects, and their annotation process relies on manual operation, which is time-consuming, labor-intensive, and ill-suited to producing large-scale datasets. To address this issue, this paper designs and implements an automatic data collection and annotation system for pose estimation datasets. The system requires no 3D scanning device: by capturing and analyzing RGB-D image sequences obtained from a depth camera alone, it reconstructs the 3D model of the target object, automatically annotates its pose, and generates the segmentation mask in the 2D image. In the experiments, the system was used to build a pose estimation dataset of 84 objects and 8400 RGB-D images. Comparing the automatic annotations with manual annotations shows a segmentation mask overlap rate of 98%, and the automatically annotated poses align the model point cloud with the full scene point cloud, demonstrating the accuracy and reliability of the proposed system.
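The two quantitative checks described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual evaluation code: the function names (`mask_overlap_rate`, `apply_pose`) are my own, and the overlap rate is assumed here to be intersection-over-union of the binary masks; the paper does not specify its exact formula.

```python
import numpy as np

def mask_overlap_rate(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Overlap (intersection-over-union) of two binary segmentation masks."""
    auto = auto_mask.astype(bool)
    manual = manual_mask.astype(bool)
    union = np.logical_or(auto, manual).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    inter = np.logical_and(auto, manual).sum()
    return float(inter) / float(union)

def apply_pose(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) model point cloud by a 4x4 pose matrix T,
    e.g. to overlay the model on the scene cloud and inspect alignment."""
    R, t = T[:3, :3], T[:3, 3]
    return points @ R.T + t
```

Comparing `mask_overlap_rate(auto, manual)` against a threshold per image, and visually overlaying `apply_pose(model_points, annotated_pose)` on the scene point cloud, corresponds to the two validation steps reported in the abstract.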
