3D Human Pose Estimation in Video Based on Temporal Dilated Convolution
First published: 2021-03-24
Abstract: 3D human pose estimation is an active research problem in computer vision. This paper studies video-based 3D human pose estimation and, to further improve estimation accuracy, proposes a 3D human pose estimation algorithm based on temporal convolution. First, a state-of-the-art object detection algorithm and a 2D human pose estimation algorithm are used to detect the 2D joint positions of the human body in each video frame. Then, the proposed 2D-to-3D pose lifting network, which is based on temporal convolution, lifts the 2D joint positions into 3D space. Compared with traditional 3D human pose estimation algorithms that operate on single video frames, the proposed model makes full use of the temporal information in the video to improve its results, and it addresses the discontinuity between per-frame pose estimates that traditional methods suffer from. Experiments show that the proposed model achieves the best performance on the Human3.6M dataset, with an estimation error far lower than that of existing methods that do not use temporal information and of methods that model temporal information with RNNs. These results verify the effectiveness and advanced nature of the proposed temporal-convolution-based algorithm.
Keywords: Computer Vision and Application; Human Pose Estimation; Temporal Convolution; Visualization
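To make the idea of temporal dilated convolution concrete, the following is a minimal sketch, not the network described in the paper. The abstract does not state kernel sizes, dilation factors, or layer counts, so the kernel size of 3, the dilations (1, 3), the averaging weights, and the function names `receptive_field` and `dilated_conv1d` are all illustrative assumptions. It tracks a single coordinate of a single 2D joint, whereas a real lifting network would use learned weights over all joints and output 3D coordinates.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of stacked dilated 1D convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


def dilated_conv1d(sequence, weights, dilation):
    """Valid (unpadded) dilated 1D convolution over a list of per-frame values."""
    span = (len(weights) - 1) * dilation  # frames covered beyond the first tap
    outputs = []
    for t in range(len(sequence) - span):
        # Taps are spaced `dilation` frames apart along the time axis.
        outputs.append(
            sum(w * sequence[t + k * dilation] for k, w in enumerate(weights))
        )
    return outputs


# Hypothetical x-coordinate of one 2D joint over 9 consecutive frames.
joint_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Assumed stand-in for learned weights: a simple temporal average.
averaging = [1 / 3, 1 / 3, 1 / 3]

layer1 = dilated_conv1d(joint_x, averaging, dilation=1)  # 9 frames -> 7
layer2 = dilated_conv1d(layer1, averaging, dilation=3)   # 7 frames -> 1

print(receptive_field(3, [1, 3]), len(layer1), len(layer2))
```

With these assumed settings, two layers are enough for a single output to depend on all 9 input frames, which illustrates how stacking dilated convolutions lets a model aggregate temporal context across a window of frames without recurrence.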