
Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue (2): 14-21. doi: 10.6040/j.issn.1672-3961.0.2017.431


Cascaded tracklet-based spatio-temporal model for video pose estimation

SHI Qingxuan1, WANG Qian2, TIAN Xuedong1

  1. School of Cyber Security and Computer, Hebei University, Baoding 071000, Hebei, China;
  2. Academic Administration, Hebei University, Baoding 071000, Hebei, China
  • Received: 2017-08-29 Online: 2018-04-20 Published: 2017-08-29
  • About the author: SHI Qingxuan (born 1979), female, from Baoding, Hebei; lecturer, Ph.D.; main research interest: computer vision. E-mail: shiqingxuan@hbu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (No. 61375075); Natural Science Foundation of Hebei Province (No. F2012201020); Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (No. ZD2017208); Hebei Provincial Department of Education project (No. ZD2017209)

Abstract: To address full-body human pose estimation in monocular video, a coarse-to-fine cascade of spatio-temporal probabilistic graphical models is built on a part-based body model, with body-part tracklets as the basic units. The temporal extent of a tracklet is reduced step by step across the levels of the cascade, from a trajectory covering the whole video down to a body part in a single frame. Inference starts from complete trajectories, and each coarser model filters the state space for the next level via its max-marginals until the optimal state of every body part in every frame is obtained. Because loops in the graphical models make exact inference intractable, each model is decomposed into Markov random fields and hidden Markov models, and the solution is obtained in polynomial time by iterating between spatial and temporal parsing. To supply reliable state candidates, global motion cues are used to propagate single-frame pose detections through the whole video, and the resulting trajectories form the initial state space. In comparative experiments on three publicly available datasets, the method achieved higher estimation accuracy than other video pose estimation methods.

Key words: Markov random field, tracklet, pose estimation, hidden Markov model

CLC number: TP391
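
The max-marginal filtering that the abstract describes can be illustrated on a single chain, which is the shape of the temporal (hidden Markov model) sub-problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the additive score convention, and the toy numbers are all hypothetical, and the pruning threshold (a blend of the maximum and the mean max-marginal controlled by `alpha`) is borrowed from the structured-cascade literature rather than confirmed as the rule used in the paper.

```python
from typing import List


def chain_max_marginals(unary: List[List[float]],
                        pairwise: List[List[float]]) -> List[List[float]]:
    """Max-marginals of a chain model whose total score is maximised.

    unary[t][s]    : score of state s at step t (hypothetical inputs)
    pairwise[a][b] : score of moving from state a to state b
    Returns m where m[t][s] is the best total score of any full
    sequence that passes through state s at step t.
    """
    steps, states = len(unary), len(unary[0])
    fwd = [[0.0] * states for _ in range(steps)]
    bwd = [[0.0] * states for _ in range(steps)]

    # Forward max-product pass: best score of a prefix ending in state s.
    fwd[0] = list(unary[0])
    for t in range(1, steps):
        for s in range(states):
            fwd[t][s] = unary[t][s] + max(
                fwd[t - 1][a] + pairwise[a][s] for a in range(states))

    # Backward pass: best score of the suffix that follows state s.
    for t in range(steps - 2, -1, -1):
        for s in range(states):
            bwd[t][s] = max(
                pairwise[s][b] + unary[t + 1][b] + bwd[t + 1][b]
                for b in range(states))

    return [[fwd[t][s] + bwd[t][s] for s in range(states)]
            for t in range(steps)]


def filter_states(max_marg: List[List[float]],
                  alpha: float = 0.5) -> List[List[int]]:
    """Keep, per step, the states whose max-marginal reaches a threshold.

    Assumed rule: threshold = alpha * max + (1 - alpha) * mean.
    """
    kept = []
    for row in max_marg:
        threshold = alpha * max(row) + (1 - alpha) * sum(row) / len(row)
        kept.append([s for s, v in enumerate(row) if v >= threshold])
    return kept


if __name__ == "__main__":
    # Toy chain: 3 steps, 2 candidate states, switching state costs 1.
    unary = [[1.0, 0.0], [0.0, 0.5], [2.0, 0.0]]
    pairwise = [[0.0, -1.0], [-1.0, 0.0]]
    mm = chain_max_marginals(unary, pairwise)
    print(mm)
    print(filter_states(mm, alpha=0.5))
```

In a cascade, the states that survive `filter_states` at a coarse level would become the candidate set handed to the next, finer level, so each level works on a smaller state space than the one before it.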