JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE) ›› 2018, Vol. 48 ›› Issue (2): 14-21. doi: 10.6040/j.issn.1672-3961.0.2017.431


Cascaded tracklet-based spatio-temporal model for video pose estimation

SHI Qingxuan1, WANG Qian2, TIAN Xuedong1   

  1. School of Cyber Security and Computer, Hebei University, Baoding 071000, Hebei, China;
  2. Academic Administration, Hebei University, Baoding 071000, Hebei, China
  • Received: 2017-08-29 Online: 2018-04-20 Published: 2017-08-29

Abstract: To address full-body human pose estimation in video, a coarse-to-fine cascade of spatio-temporal models was developed in which the tracklet of a body part serves as the basic unit. The notion of a “tracklet” ranges from a trajectory covering the whole video down to a body part in a single frame. In this cascade, coarse models filter the state space for the next level via their max-marginals. Because loops in the graphical models make inference intractable, the models were decomposed into Markov random fields and hidden Markov models. Through iterative spatial and temporal parsing, an optimal solution was obtained in polynomial time. To generate reliable state hypotheses, the pose detections were propagated across the whole video sequence using global motion cues. The model was evaluated on three publicly available datasets and showed quantitative and qualitative improvements over state-of-the-art approaches.
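
As a rough illustration of how max-marginals can filter a state space, the sketch below computes them on a single chain (one temporal, HMM-like sub-model) with max-sum forward and backward passes, then keeps only the states whose max-marginal clears a threshold. It is not the paper's implementation: the function names, the score arrays, and the thresholding rule (a blend of the best score and the mean max-marginal, controlled by a hypothetical `alpha`) are all assumptions made for the example.

```python
import numpy as np

def chain_max_marginals(unary, pairwise):
    """Max-marginals for a chain model (illustrative helper, not from the paper).

    unary:    (T, K) score of state k at step t.
    pairwise: (K, K) score of moving from state i at step t to state j at t+1.
    Returns a (T, K) array whose entry [t, k] is the best total score of any
    full sequence that is forced to pass through state k at step t.
    """
    T, K = unary.shape
    fwd = np.zeros((T, K))
    bwd = np.zeros((T, K))
    fwd[0] = unary[0]
    for t in range(1, T):
        # Best prefix score ending in each state at step t.
        fwd[t] = unary[t] + np.max(fwd[t - 1][:, None] + pairwise, axis=0)
    for t in range(T - 2, -1, -1):
        # Best suffix score after step t, given the state at step t.
        bwd[t] = np.max(pairwise + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + bwd

def prune_states(max_marg, alpha=0.5):
    """Keep states whose max-marginal reaches an assumed blended threshold."""
    best = max_marg.max()
    thresh = alpha * best + (1.0 - alpha) * max_marg.mean()
    return max_marg >= thresh

# Toy usage: 4 frames, 3 candidate states per frame, random scores.
rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 3))
pairwise = rng.normal(size=(3, 3))
max_marg = chain_max_marginals(unary, pairwise)
keep = prune_states(max_marg, alpha=0.5)  # boolean mask of surviving states
```

Because the globally best sequence passes through one state in every frame, that state's max-marginal equals the best score, so with `alpha` in [0, 1] at least one state per frame always survives the filter and is passed to the next, finer level.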

Key words: Markov random field, tracklet, pose estimation, hidden Markov model

CLC Number: TP391