Video human pose estimation based on a cascaded body-part tracklet model

doi:10.6040/j.issn.1672-3961.0.2017.431

Abstract

To address the problem of full-body human pose estimation in video, a coarse-to-fine cascade of spatio-temporal models was developed in which body-part tracklets served as the basic unit. The notion of a tracklet ranges from a trajectory covering the whole video down to a single body part in one frame. In this cascade, each coarse model filtered the state space of the next level using its max-marginals. Because loops in the graphical models made exact inference intractable, the models were decomposed into Markov random fields and hidden Markov models, and through iterative spatial and temporal parsing an optimal solution was obtained in polynomial time. To generate reliable state hypotheses, pose detections were propagated across the whole video sequence using global motion cues. The model was evaluated on three publicly available datasets and showed clear quantitative and qualitative improvements over state-of-the-art approaches.
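The max-marginal pruning step of the cascade can be illustrated on the simplest temporal component: a single chain (HMM-like) model over one body part's candidate states across frames. This is a minimal sketch under stated assumptions, not the authors' implementation. The states stand in for hypothetical tracklet hypotheses, transition scores are assumed to be shared across frames, and the pruning threshold α·max + (1 − α)·mean is borrowed from the structured prediction cascades of Weiss and Taskar rather than taken from this paper. The spatial MRF step and the alternation between spatial and temporal parsing are not shown.

```python
from typing import List

Scores = List[List[float]]  # scores[t][s]: frame t, candidate state s


def chain_max_marginals(unary: Scores, pairwise: Scores) -> Scores:
    """Max-marginals of a chain-structured (HMM-like) model in log-score space.

    unary[t][s]:    score of candidate state s at frame t
    pairwise[a][b]: transition score from state a at frame t to state b at t + 1
                    (assumption: shared across all frames, for brevity)
    Returns mm with mm[t][s] = best total score of any state sequence
    that passes through state s at frame t.
    """
    T, S = len(unary), len(unary[0])

    # Forward pass: best score of a prefix ending in state b at frame t
    # (includes unary[t][b]).
    fwd = [list(unary[0])]
    for t in range(1, T):
        fwd.append([
            unary[t][b] + max(fwd[t - 1][a] + pairwise[a][b] for a in range(S))
            for b in range(S)
        ])

    # Backward pass: best score of the suffix after frame t, given state a
    # at frame t (excludes unary[t][a]).
    bwd = [[0.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = [
            max(pairwise[a][b] + unary[t + 1][b] + bwd[t + 1][b] for b in range(S))
            for a in range(S)
        ]

    # Max-marginal = best prefix through s plus best suffix after s.
    return [[fwd[t][s] + bwd[t][s] for s in range(S)] for t in range(T)]


def prune_by_max_marginals(mm: Scores, alpha: float) -> List[List[int]]:
    """Keep the states whose max-marginal reaches an interpolated threshold.

    Threshold = alpha * (best sequence score) + (1 - alpha) * (mean max-marginal),
    an assumed rule taken from structured prediction cascades.
    In exact arithmetic the best-scoring sequence is never pruned, because
    the largest max-marginal in every frame equals the best sequence score.
    """
    best = max(mm[0])
    all_values = [v for row in mm for v in row]
    mean = sum(all_values) / len(all_values)
    threshold = alpha * best + (1 - alpha) * mean
    return [[s for s, v in enumerate(row) if v >= threshold] for row in mm]


if __name__ == "__main__":
    # Toy input: 3 frames, 2 hypothetical tracklet hypotheses per frame.
    unary = [[4, 0], [0, 2], [4, 0]]
    pairwise = [[0, -1], [-1, 0]]  # switching hypotheses costs 1
    mm = chain_max_marginals(unary, pairwise)
    print(mm)  # [[8.0, 5.0], [8.0, 8.0], [8.0, 5.0]]
    print(prune_by_max_marginals(mm, alpha=0.5))  # [[0], [0, 1], [0]]
```

Raising alpha toward 1 prunes more aggressively while still retaining the highest-scoring sequence. In the toy run both hypotheses survive at the middle frame because two state sequences tie for the best score, which is exactly the kind of ambiguity a coarse level would pass on to a finer model.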

Key words: Markov random field, tracklet, pose estimation, hidden Markov model

CLC Number: TP391

SHI Qingxuan, WANG Qian, TIAN Xuedong. Cascaded tracklet-based spatio-temporal model for video pose estimation[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, 48(2): 14-21.
