Multimodal hierarchical keyframe extraction method for continuous combined motion

doi:10.6040/j.issn.1672-3961.0.2022.131

Abstract

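Abstract: Keyframes of continuous combined motion span widely differing spatial ranges and often recur, so they are difficult to extract with a fixed spatial-feature criterion. To address this problem, a hierarchical keyframe extraction method based on multimodal segmentation and clustering is proposed. At the level of the complete motion, the motion sequence is divided into multiple segments according to multimodal information such as background-music beats and spatio-temporal information. Within each segment, the frames undergo spatial-feature clustering and temporal segmentation, yielding a set of representative candidate keyframes whose poses may repeat. Redundancy is then eliminated according to the spatio-temporal characteristics of the motion. Taking broadcast gymnastics as an example, keyframes are extracted and compared with existing methods through experiments and analysis; the proposed method extracts the keyframes of the motion more accurately and more completely.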
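The three stages named in the abstract (beat-based segmentation, per-segment clustering with temporal splitting, redundancy removal) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that beat times are already available in seconds, that each frame is a flat pose feature vector, that a small hand-written k-means stands in for the unspecified clustering step, and that redundancy means two consecutive candidates lying closer than a distance threshold. All function names and the parameters `k` and `min_dist` are hypothetical.

```python
import numpy as np

def segment_by_beats(n_frames, fps, beat_times):
    """Split frame range [0, n_frames) at the frames nearest to each beat."""
    cuts = sorted({int(round(t * fps)) for t in beat_times})
    bounds = [0] + [c for c in cuts if 0 < c < n_frames] + [n_frames]
    return list(zip(bounds[:-1], bounds[1:]))

def kmeans_labels(x, k, iters=20):
    """Tiny deterministic k-means (assumed stand-in for the clustering step)."""
    k = min(k, len(x))
    # Initialise centers from evenly spaced frames of the segment.
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def candidates_in_segment(poses, start, end, k):
    """Cluster frames spatially, split clusters into temporal runs,
    and return one representative frame index per run."""
    x = poses[start:end]
    labels = kmeans_labels(x, k)
    change = np.flatnonzero(np.diff(labels)) + 1
    run_bounds = np.concatenate(([0], change, [len(x)]))
    reps = []
    for a, b in zip(run_bounds[:-1], run_bounds[1:]):
        run = x[a:b]
        # Representative = frame closest to the mean pose of the run.
        offset = np.linalg.norm(run - run.mean(axis=0), axis=1).argmin()
        reps.append(start + int(a) + int(offset))
    return reps

def remove_redundant(poses, cands, min_dist):
    """Drop a candidate if it is spatially close to the previously kept one."""
    kept = []
    for c in cands:
        if kept and np.linalg.norm(poses[c] - poses[kept[-1]]) < min_dist:
            continue
        kept.append(c)
    return kept

def extract_keyframes(poses, fps, beat_times, k=2, min_dist=0.5):
    poses = np.asarray(poses, dtype=float)
    cands = []
    for s, e in segment_by_beats(len(poses), fps, beat_times):
        cands.extend(candidates_in_segment(poses, s, e, k))
    return remove_redundant(poses, cands, min_dist)

# Toy sequence: pose A, pose B, pose A, pose B, 10 frames each, at 10 fps.
A, B = [0.0, 0.0], [1.0, 1.0]
toy = np.array([A] * 10 + [B] * 10 + [A] * 10 + [B] * 10)
print(extract_keyframes(toy, fps=10, beat_times=[2.0]))  # [0, 10, 20, 30]
```

In the toy run the repeated poses A and B are each reported twice, because only consecutive near-duplicates are merged. This mirrors the abstract's point that keyframes of a composite motion may legitimately recur.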

Keywords: keyframe extraction, continuous combined motion, hierarchical, multimodal, time series segmentation

CLC number: TP391

YU Yixuan, YANG Geng, GENG Hua. Multimodal hierarchical keyframe extraction method for continuous combined motion[J]. Journal of Shandong University (Engineering Science), 2023, 53(2): 42-50.

