Journal of Shandong University (Engineering Science) ›› 2023, Vol. 53 ›› Issue (2): 42-50. doi: 10.6040/j.issn.1672-3961.0.2022.131
YU Yixuan (于艺旋), YANG Geng* (杨耕), GENG Hua (耿华)
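Abstract: The keyframes of continuous compound motions cover widely differing spatial ranges and contain repeated poses, so they are difficult to extract with a fixed spatial-feature criterion. To address this problem, a hierarchical keyframe extraction method based on multimodal segmentation and clustering is proposed. At the level of the complete motion, the motion sequence is divided into multiple segments according to multimodal information such as the beats of the background music and spatio-temporal information. Within each segment, spatial-feature clustering and temporal segmentation are applied to the frames, yielding a set of representative candidate keyframes whose poses may be repeated. Redundancy is then eliminated according to the spatio-temporal characteristics of the motion. Taking radio calisthenics as an example, keyframes are extracted and compared with those of existing methods through experiments and analysis. The results show that the proposed method extracts the keyframes of the motion more accurately and more completely.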
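The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain k-means clustering, the rule of picking the frame nearest the cluster centre, and the thresholds `k`, `max_gap` and `min_dist` are all assumptions made for illustration. Beat positions are assumed to be already available as frame indices, and each pose is assumed to be a flat feature vector.

```python
import numpy as np

def split_by_beats(n_frames, beat_frames):
    """Stage 1 (assumed form): cut [0, n_frames) at the given beat frame indices."""
    cuts = sorted({0, n_frames, *[b for b in beat_frames if 0 < b < n_frames]})
    return list(zip(cuts[:-1], cuts[1:]))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centres."""
    init = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[init].copy()
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's centre unchanged
                centers[c] = points[labels == c].mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def candidates_in_segment(poses, start, end, k):
    """Stage 2 (assumed form): cluster poses, split into runs, pick one frame per run."""
    seg = poses[start:end]
    labels, centers = kmeans(seg, min(k, len(seg)))
    # Temporal segmentation: consecutive frames with the same label form one run.
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    picks = []
    for run in np.split(np.arange(len(seg)), boundaries):
        center = centers[labels[run[0]]]
        best = run[np.linalg.norm(seg[run] - center, axis=1).argmin()]
        picks.append(start + int(best))
    return picks

def remove_redundant(poses, candidates, max_gap, min_dist):
    """Stage 3 (assumed rule): drop a candidate that is close to the previously
    kept keyframe both in time (<= max_gap frames) and in pose (< min_dist)."""
    kept = []
    for idx in candidates:
        if kept:
            near_in_time = idx - kept[-1] <= max_gap
            near_in_pose = np.linalg.norm(poses[idx] - poses[kept[-1]]) < min_dist
            if near_in_time and near_in_pose:
                continue
        kept.append(idx)
    return kept

def extract_keyframes(poses, beat_frames, k=2, max_gap=8, min_dist=0.5):
    """Run the three stages on a (n_frames, n_features) float array."""
    candidates = []
    for start, end in split_by_beats(len(poses), beat_frames):
        candidates.extend(candidates_in_segment(poses, start, end, k))
    return remove_redundant(poses, candidates, max_gap, min_dist)

# Toy 1-D "poses": pose A for frames 0-4, pose B for 5-11, pose A again for 12-19.
poses = np.array([0.0] * 5 + [1.0] * 7 + [0.0] * 8).reshape(-1, 1)
print(extract_keyframes(poses, beat_frames=[10]))  # [0, 5, 12]
```

In the toy sequence, pose B straddles the beat at frame 10, so both segments propose it as a candidate (frames 5 and 10). The redundancy step removes the second copy, while the later return to pose A at frame 12 is kept because the previously kept keyframe shows a different pose.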