Journal of Shandong University (Engineering Science) ›› 2023, Vol. 53 ›› Issue (2): 42-50. doi: 10.6040/j.issn.1672-3961.0.2022.131

Multimodal hierarchical keyframe extraction method for continuous combined motion

YU Yixuan, YANG Geng*, GENG Hua

  1. Department of Automation, Tsinghua University, Beijing 100084, China
  • Received: 2022-04-11 Online: 2023-04-22 Published: 2023-04-21
  • About the authors: YU Yixuan (born 1997), female, from Yingkou, Liaoning, master's student; research interests: human motion analysis and computer vision. E-mail: yuyx19@mails.tsinghua.edu.cn. *Corresponding author: YANG Geng (born 1957), male, from Qingchuan, Sichuan, research fellow, Ph.D.; research interests: motor control systems, control techniques for power electronic systems, control and optimization techniques for renewable energy systems, aging models of power batteries, and applications of AI in these fields. E-mail: yanggeng@tsinghua.edu.cn

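Abstract: The keyframes of continuous combined motion cover widely varying spatial ranges and contain repeated poses, so they are difficult to extract with a fixed spatial-feature criterion. To address this problem, a hierarchical keyframe extraction method based on multimodal segmentation and clustering is proposed. At the level of the complete motion, the motion sequence is divided into multiple segments according to multimodal information, including the beats of the background music and spatiotemporal information. Within each segment, spatial-feature clustering and temporal segmentation are applied to the frames to obtain several representative candidate keyframes whose poses may repeat. Redundant candidates are then eliminated according to the spatiotemporal characteristics of the motion. Taking broadcast gymnastics as an example, keyframes are extracted and compared with existing methods through experiments and analysis; the proposed method extracts the keyframes of the motion more accurately and more completely.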
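The three stages named in the abstract (beat-based segmentation, per-segment candidate selection, redundancy removal) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the greedy distance-threshold clustering, and the parameters `radius` and `min_gap` are all assumptions chosen for illustration, and the beat times are assumed to come from a separate beat tracker.

```python
from math import dist


def split_by_beats(n_frames, fps, beat_times):
    """Split frame indices [0, n_frames) into segments at beat boundaries."""
    cuts = sorted({int(round(t * fps)) for t in beat_times})
    cuts = [c for c in cuts if 0 < c < n_frames]
    bounds = [0] + cuts + [n_frames]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def candidates_in_segment(poses, start, end, radius):
    """Greedy temporal clustering (an assumed stand-in for the paper's
    clustering step): start a new run when a pose drifts farther than
    `radius` from the first pose of the current run, and keep the middle
    frame of every run as a candidate keyframe."""
    picks, run_start = [], start
    for i in range(start + 1, end):
        if dist(poses[i], poses[run_start]) > radius:
            picks.append((run_start + i - 1) // 2)
            run_start = i
    picks.append((run_start + end - 1) // 2)
    return picks


def drop_redundant(poses, picks, radius, min_gap):
    """Drop a candidate only if it is both close in time and similar in
    pose to the previously kept keyframe (hypothetical redundancy rule)."""
    kept = []
    for i in picks:
        near_in_time = bool(kept) and i - kept[-1] < min_gap
        if near_in_time and dist(poses[i], poses[kept[-1]]) <= radius:
            continue
        kept.append(i)
    return kept


def extract_keyframes(poses, fps, beat_times, radius=0.5, min_gap=3):
    """Run the three stages in order and return keyframe indices."""
    picks = []
    for start, end in split_by_beats(len(poses), fps, beat_times):
        picks.extend(candidates_in_segment(poses, start, end, radius))
    return drop_redundant(poses, picks, radius, min_gap)


# Toy 1-D "poses": the first and last beats share the same pose.
poses = [(0.0,)] * 4 + [(1.0,)] * 4 + [(0.0,)] * 4
keyframes = extract_keyframes(poses, fps=4, beat_times=[1.0, 2.0])
```

In the toy input the pose of the first beat recurs in the third beat. Because the redundancy rule compares a candidate only with the previously kept keyframe and requires closeness in time as well as in pose, the repeated pose is still kept as a separate keyframe. This mirrors the abstract's point that keyframes of combined motion may legitimately repeat.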

Key words: keyframe extraction, continuous combined motion, hierarchical, multimodal, time series segmentation

CLC Number: TP391