Journal of Shandong University (Engineering Science), 2024, Vol. 54, Issue (3): 1-11. doi: 10.6040/j.issn.1672-3961.0.2023.109
• Machine Learning & Data Mining •
NIE Xiushan1, GONG Rui1, DONG Fei2, GUO Jie1*, MA Yuling1