Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (1): 1-14. doi: 10.6040/j.issn.1672-3961.0.2024.162
• Machine Learning & Data Mining •
NIE Xiushan, ZHAO Runhu, NING Yang*, LIU Xinfeng
[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 142-158.
[2] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[3] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2024-05-28]. https://arxiv.org/abs/1804.02767.
[4] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C] //Proceedings of the Computer Vision-ECCV 2016 Workshops. Berlin, Germany: Springer, 2016: 21-37.
[5] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2023-08-02)[2024-05-28]. https://arxiv.org/abs/1706.03762.
[6] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03)[2024-05-28]. https://arxiv.org/abs/2010.11929.
[7] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C] //Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2021: 10012-10022.
[8] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C] //Proceedings of the Computer Vision-ECCV 2020 Workshops. Berlin, Germany: Springer, 2020: 213-229.
[9] GU A, DAO T. Mamba: linear-time sequence modeling with selective state spaces[EB/OL]. (2024-05-31)[2024-06-17]. https://arxiv.org/abs/2312.00752.
[10] ZHU L, LIAO B, ZHANG Q, et al. Vision mamba: efficient visual representation learning with bidirectional state space model[EB/OL]. (2024-02-10)[2024-06-17]. https://arxiv.org/abs/2401.09417.
[11] HUANG T, PEI X, YOU S, et al. LocalMamba: visual state space model with windowed selective scan[EB/OL]. (2024-03-14)[2024-06-17]. https://arxiv.org/abs/2403.09338.
[12] ZOU Z, CHEN K, SHI Z, et al. Object detection in 20 years: a survey[J]. Proceedings of the IEEE, 2023, 111(3): 257-276.
[13] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C] //Proceedings of the Computer Vision-ECCV 2014 Workshops. Berlin, Germany: Springer, 2014: 740-755.
[14] GUPTA A, DOLLAR P, GIRSHICK R. LVIS: a dataset for large vocabulary instance segmentation[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2019: 5356-5364.
[15] SCHEIRER W J, DE REZENDE ROCHA A, SAPKOTA A, et al. Toward open set recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(7): 1757-1772.
[16] KANG B, LIU Z, WANG X, et al. Few-shot object detection via feature reweighting[C] //Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2019: 8420-8429.
[17] BANSAL A, SIKKA K, SHARMA G, et al. Zero-shot object detection[C] //Proceedings of the Computer Vision-ECCV 2018 Workshops. Berlin, Germany: Springer, 2018: 384-400.
[18] ZHU P, WANG H, SALIGRAMA V. Zero shot detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 30(4): 998-1010.
[19] DEVLIN J, CHANG M, LEE K, et al. BERT: pretraining of deep bidirectional transformers for language understanding[EB/OL]. (2019-05-24)[2024-06-17]. https://arxiv.org/abs/1810.04805.
[20] ZAREIAN A, ROSA K D, HU D H, et al. Open vocabulary object detection using captions[EB/OL]. (2021-05-14)[2024-06-17]. https://arxiv.org/abs/2011.10678.
[21] WU J, LI X, XU S, et al. Towards open vocabulary learning: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(7): 5092-5113.
[22] GENG C, HUANG S, CHEN S. Recent advances in open set recognition: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(10): 3614-3631.
[23] JOSEPH K J, KHAN S, KHAN F S, et al. Towards open world object detection[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2021: 5830-5840.
[24] ROMERA-PAREDES B, TORR P. An embarrassingly simple approach to zero-shot learning[C] //Proceedings of the 32nd International Conference on Machine Learning. New York, USA: ACM, 2015: 2152-2161.
[25] WANG Y, YAO Q, KWOK J T, et al. Generalizing from a few examples: a survey on few-shot learning[J]. ACM Computing Surveys, 2020, 53(3): 1-34.
[26] LI L H, YATSKAR M, YIN D, et al. VisualBERT: a simple and performant baseline for vision and language[EB/OL]. (2019-08-09)[2024-06-17]. https://arxiv.org/abs/1908.03557.
[27] LU J, BATRA D, PARIKH D, et al. ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[EB/OL]. (2019-08-06)[2024-06-17]. https://arxiv.org/abs/1908.02265.
[28] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C] //Proceedings of the 38th International Conference on Machine Learning. New York, USA: ACM, 2021: 8748-8763.
[29] MU N, KIRILLOV A, WAGNER D, et al. SLIP: self-supervision meets language-image pre-training[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 529-544.
[30] SUN Q, FANG Y, WU L, et al. EVA-CLIP: improved training techniques for CLIP at scale[EB/OL]. (2023-03-27)[2024-06-17]. https://arxiv.org/abs/2303.15389.
[31] LI Y, FAN H, HU R, et al. Scaling language-image pretraining via masking[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2023: 23390-23400.
[32] WU S, ZHANG W, XU L, et al. CLIPSelf: vision transformer distills itself for open-vocabulary dense prediction[EB/OL]. (2024-01-24)[2024-06-17]. https://arxiv.org/abs/2310.01403.
[33] GU X, LIN T Y, KUO W, et al. Open-vocabulary object detection via vision and language knowledge distillation[EB/OL]. (2022-05-12)[2024-06-17]. https://arxiv.org/abs/2104.13921.
[34] BRAVO M A, MITTAL S, BROX T. Localized vision-language matching for open-vocabulary object detection[C] //Proceedings of the Pattern Recognition: 44th DAGM German Conference. Berlin, Germany: Springer, 2022: 393-408.
[35] CHEN P, SHENG K, ZHANG M, et al. Open vocabulary object detection with proposal mining and prediction equalization[EB/OL]. (2022-11-24)[2024-06-17]. https://arxiv.org/abs/2206.11134.
[36] KIM D, ANGELOVA A, KUO W. Region-aware pretraining for open-vocabulary object detection with vision transformers[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2023: 11144-11154.
[37] LIN C, SUN P, JIANG Y, et al. Learning object-language alignments for open-vocabulary object detection[EB/OL]. (2022-11-27)[2024-06-17]. https://arxiv.org/abs/2211.14843.
[38] KIM D, ANGELOVA A, KUO W. Detection-oriented image-text pretraining for open-vocabulary detection[EB/OL]. (2023-09-29)[2024-06-17]. https://arxiv.org/abs/2310.00161v1.
[39] MA C, JIANG Y, WEN X, et al. CoDet: co-occurrence guided region-word alignment for open-vocabulary object detection[J]. Advances in Neural Information Processing Systems, 2024, 36: 71078-71094.
[40] ZHOU X, GIRDHAR R, JOULIN A, et al. Detecting twenty-thousand classes using image-level supervision[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 350-368.
[41] ZHONG Y, YANG J, ZHANG P, et al. RegionCLIP: region-based language-image pretraining[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2022: 16793-16803.
[42] ZHAO S, ZHANG Z, SCHULTER S, et al. Exploiting unlabeled data with vision and language models for object detection[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 159-175.
[43] GAO M, XING C, NIEBLES J C, et al. Open vocabulary object detection with pseudo bounding-box labels[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 266-282.
[44] XU S, LI X, WU S, et al. DST-Det: simple dynamic self-training for open-vocabulary object detection[EB/OL]. (2024-04-01)[2024-06-17]. https://arxiv.org/abs/2310.01393.
[45] WANG L, LIU Y, DU P, et al. Object-aware distillation pyramid for open-vocabulary object detection[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2023: 11186-11196.
[46] MA Z, LUO G, GAO J, et al. Open-vocabulary one-stage detection with hierarchical visual-language knowledge distillation[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2022: 14074-14083.
[47] ZANG Y, LI W, ZHOU K, et al. Open-vocabulary DETR with conditional matching[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 106-122.
[48] WU S, ZHANG W, JIN S, et al. Aligning bag of regions for open-vocabulary object detection[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2023: 15254-15264.
[49] CHO H C, JHOO W Y, KANG W, et al. Open-vocabulary object detection using pseudo caption labels[EB/OL]. (2023-03-23)[2024-06-17]. https://arxiv.org/abs/2303.13040.
[50] KUO W, CUI Y, GU X, et al. F-VLM: open-vocabulary object detection upon frozen vision and language models[EB/OL]. (2023-02-23)[2024-06-17]. https://arxiv.org/abs/2209.15639.
[51] MINDERER M, GRITSENKO A, STONE A, et al. Simple open-vocabulary object detection[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 728-755.
[52] DU Y, WEI F, ZHANG Z, et al. Learning to prompt for open-vocabulary object detection with vision-language model[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2022: 14084-14093.
[53] FENG C, ZHONG Y, JIE Z, et al. PromptDet: towards open-vocabulary detection using uncurated images[C] //Proceedings of the Computer Vision-ECCV 2022 Workshops. Berlin, Germany: Springer, 2022: 701-717.
[54] SONG H, BANG J. Prompt-guided transformers for end-to-end open-vocabulary object detection[EB/OL]. (2023-03-25)[2024-06-17]. https://arxiv.org/abs/2303.14386.
[55] WU X, ZHU F, ZHAO R, et al. CORA: adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2023: 7031-7040.
[56] LI J, ZHANG J, LI J, et al. Learning background prompts to discover implicit knowledge for open vocabulary object detection[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA: IEEE, 2024: 16678-16687.
[57] WANG J, ZHANG P, CHU T, et al. V3Det: vast vocabulary visual detection dataset[EB/OL]. (2023-10-05)[2024-06-17]. https://arxiv.org/abs/2304.03752.
[58] EVERINGHAM M, ESLAMI S M, GOOL L, et al. The pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136.
[59] SHAO S, LI Z, ZHANG T, et al. Objects365: a large-scale, high-quality dataset for object detection[C] //Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2019: 8430-8439.
[60] WANG Y, SU X, CHEN Q, et al. OVLW-DETR: open-vocabulary light-weighted detection transformer[EB/OL]. (2024-07-15)[2024-07-26]. https://arxiv.org/abs/2407.10655.
[61] GAO K, CHEN L, ZHANG H, et al. Compositional prompt tuning with motion cues for open-vocabulary video relation detection[EB/OL]. (2023-02-01)[2024-06-17]. https://arxiv.org/abs/2302.00268.
[62] LI L, XIAO J, CHEN G, et al. Zero-shot visual relation detection via composite visual cues from large language models[J]. Advances in Neural Information Processing Systems, 2024, 36: 50105-50116.
[63] ZHU C, CHEN L. A survey on open-vocabulary detection and segmentation: past, present, and future[EB/OL]. (2024-04-15)[2024-06-17]. https://arxiv.org/abs/2307.09220.