Journal of Shandong University(Engineering Science) ›› 2026, Vol. 56 ›› Issue (1): 63-71.doi: 10.6040/j.issn.1672-3961.0.2024.177

• Machine Learning & Data Mining • Previous Articles    

Path exploration strategy for symbolic execution based on active learning

ZHU Dongqing1, ZHANG Junwen1, HE Lianying1, WANG Rui1, LIU Jiqiang1,2, ZHANG Dalin2,3*   

  1. ZHU Dongqing1, ZHANG Junwen1, HE Lianying1, WANG Rui1, LIU Jiqiang1, 2, ZHANG Dalin2, 3*(1. School of Computer and Information Technology of Beijing Jiaotong University, Beijing 100044;
    2. Intelligent Systems and Security Laboratory of Beijing Jiaotong University, Beijing 100044;
    3. Cyberspace Security Institute of Beijing Jiaotong University, Beijing 100044
  • Published:2026-02-03

Abstract: To address the issue of symbolic execution path explosion, a path exploration strategy for symbolic execution based on active learning(ALS)was designed.In this study, the reward values of all states in the test state pool were predicted. The states with accurately predicted high reward values were automatically annotated and fed back to the model to update the prediction model. Based on the model's performance and prediction accuracy, a decision was made on whether to proceed to the next round of testing. After the final round of model updates, some of the prediction models automatically underwent iterative training to generate multiple sub-models, which were then collectively employed for symbolic execution path exploration.The experimental results demonstrated that under the same test conditions, higher code coverage and branch coverage were often achieved by ALS compared to other baseline methods. Moreover, more undefined behavior violations and real-world program vulnerabilities were discovered, and test cases of higher quality were generated. When these test cases were utilized as initial seeds for fuzz testing, more program paths could be discovered.

Key words: active learning, symbolic execution, path exploration strategy, reward values

CLC Number: 

  • TP391
[1] 吴皓, 周世龙, 史东辉, 等. 符号执行技术及应用研究综述[J]. 计算机工程与应用, 2023, 59(8): 56-72. WU Hao, ZHOU Shilong, SHI Donghui, et al. Review of symbolic execution technology and applications[J]. Computer Engineering and Applications, 2023, 59(8): 56-72.
[2] CADAR C, SEN K. Symbolic execution for software testing: three decades later[J]. Communications of the ACM, 2013, 56(2): 82-90.
[3] 陈德蕾, 王成, 陈建伟, 等. 基于门控循环单元与主动学习的协同过滤推荐算法[J]. 山东大学学报(工学版), 2020, 50(1): 21-27. CHEN Delei, WANG Cheng, CHEN Jianwei, et al. GRU-based collaborative filtering recommendation algorithm with active learning[J]. Journal of Shandong University(Engineering Science), 2020, 50(1): 21-27.
[4] 龚楷伦, 翟婷婷, 唐鸿成. 一种面向多标签分类的在线主动学习算法[J].山东大学学报(工学版), 2022, 52(2): 80-88. GONG Kailun, ZHAI Tingting, TANG Hongcheng. An online active learning algorithm for multi-label classification[J]. Journal of Shandong University(Engineering Science), 2022, 52(2): 80-88.
[5] BALDONI R, COPPA E, D'ELIA D C, et al. A survey of symbolic execution techniques[J]. ACM Computing Surveys(CSUR), 2019, 51(3): 1-39.
[6] AVGERINOS T, CHA S K, REBERT A, et al. Automatic exploit generation[J]. Communications of the ACM, 2014, 57(2): 74-84.
[7] MA K K, YIT PHANG K, FOSTER J S, et al. Directed symbolic execution[C] //Static Analysis: 18th Interna-tional Symposium. Venice, Italy: Springer-Verlag, 2011: 95-111.
[8] ZHANG Y F, CHEN Z B, WANG J, et al. Regular property guided dynamic symbolic execution[C] //Proceedings of the 37thInternational Conference on Software Engineering. Florence, Italy: IEEE, 2015: 643-653.
[9] CADAR C, DUNBAR D, ENGLER D R. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs[C] //8th USENIX Symposium on Operating Systems Design and Implementation. San Diego, USA: USENIX, 2008: 209-224.
[10] GODEFROID P. Compositional dynamic test generation[C] //Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming languages. Nice, USA: ACM, 2007: 47-54.
[11] ANAND S, GODEFROID P, TILLMANN N. Demand-driven compositional symbolic execution[C] //Tools and Algorithms for the Construction and Analysis of Systems: 14th International Conference, TACAS 2008. Berlin, Germany: Springer, 2008: 367-381.
[12] GODEFROID P, LUCHAUP D. Automatic partial loop summarization in dynamic test generation[C] //Proceedings of the 2011 International Symposium on Software Testing and Analysis. New York, USA: ACM, 2011: 23-33.
[13] XIE X F, CHEN B H, LIU Y, et al. Proteus: computing disjunctive loop summary via path dependency analysis[C] //Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York, USA: ACM, 2016: 61-72.
[14] KINDER J. Efficient symbolic execution for software testing[C] //FMCAD. Lausanne, Switzerland: IEEE, 2014: 5.
[15] 邓维, 李兆鹏. 形状分析符号执行引擎中的状态合并[J]. 计算机科学, 2017, 44(2): 209-215. DENG Wei, LI Zhaopeng. State merging in shape analysis symbolic execution engine[J]. Computer Science, 2017, 44(2): 209-215.
[16] KHOO Y P, CHANG B Y E, FOSTER J S. Mixing type checking and symbolic execution[C] //Proceedings of the 31st ALM SIGPLAN Conference on Programming Language Design and Implementation. New York, USA: ACM, 2010: 436-447.
[17] SHOSHITAISHVILI Y, WANG R Y, HAUSER C, et al. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware[C] //NDSS 2015.San Diego,USA:Internet Society Press, 2015: 8-11.
[18] CHA S K, AVGERINOS T, REBERT A, et al. Unleashing mayhem on binary code[C] //2012 IEEE Symposium on Security and Privacy. San Francisco, USA:IEEE, 2012: 380-394.
[19] GAO F J, WANG L Z, LI X D. Bovinspector: automatic inspection and repair of buffer overflow vulnerabilities[C] //Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. New York, USA: ACM, 2016: 786-791.
[20] HE J X, SIVANRUPAN G, TSANKOV P, et al. Learning to explore paths for symbolic execution[C] //Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. New York, USA: ACM, 2021: 2526-2540.
[21] CHA S, HONG S, LEE J, et al. Automatically generating search heuristics for concolic testing[C] //Proceedings of the 40th International Conference on Software Engineering. New York, USA: ACM, 2018: 1244-1254.
[22] CHA S, OH H. Concolic testing with adaptively changing search heuristics[C] //Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.New York, USA: ACM, 2019: 235-245.
[23] LI X, LIANG Y J, QIAN H, et al. Symbolic execution of complex program driven by machine learning based constraint solving[C] //Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering.New York, USA:ACM, 2016: 554-559.
[24] CHA S, LEE S, OH H. Template-guided concolic testing via online learning[C] //Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. New York, USA: ACM, 2018: 408-418.
[25] Settles B. Active learning literature survey[R]. University of Wisconsin-Madison, USA: Computer SciencesDepartment, 2009.
[26] WU D R. Pool-based sequential active learning for regression[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 30(5): 1348-1359.
[27] BELUCH W H, GENEWEIN T, NÜRNBERGER A, et al. The power of ensembles for active learning in image classification[C] //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Salt Lake City, USA: IEEE, 2018: 9368-9377.
[28] LI Y, SU Z D, WANG L Z, et al. Steering symbolic execution to less traveled paths[C] //Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications. Indianapolis, USA: ACM, 2013: 19-32.
[29] SVOZIL D, KVASNICKA V, POSPICHAL J. Introduction to multi-layer feed-forward neural networks[J]. Chemometrics and Intelligent Laboratory Systems, 1997, 39(1): 43-62.
[30] LIPTON Z C, BERKOWITZ J, ELKAN C. A critical review of recurrent neural networks for sequence learning[EB/OL].(2015-05-29)[2024-07-22]. https://arxiv.org/abs/1506.00019
[31] SAHA S, RAGHAVA G P S. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network[J]. Proteins: Structure, Function, and Bioinformatics, 2006, 65(1): 40-48.
[32] BUSSE F, NOWACK M, CADAR C. Running symbolic execution forever[C] //Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis.New York, USA: ACM, 2020: 63-74.
[33] BURNIM J, SEN K. Heuristics for scalable dynamic test generation[C] //2008 23rd IEEE/ACM International Conference on Automated Software Engineering. L'Aquila, Italy: IEEE, 2008: 443-446.
[34] OGNAWALA S, HUTZELMANN T, PSALLIDA E, et al. Improving function coverage with munch: a hybrid fuzzing and directed symbolic execution approach[C] //Proceedings of the 33rd Annual ACM Symposium on Applied Computing.Pau, France:ACM, 2018: 1475-1482.
[35] HE L Y, ZHANG D L, ZHU D Q, et al. Path exploration strategy for symbolic execution based on multi-strategy active learning[C] // Proceedings of the 15th Asia-Pacific Symposium on Internetware. New York, USA: ACM, 2024: 165-168.
[1] XIE Ziqi, WANG Lihong, LI Man. Active learning of pairwise constraints in block diagonal subspace clustering [J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 65-73.
[2] Delei CHEN, Cheng WANG, Jianwei CHEN, Yiyin WU. GRU-based collaborative filtering recommendation algorithm with active learning [J]. Journal of Shandong University(Engineering Science), 2020, 50(1): 21-27.
[3] XIE Huo-sheng, LIU Min. An ensemble co-training algorithm based on active learning [J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2012, 42(3): 1-5.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!