您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(工学版)》

山东大学学报 (工学版) ›› 2026, Vol. 56 ›› Issue (1): 63-71.doi: 10.6040/j.issn.1672-3961.0.2024.177

• 机器学习与数据挖掘 • 上一篇    

基于主动学习的符号执行路径探索策略

朱东晴1,张骏温1,何莲英1,王睿1,刘吉强1,2,张大林2,3*   

  1. 1.北京交通大学计算机与信息技术学院, 北京 100044;2.北京交通大学智能系统与安全实验室, 北京 100044;3.北京交通大学网络空间安全学院, 北京 100044
  • 发布日期:2026-02-03
  • 作者简介:朱东晴(1999— ),女,江西九江人,硕士研究生,主要研究方向为机器学习与软件测试. E-mail: 2421340087@qq.com. *通信作者简介:张大林(1983— ),男,内蒙古赤峰人,副教授,博士生导师,博士,主要研究方向为软件工程理论与技术. E-mail: dalin@bjtu.edu.cn

Path exploration strategy for symbolic execution based on active learning

ZHU Dongqing1, ZHANG Junwen1, HE Lianying1, WANG Rui1, LIU Jiqiang1,2, ZHANG Dalin2,3*   

  1. ZHU Dongqing1, ZHANG Junwen1, HE Lianying1, WANG Rui1, LIU Jiqiang1, 2, ZHANG Dalin2, 3*(1. School of Computer and Information Technology of Beijing Jiaotong University, Beijing 100044;
    2. Intelligent Systems and Security Laboratory of Beijing Jiaotong University, Beijing 100044;
    3. Cyberspace Security Institute of Beijing Jiaotong University, Beijing 100044
  • Published:2026-02-03

摘要: 针对符号执行路径爆炸问题,设计一种基于主动学习引导的符号执行路径探索策略(path exploration strategy for symbolic execution based on active learning, ALS)。预测待测状态池中所有状态奖励值,自动标注预测准确的高奖励值状态,反馈给模型以更新预测模型;根据模型性能和预测精度判断是否进入下一轮测试;最后一轮模型更新结束后,部分预测模型将自动进行迭代训练,生成多个子模型,共同用于符号执行的路径探索。试验结果表明,在相同试验条件下,ALS相较其他基线方法往往能够实现更高代码覆盖和分支覆盖率,发现更多未定义行为违规数和真实世界程序漏洞,且其生成测试用例质量较高,用作模糊测试初始种子能发现更多程序路径。

关键词: 主动学习, 符号执行, 路径探索策略, 奖励值

Abstract: To address the issue of symbolic execution path explosion, a path exploration strategy for symbolic execution based on active learning(ALS)was designed.In this study, the reward values of all states in the test state pool were predicted. The states with accurately predicted high reward values were automatically annotated and fed back to the model to update the prediction model. Based on the model's performance and prediction accuracy, a decision was made on whether to proceed to the next round of testing. After the final round of model updates, some of the prediction models automatically underwent iterative training to generate multiple sub-models, which were then collectively employed for symbolic execution path exploration.The experimental results demonstrated that under the same test conditions, higher code coverage and branch coverage were often achieved by ALS compared to other baseline methods. Moreover, more undefined behavior violations and real-world program vulnerabilities were discovered, and test cases of higher quality were generated. When these test cases were utilized as initial seeds for fuzz testing, more program paths could be discovered.

Key words: active learning, symbolic execution, path exploration strategy, reward values

中图分类号: 

  • TP391
[1] 吴皓, 周世龙, 史东辉, 等. 符号执行技术及应用研究综述[J]. 计算机工程与应用, 2023, 59(8): 56-72. WU Hao, ZHOU Shilong, SHI Donghui, et al. Review of symbolic execution technology and applications[J]. Computer Engineering and Applications, 2023, 59(8): 56-72.
[2] CADAR C, SEN K. Symbolic execution for software testing: three decades later[J]. Communications of the ACM, 2013, 56(2): 82-90.
[3] 陈德蕾, 王成, 陈建伟, 等. 基于门控循环单元与主动学习的协同过滤推荐算法[J]. 山东大学学报(工学版), 2020, 50(1): 21-27. CHEN Delei, WANG Cheng, CHEN Jianwei, et al. GRU-based collaborative filtering recommendation algorithm with active learning[J]. Journal of Shandong University(Engineering Science), 2020, 50(1): 21-27.
[4] 龚楷伦, 翟婷婷, 唐鸿成. 一种面向多标签分类的在线主动学习算法[J].山东大学学报(工学版), 2022, 52(2): 80-88. GONG Kailun, ZHAI Tingting, TANG Hongcheng. An online active learning algorithm for multi-label classification[J]. Journal of Shandong University(Engineering Science), 2022, 52(2): 80-88.
[5] BALDONI R, COPPA E, D'ELIA D C, et al. A survey of symbolic execution techniques[J]. ACM Computing Surveys(CSUR), 2019, 51(3): 1-39.
[6] AVGERINOS T, CHA S K, REBERT A, et al. Automatic exploit generation[J]. Communications of the ACM, 2014, 57(2): 74-84.
[7] MA K K, YIT PHANG K, FOSTER J S, et al. Directed symbolic execution[C] //Static Analysis: 18th Interna-tional Symposium. Venice, Italy: Springer-Verlag, 2011: 95-111.
[8] ZHANG Y F, CHEN Z B, WANG J, et al. Regular property guided dynamic symbolic execution[C] //Proceedings of the 37thInternational Conference on Software Engineering. Florence, Italy: IEEE, 2015: 643-653.
[9] CADAR C, DUNBAR D, ENGLER D R. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs[C] //8th USENIX Symposium on Operating Systems Design and Implementation. San Diego, USA: USENIX, 2008: 209-224.
[10] GODEFROID P. Compositional dynamic test generation[C] //Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming languages. Nice, USA: ACM, 2007: 47-54.
[11] ANAND S, GODEFROID P, TILLMANN N. Demand-driven compositional symbolic execution[C] //Tools and Algorithms for the Construction and Analysis of Systems: 14th International Conference, TACAS 2008. Berlin, Germany: Springer, 2008: 367-381.
[12] GODEFROID P, LUCHAUP D. Automatic partial loop summarization in dynamic test generation[C] //Proceedings of the 2011 International Symposium on Software Testing and Analysis. New York, USA: ACM, 2011: 23-33.
[13] XIE X F, CHEN B H, LIU Y, et al. Proteus: computing disjunctive loop summary via path dependency analysis[C] //Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York, USA: ACM, 2016: 61-72.
[14] KINDER J. Efficient symbolic execution for software testing[C] //FMCAD. Lausanne, Switzerland: IEEE, 2014: 5.
[15] 邓维, 李兆鹏. 形状分析符号执行引擎中的状态合并[J]. 计算机科学, 2017, 44(2): 209-215. DENG Wei, LI Zhaopeng. State merging in shape analysis symbolic execution engine[J]. Computer Science, 2017, 44(2): 209-215.
[16] KHOO Y P, CHANG B Y E, FOSTER J S. Mixing type checking and symbolic execution[C] //Proceedings of the 31st ALM SIGPLAN Conference on Programming Language Design and Implementation. New York, USA: ACM, 2010: 436-447.
[17] SHOSHITAISHVILI Y, WANG R Y, HAUSER C, et al. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware[C] //NDSS 2015.San Diego,USA:Internet Society Press, 2015: 8-11.
[18] CHA S K, AVGERINOS T, REBERT A, et al. Unleashing mayhem on binary code[C] //2012 IEEE Symposium on Security and Privacy. San Francisco, USA:IEEE, 2012: 380-394.
[19] GAO F J, WANG L Z, LI X D. Bovinspector: automatic inspection and repair of buffer overflow vulnerabilities[C] //Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. New York, USA: ACM, 2016: 786-791.
[20] HE J X, SIVANRUPAN G, TSANKOV P, et al. Learning to explore paths for symbolic execution[C] //Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. New York, USA: ACM, 2021: 2526-2540.
[21] CHA S, HONG S, LEE J, et al. Automatically generating search heuristics for concolic testing[C] //Proceedings of the 40th International Conference on Software Engineering. New York, USA: ACM, 2018: 1244-1254.
[22] CHA S, OH H. Concolic testing with adaptively changing search heuristics[C] //Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.New York, USA: ACM, 2019: 235-245.
[23] LI X, LIANG Y J, QIAN H, et al. Symbolic execution of complex program driven by machine learning based constraint solving[C] //Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering.New York, USA:ACM, 2016: 554-559.
[24] CHA S, LEE S, OH H. Template-guided concolic testing via online learning[C] //Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. New York, USA: ACM, 2018: 408-418.
[25] Settles B. Active learning literature survey[R]. University of Wisconsin-Madison, USA: Computer SciencesDepartment, 2009.
[26] WU D R. Pool-based sequential active learning for regression[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 30(5): 1348-1359.
[27] BELUCH W H, GENEWEIN T, NÜRNBERGER A, et al. The power of ensembles for active learning in image classification[C] //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Salt Lake City, USA: IEEE, 2018: 9368-9377.
[28] LI Y, SU Z D, WANG L Z, et al. Steering symbolic execution to less traveled paths[C] //Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications. Indianapolis, USA: ACM, 2013: 19-32.
[29] SVOZIL D, KVASNICKA V, POSPICHAL J. Introduction to multi-layer feed-forward neural networks[J]. Chemometrics and Intelligent Laboratory Systems, 1997, 39(1): 43-62.
[30] LIPTON Z C, BERKOWITZ J, ELKAN C. A critical review of recurrent neural networks for sequence learning[EB/OL].(2015-05-29)[2024-07-22]. https://arxiv.org/abs/1506.00019
[31] SAHA S, RAGHAVA G P S. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network[J]. Proteins: Structure, Function, and Bioinformatics, 2006, 65(1): 40-48.
[32] BUSSE F, NOWACK M, CADAR C. Running symbolic execution forever[C] //Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis.New York, USA: ACM, 2020: 63-74.
[33] BURNIM J, SEN K. Heuristics for scalable dynamic test generation[C] //2008 23rd IEEE/ACM International Conference on Automated Software Engineering. L'Aquila, Italy: IEEE, 2008: 443-446.
[34] OGNAWALA S, HUTZELMANN T, PSALLIDA E, et al. Improving function coverage with munch: a hybrid fuzzing and directed symbolic execution approach[C] //Proceedings of the 33rd Annual ACM Symposium on Applied Computing.Pau, France:ACM, 2018: 1475-1482.
[35] HE L Y, ZHANG D L, ZHU D Q, et al. Path exploration strategy for symbolic execution based on multi-strategy active learning[C] // Proceedings of the 15th Asia-Pacific Symposium on Internetware. New York, USA: ACM, 2024: 165-168.
[1] 龚楷伦,翟婷婷,唐鸿成. 一种面向多标签分类的在线主动学习算法[J]. 山东大学学报 (工学版), 2022, 52(2): 80-88.
[2] 陈德蕾, 王成, 陈建伟, 吴以茵. 基于门控循环单元与主动学习的协同过滤推荐算法[J]. 山东大学学报 (工学版), 2020, 50(1): 21-27.
[3] 谢伙生,刘敏. 一种基于主动学习的集成协同训练算法[J]. 山东大学学报(工学版), 2012, 42(3): 1-5.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!