Journal of Shandong University (Engineering Science) ›› 2020, Vol. 50 ›› Issue (2): 118-128. doi: 10.6040/j.issn.1672-3961.0.2019.043
• Machine Learning and Data Mining •
Haijun ZHANG1, Yinghui CHEN2,*
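Abstract:
Based on semantic context analysis and vectorization, the access-traffic corpus big data was converted into word vectors to enable intelligent detection of cross-site scripting (XSS) attacks at big-data scale. Natural language processing methods were used for data preprocessing, including data acquisition, data cleaning, data sampling, and feature extraction. A neural-network-based word vectorization algorithm was designed to turn the corpus into word-vector big data. Through theoretical analysis and derivation, intelligent detection algorithms based on long short-term memory (LSTM) networks of several different depths were implemented. Different hyperparameters were designed and the experiments were repeated. The results comprise a maximum recognition rate of 0.9995, a minimum recognition rate of 0.2643, a mean recognition rate of 99.88%, a variance of 0, and a standard deviation of 0.0004, together with curves showing how the recognition rate, the loss error, the cosine distance between word-vector samples, and the mean absolute error change. The results show that the algorithm has a high recognition rate, strong stability, and excellent overall performance.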
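The abstract reports a cosine distance between word-vector samples and summary statistics of the recognition rate, but gives no formulas. The following is a minimal Python sketch of how such quantities could be computed. It rests on several assumptions: that cosine distance is defined as 1 minus cosine similarity, that population (not sample) variance is used, and that the function names `cosine_distance` and `summarize`, the example vectors, and the example rates are all hypothetical and not taken from the paper.

```python
import math
from statistics import fmean, pstdev, pvariance


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def summarize(rates):
    """Summarize per-run recognition rates using population statistics."""
    return {
        "max": max(rates),
        "min": min(rates),
        "mean": fmean(rates),
        "variance": pvariance(rates),
        "std": pstdev(rates),
    }


# Hypothetical 3-dimensional word vectors, for illustration only.
vec_a = [0.9, 0.1, 0.3]
vec_b = [0.8, 0.2, 0.4]
print(cosine_distance(vec_a, vec_b))

# Hypothetical recognition rates; these are not the paper's data.
rates = [0.9984, 0.9988, 0.9992]
print(summarize(rates))

# Squaring a standard deviation of 0.0004 gives a variance near 1.6e-07,
# which is formatted as 0.0000 at four decimal places.
print(f"{0.0004 ** 2:.4f}")
```

The last line suggests one possible reading of the reported figures: a variance of 0 and a standard deviation of 0.0004 can coexist if the variance was rounded to four decimal places. The source does not state the rounding it used, so this remains an assumption.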