Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue 3: 34-39. doi: 10.6040/j.issn.1672-3961.0.2017.433
XIE Zhifeng (1, 2), WU Jiaping (1), MA Lizhuang (2, 3)
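Abstract: Financial news data is currently disorganized and lacks automatic, efficient management. To address this, a Chinese financial news classification method based on convolutional neural networks (CNNs) is proposed. A large-scale financial news corpus is collected, and a general-purpose word-vector model for the financial domain is trained on it by unsupervised learning. These word vectors are then fed into CNN training to classify the news. Compared with traditional methods, the CNN-based method has a simple network structure and performs well even on small sample sets. It effectively solves the Chinese financial news classification problem and provides further evidence that CNNs are effective for text classification.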
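The abstract only outlines the pipeline: word vectors learned without supervision from a financial news corpus are fed into a CNN classifier. The sketch below illustrates one common way such a model is wired. It is a single-layer CNN in the style of Kim's sentence-classification network, with several filter widths, ReLU, max-over-time pooling, and a softmax output. This is an assumption, not the authors' published configuration.

Several details are illustrative placeholders and are not taken from the paper:

- the embedding size, filter widths and filter counts;
- the class labels;
- the toy vocabulary;
- the randomly initialised weights;
- the zero vector used for out-of-vocabulary words.

Training (for example by backpropagation with dropout) is omitted. Input is assumed to be already word-segmented, since Chinese text has no whitespace word boundaries.

```python
import math
import random

# Illustrative hyperparameters and labels (assumptions, not values from the paper).
EMBED_DIM = 4
FILTER_WIDTHS = (2, 3)
NUM_FILTERS = 2  # feature maps per filter width
CLASSES = ("stocks", "banking", "macro")


def relu(x):
    return x if x > 0.0 else 0.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Untrained single-layer CNN over a sequence of pre-trained word vectors."""

    def __init__(self, embeddings, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # token -> vector, e.g. from a word2vec model
        # One filter is a (width x EMBED_DIM) weight matrix.
        self.filters = {
            w: [[[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
                 for _ in range(w)]
                for _ in range(NUM_FILTERS)]
            for w in FILTER_WIDTHS
        }
        n_features = NUM_FILTERS * len(FILTER_WIDTHS)
        self.fc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in CLASSES]
        self.bias = [0.0] * len(CLASSES)

    def embed(self, tokens):
        # Out-of-vocabulary tokens map to a zero vector (one choice among several).
        zero = [0.0] * EMBED_DIM
        return [self.embeddings.get(t, zero) for t in tokens]

    def features(self, tokens):
        x = self.embed(tokens)
        feats = []
        for w in FILTER_WIDTHS:
            for filt in self.filters[w]:
                # Slide the filter over every window of w consecutive words.
                acts = [
                    relu(sum(filt[j][k] * x[i + j][k]
                             for j in range(w) for k in range(EMBED_DIM)))
                    for i in range(len(x) - w + 1)
                ]
                # Max-over-time pooling; texts shorter than w yield 0.0.
                feats.append(max(acts) if acts else 0.0)
        return feats

    def predict_proba(self, tokens):
        f = self.features(tokens)
        logits = [sum(wi * fi for wi, fi in zip(row, f)) + b
                  for row, b in zip(self.fc, self.bias)]
        return softmax(logits)


# Usage: random vectors stand in for word vectors trained on a financial corpus.
rng = random.Random(1)
vocab = ["央行", "下调", "存款", "准备金率", "股市", "上涨"]
embeddings = {w: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for w in vocab}
model = TextCNN(embeddings)
# Pre-segmented headline: "central bank / cuts / deposit / reserve requirement ratio"
probs = model.predict_proba(["央行", "下调", "存款", "准备金率"])
print(dict(zip(CLASSES, (round(p, 3) for p in probs))))
```

Max-over-time pooling gives each filter one feature regardless of sentence length. This lets news items of different lengths share a fixed-size output layer. It is also part of why this family of models stays structurally simple.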