
Journal of Shandong University (Engineering Science) ›› 2020, Vol. 50 ›› Issue (1): 1-7. doi: 10.6040/j.issn.1672-3961.0.2019.293

• Machine Learning and Data Mining •

Cross-domain text sentiment classification based on domain-adversarial network and BERT

Guoyong CAI, Qiang LIN, Kaiqi REN

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
  • Received: 2019-06-10  Online: 2020-02-20  Published: 2020-02-14
  • About the author: CAI Guoyong (1971- ), male, born in Fengshan, Guangxi, professor and Ph.D.; his main research interests include social media mining and affective computing. E-mail: ccgycai@guet.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61763007) and the Key Project of the Guangxi Natural Science Foundation (2017JJD160017)

Abstract:

To enable the extracted shared sentiment features to capture more sentence-level semantic information in cross-domain sentiment analysis, a deep network model based on a domain-adversarial mechanism and BERT (bidirectional encoder representations from transformers) was proposed. The model first used BERT to obtain semantic representation vectors of sentences, and then extracted local sentence features with a convolutional neural network. A domain-adversarial neural network was designed to make the feature representations extracted from different domains as indistinguishable as possible, i.e., the features extracted from the source and target domains shared greater similarity. A sentiment classifier was trained on the labeled source-domain dataset, with the expectation that it would perform well in both the source and target domains. Experimental results on the Amazon product reviews dataset showed that the proposed method met this expectation and achieved better cross-domain text sentiment classification.

Key words: cross-domain, sentiment analysis, convolutional neural network, domain adversarial network, shared sentiment features
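
To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of a BERT-plus-CNN feature extractor trained DANN-style: a shared feature branch feeds both a sentiment classifier and a domain discriminator through a gradient reversal layer. The class name BCNSketch, the kernel sizes, the two-class heads, and the use of precomputed token-level BERT states in place of a full BERT encoder are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch of the BCN idea: BERT token states -> text CNN -> shared
    # features -> sentiment classifier + domain discriminator (via gradient reversal).
    import torch
    import torch.nn as nn
    from torch.autograd import Function


    class GradReverse(Function):
        """Identity in the forward pass; multiplies gradients by -beta on the way back."""

        @staticmethod
        def forward(ctx, x, beta):
            ctx.beta = beta
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.beta * grad_output, None


    class BCNSketch(nn.Module):
        def __init__(self, hidden=768, n_filters=100, kernel_sizes=(3, 4, 5), beta=1.0):
            super().__init__()
            self.beta = beta
            # Convolutions over token-level BERT outputs of shape (batch, seq_len, hidden).
            self.convs = nn.ModuleList(
                [nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes]
            )
            feat_dim = n_filters * len(kernel_sizes)
            self.sentiment = nn.Linear(feat_dim, 2)   # positive / negative
            self.domain = nn.Linear(feat_dim, 2)      # source / target

        def forward(self, bert_states):
            x = bert_states.transpose(1, 2)           # (batch, hidden, seq_len)
            # Max-over-time pooling of each convolution's feature maps.
            feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
            shared = torch.cat(feats, dim=1)          # shared sentiment features
            sent_logits = self.sentiment(shared)
            dom_logits = self.domain(GradReverse.apply(shared, self.beta))
            return sent_logits, dom_logits


    if __name__ == "__main__":
        model = BCNSketch()
        fake_bert_out = torch.randn(8, 128, 768)      # stand-in for BERT token states
        sent, dom = model(fake_bert_out)
        print(sent.shape, dom.shape)                  # torch.Size([8, 2]) torch.Size([8, 2])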

CLC number:

  • TP391

Fig. 1  Cross-domain text sentiment analysis model BCN

Fig. 2  Structure of the Transformer

Fig. 3  Multi-head attention structure
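
For context, the multi-head attention illustrated in Fig. 3 follows the standard formulation from the Transformer paper (reference 17 below); the equations are reproduced here for the reader's convenience, not taken from this paper's figures:

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
    \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\; KW_i^{K},\; VW_i^{V})
    \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}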

Fig. 4  Text convolution process

Table 1  Statistics of the datasets

Domain       Positive reviews  Negative reviews  Unlabeled reviews
Books        3 000             3 000             9 750
DVD          3 000             3 000             11 843
Electronics  3 000             3 000             17 009
Kitchen      3 000             3 000             13 856
Video        3 000             3 000             30 180

Table 2  Accuracy of cross-domain sentiment analysis on the Amazon product reviews dataset

Source domain  Target domain  S-only  DANN    mSDA    AMN     HATN    BD      BCN
Books          DVD            0.8057  0.8342  0.8612  0.8562  0.8707  0.8938  0.8933
Books          Kitchen        0.7163  0.7790  0.8105  0.8188  0.8703  0.9170  0.9190
Books          Electronics    0.7365  0.7627  0.7902  0.8055  0.8575  0.9135  0.9140
Books          Video          0.8145  0.8323  0.8498  0.8725  0.8780  0.8963  0.9017
DVD            Books          0.7645  0.8077  0.8517  0.8453  0.8778  0.9050  0.9105
DVD            Kitchen        0.7343  0.7815  0.8260  0.8167  0.8747  0.9155  0.9223
DVD            Electronics    0.7312  0.7635  0.7617  0.8042  0.8632  0.9150  0.9147
DVD            Video          0.8275  0.8595  0.8380  0.8740  0.8912  0.9103  0.9171
Video          DVD            0.8243  0.8415  0.8590  0.8688  0.8790  0.9050  0.9091
Video          Kitchen        0.7133  0.7522  0.7952  0.8090  0.8645  0.9153  0.9171
Video          Electronics    0.7187  0.7572  0.7767  0.7968  0.8598  0.8985  0.9000
Video          Books          0.7703  0.8003  0.8300  0.8350  0.8710  0.9091  0.9076
Electronics    DVD            0.7260  0.7627  0.8263  0.8053  0.8432  0.8660  0.8683
Electronics    Kitchen        0.8463  0.8453  0.8580  0.8783  0.9008  0.9393  0.9417
Electronics    Video          0.7248  0.7720  0.8170  0.8212  0.8418  0.8645  0.8700
Electronics    Books          0.6887  0.7353  0.7992  0.7752  0.8403  0.8876  0.8821
Kitchen        DVD            0.7332  0.7532  0.8218  0.7950  0.8472  0.8672  0.8708
Kitchen        Video          0.7608  0.7637  0.8147  0.8215  0.8485  0.8728  0.8745
Kitchen        Electronics    0.8315  0.8553  0.8800  0.8668  0.8933  0.9338  0.9308
Kitchen        Books          0.7153  0.7417  0.8055  0.7905  0.8488  0.8888  0.8823
Average accuracy              0.7592  0.7900  0.8236  0.8279  0.8661  0.9007  0.9023

Fig. 5  Accuracy with different convolution kernel sizes

Fig. 6  Accuracy with different values of β
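
Fig. 6 varies β, which in DANN-style training controls the strength of the domain-adversarial signal relative to the sentiment loss. As a rough illustration only (the exact objective is not given on this page), a joint training step for the sketch above could combine a cross-entropy sentiment loss on labeled source reviews with a domain-classification loss on source and target reviews, with β applied inside the gradient reversal layer:

    # Illustrative joint training step for the BCNSketch module above (assumed
    # objective, not the authors' code). Domain labels: 0 = source, 1 = target.
    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, src_states, src_labels, tgt_states, beta=1.0):
        model.beta = beta                              # gradient reversal strength
        sent_src, dom_src = model(src_states)          # labeled source batch
        _, dom_tgt = model(tgt_states)                 # unlabeled target batch

        dom_logits = torch.cat([dom_src, dom_tgt], dim=0)
        dom_labels = torch.cat([
            torch.zeros(dom_src.size(0), dtype=torch.long),
            torch.ones(dom_tgt.size(0), dtype=torch.long),
        ])

        # Sentiment loss on source labels plus adversarial domain loss on both domains.
        loss = F.cross_entropy(sent_src, src_labels) + F.cross_entropy(dom_logits, dom_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()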

1 TAN S, CHENG X, WANG Y, et al. Adapting naive Bayes to domain adaptation for sentiment analysis[C]//European Conference on Information Retrieval. Berlin, Germany: Springer, 2009: 337-349.
2 PAN S J, NI X, SUN J, et al. Cross-domain sentiment classification via spectral feature alignment[C]//19th International Conference on World Wide Web. Raleigh, North Carolina, USA: ACM, 2010: 751-760.
3 GLOROT X, BORDES A, BENGIO Y, et al. Domain adaptation for large-scale sentiment classification: a deep learning approach[C]//28th International Conference on Machine Learning. Bellevue, Washington, USA: Omnipress, 2011: 513-520.
4 CHEN M, XU Z, SHA F, et al. Marginalized denoising autoencoders for domain adaptation[C]//29th International Conference on Machine Learning. Edinburgh, Scotland, UK: [s.n.], 2012: 1627-1634.
5 GANIN Y, USTINOVA E, AJAKAN H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(1): 1-35.
6 AJAKAN H, GERMAIN P, LAROCHELLE H, et al. Domain-adversarial neural networks[J]. Statistics, 2014(1050): 1-8.
7 LI Z, ZHANG Y, WEI Y, et al. End-to-end adversarial memory network for cross-domain sentiment classification[C]//26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: [s.n.], 2017: 2237-2243.
8 LI Z, WEI Y, ZHANG Y, et al. Hierarchical attention transfer network for cross-domain sentiment classification[C]//32nd AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA: AAAI, 2018: 5852-5859.
9 BLITZER J, DREDZE M, PEREIRA F, et al. Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification[C]//Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic: Association for Computational Linguistics, 2007: 440-447.
10 GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems. Montreal, Canada: [s.n.], 2014: 2672-2680.
11 KRIZHEVSKY A, SUTSKEVER I, HINTON G E, et al. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA: [s.n.], 2012: 1106-1114.
12 KARPATHY A, TODERICI G. Large-scale video classification with convolutional neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, Ohio, USA: IEEE, 2014: 1725-1732.
13 KIM Y. Convolutional neural networks for sentence classification[C]//Conference on Empirical Methods in Natural Language Processing. Doha, Qatar: [s.n.], 2014: 1746-1751.
14 WEI X, LIN H, YU Y, et al. Low-resource cross-domain product review sentiment classification based on a CNN with an auxiliary large-scale corpus[J]. Algorithms, 2017, 10(81): 1-15.
15 WU F, HUANG Y. Sentiment domain adaptation with multiple sources[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, 2016: 301-310.
16 DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
17 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, USA: [s.n.], 2017: 5998-6008.
18 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016: 770-778.