Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (1): 41-46. doi: 10.6040/j.issn.1672-3961.0.2018.341

• Machine Learning & Data Mining •

Features analysis for Chinese irony detection

Rongxiang ZHOU, Xiuyi JIA*

  1. School of Computer Science and Engineering, Nanjing University of Science & Technology, Nanjing 210094, Jiangsu, China
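  • Received: 2018-08-13 Online: 2019-02-20 Published: 2019-03-01
  • Contact: Xiuyi JIA E-mail: zhourongxiang1@163.com; jiaxy@njust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61773208); National Natural Science Foundation of China (71671086); National Natural Science Foundation of China (61403200)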

Abstract:

This study examines features for irony detection in microblog data. Based on the characteristics of microblog text and of irony, a variety of features were constructed, including emotional phrases and emoticons. Experiments showed that, compared with existing irony features, the proposed features improved recognition accuracy by 0.34%, recall by 0.74%, and F-measure by 0.18% on imbalanced datasets, and improved recognition accuracy by 0.44%, recall by 2.54%, and F-measure by 0.14% on balanced datasets.

Key words: irony detection, sentiment analysis, features construction, imbalanced dataset, balanced dataset

CLC Number: 

  • TP391

Table 1

Categories of degree adverbs

| Category | Degree adverbs |
| --- | --- |
| Extreme | 分外,十分,备加,万分,倍加,异常,尤,尤为,尤其,最,最为,无比,极,极为,极其,极度,极端,格外,殊,深为,特,特别,甚,甚为,绝,绝伦,绝对,绝顶,至,至为,透,透顶,顶,顶顶,非常 |
| High | 大,大为,大大,太,好,好不,好生,很,忒,怪,挺,愈,愈为,愈加,愈发,愈益,更,更为,更其,更加,多,多么,比较,益,益发,相当,较,较为,较比,越,越加,越发,过,过于,过分,颇,颇为,蛮,老 |
| Upper-medium | 何其,何等,够,尽,全然,满,还,真 |
| Medium | 几,几乎,差不多,差点儿 |
| Lower-medium | 不大,不太,不很,不甚,不胜 |
| Low | 多少,微微,丝毫,些微,有些,有点,有点儿,略,略为,略微,略略,毫,稍,稍为,稍微,稍稍,稍许 |
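
A lexicon like Table 1 is naturally stored as a lookup from adverb to degree category. The sketch below is only an illustration: the English category names are translations, only a few adverbs per category are copied from the table, and counting adverbs per category is an assumed way of turning the lexicon into features, since this page does not say how the categories are used.

```python
# Degree-adverb lexicon keyed by the six categories of Table 1.
# Only a few adverbs per category are copied here for brevity.
DEGREE_ADVERBS = {
    "extreme": ["非常", "极其", "十分", "格外"],
    "high": ["很", "太", "更加", "相当"],
    "upper-medium": ["何等", "够", "真"],
    "medium": ["几乎", "差不多"],
    "lower-medium": ["不太", "不大", "不甚"],
    "low": ["有点", "稍微", "略微"],
}

# Invert the lexicon so that each adverb maps to its category.
ADVERB_TO_CATEGORY = {
    adverb: category
    for category, adverbs in DEGREE_ADVERBS.items()
    for adverb in adverbs
}


def degree_category_counts(tokens):
    """Count how many tokens fall into each degree category (assumed feature)."""
    counts = {category: 0 for category in DEGREE_ADVERBS}
    for token in tokens:
        category = ADVERB_TO_CATEGORY.get(token)
        if category is not None:
            counts[category] += 1
    return counts


# Hand-segmented toy sentence; a real pipeline would need a word segmenter.
tokens = ["这", "手机", "非常", "好", ",", "有点", "贵"]
counts = degree_category_counts(tokens)
```

For the toy sentence, one token falls in the extreme category and one in the low category; all other counts stay at zero.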

Table 2

Models of emotional phrases

| Type | Examples |
| --- | --- |
| n+q | 凯歌阵阵 |
| q+n | 阵阵凯歌 |
| a+(uj)+n | 美丽(的)花,最好(的)恭维 |
| b+n | 中等身材 |
| n+u | 暴风雨般,绅士似的 |
| ad+(uj)+vn | 迅速发展,认真(的)咨询 |
| eng+n | AA制 |
| d+n | 真是棒极了,真是调皮 |
| d+v | 互相支援,没法接受,强烈反对 |
| a+(ul)+n | 活跃气氛,学好功课,热爱家乡,拉仇恨,支持魅族,抛弃(了)魅族,没牛人 |
| v+a | 洗刷干净,喜欢清净 |
| a+(ud)+v | 关心(得)不够,油价上升,经济复苏 |
| v+(ud)+d | 高兴(得)很 |
| v+r | 鼓励他 |
| ad+v | 正确领导 |
| v+(ud)+l | 吃(得)很饱 |
| v+m | 有一套 |
| a+(ud)+d | 痛快极了,漂亮(的)很,好(得)很,标致极了 |
| d+a | 非常漂亮,比较合适,非常谦虚,比较空,有点特殊,过分良好 |
| a+v | 富裕起来 |
| n+(v)+a | 态度和蔼,情节较重,外形好,胆子(可)真大 |
| v+(ul)+b | 看(了)高兴 |
| m+a | 十分壮丽 |
| m+f | 100以上 |
| n+f | 天亮以后,尖子生里面 |
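
The patterns in Table 2 read as sequences of part-of-speech tags, with optional tags in parentheses. A minimal sketch of matching such patterns against already tagged text follows. It assumes the input is a list of (word, tag) pairs produced by some segmenter and tagger (not shown, and not specified on this page), copies only three patterns from the table, and expands the optional tag by hand; the function name `find_emotional_phrases` is hypothetical.

```python
# A few patterns from Table 2, written as tuples of POS tags.
# Optional tags such as (uj) are expanded into both variants by hand.
PATTERNS = {
    "d+a": [("d", "a")],
    "a+(uj)+n": [("a", "n"), ("a", "uj", "n")],
    "v+r": [("v", "r")],
}


def find_emotional_phrases(tagged_tokens):
    """Return (pattern_name, phrase) for every tag-sequence match."""
    tags = [tag for _, tag in tagged_tokens]
    matches = []
    for start in range(len(tagged_tokens)):
        for name, variants in PATTERNS.items():
            for variant in variants:
                end = start + len(variant)
                if tuple(tags[start:end]) == variant:
                    phrase = "".join(
                        word for word, _ in tagged_tokens[start:end]
                    )
                    matches.append((name, phrase))
    return matches


# Hand-tagged toy input; the tags here are assigned manually for illustration.
tagged = [("非常", "d"), ("漂亮", "a"), ("的", "uj"), ("花", "n")]
phrases = find_emotional_phrases(tagged)
```

On the toy input this yields one `d+a` match (非常漂亮) and one `a+(uj)+n` match (漂亮的花). Overlapping matches are kept on purpose; whether the original feature construction deduplicates them is not stated here.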

Table 3

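Precision of three feature systems on imbalanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.445 | 0.404 | 0.690 | 0.642 | 0.610 |
| Irony features of Ref. [10] | 0.494 | 0.257 | 0.728 | 0.687 | 0.798 |
| 9 irony features | 0.505 | 0.244 | 0.729 | 0.680 | 0.823 |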

Table 4

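Recall of three feature systems on imbalanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.509 | 0.467 | 0.393 | 0.233 | 0.300 |
| Irony features of Ref. [10] | 0.560 | 0.513 | 0.489 | 0.339 | 0.321 |
| 9 irony features | 0.586 | 0.521 | 0.502 | 0.339 | 0.311 |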

Table 5

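F-measure of three feature systems on imbalanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.472 | 0.434 | 0.513 | 0.348 | 0.431 |
| Irony features of Ref. [10] | 0.525 | 0.342 | 0.585 | 0.453 | 0.472 |
| 9 irony features | 0.543 | 0.332 | 0.595 | 0.452 | 0.464 |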
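
This page reports precision, recall and F-measure without restating how the F-measure is defined. Assuming the conventional balanced F-measure (an assumption, not something the page states), the support vector machine entry for the 9 irony features can be checked from the precision (0.729) and recall (0.502) reported for the same setting:

```latex
F = \frac{2PR}{P + R},
\qquad
F_{\mathrm{SVM}} = \frac{2 \times 0.729 \times 0.502}{0.729 + 0.502}
                 = \frac{0.731916}{1.231} \approx 0.595
```

This agrees with the value of 0.595 in Table 5. Not every entry reproduces exactly from the rounded precision and recall values.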

Table 6

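Precision of three feature systems on balanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.768 | 0.699 | 0.732 | 0.660 | 0.739 |
| Irony features of Ref. [10] | 0.803 | 0.710 | 0.758 | 0.691 | 0.785 |
| 9 irony features | 0.803 | 0.734 | 0.766 | 0.698 | 0.768 |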

Table 7

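Recall of three feature systems on balanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.780 | 0.775 | 0.873 | 0.785 | 0.802 |
| Irony features of Ref. [10] | 0.819 | 0.779 | 0.861 | 0.756 | 0.870 |
| 9 irony features | 0.827 | 0.808 | 0.881 | 0.811 | 0.885 |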

Table 8

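F-measure of three feature systems on balanced datasets

| Feature set \ Classifier | Naive Bayes | Logistic regression | Support vector machine | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- |
| Unigram features | 0.773 | 0.734 | 0.795 | 0.714 | 0.768 |
| Irony features of Ref. [10] | 0.811 | 0.742 | 0.805 | 0.721 | 0.824 |
| 9 irony features | 0.815 | 0.768 | 0.819 | 0.750 | 0.821 |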
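
The improvement figures quoted in the abstract can be cross-checked against Tables 3 to 8. The sketch below rests on two assumptions that the page does not confirm: that each figure is the mean difference across the five classifiers between the 9 irony features and the features of Ref. [10], and that the abstract's "recognition accuracy" corresponds to the precision tables. The names `SCORES`, `GAINS` and `mean_gain_points` are illustrative.

```python
# Scores copied from Tables 3-8, in the column order
# Naive Bayes, logistic regression, SVM, decision tree, random forest.
# Each value: (scores with Ref. [10] features, scores with the 9 irony features).
SCORES = {
    ("imbalanced", "precision"): ([0.494, 0.257, 0.728, 0.687, 0.798],
                                  [0.505, 0.244, 0.729, 0.680, 0.823]),
    ("imbalanced", "recall"): ([0.560, 0.513, 0.489, 0.339, 0.321],
                               [0.586, 0.521, 0.502, 0.339, 0.311]),
    ("imbalanced", "f_measure"): ([0.525, 0.342, 0.585, 0.453, 0.472],
                                  [0.543, 0.332, 0.595, 0.452, 0.464]),
    ("balanced", "precision"): ([0.803, 0.710, 0.758, 0.691, 0.785],
                                [0.803, 0.734, 0.766, 0.698, 0.768]),
    ("balanced", "recall"): ([0.819, 0.779, 0.861, 0.756, 0.870],
                             [0.827, 0.808, 0.881, 0.811, 0.885]),
    ("balanced", "f_measure"): ([0.811, 0.742, 0.805, 0.721, 0.824],
                                [0.815, 0.768, 0.819, 0.750, 0.821]),
}


def mean_gain_points(baseline, proposed):
    """Mean difference across classifiers, in percentage points."""
    diffs = [p - b for b, p in zip(baseline, proposed)]
    return round(100 * sum(diffs) / len(diffs), 2)


GAINS = {key: mean_gain_points(*pair) for key, pair in SCORES.items()}

for (dataset, metric), gain in GAINS.items():
    print(f"{dataset:10s} {metric:9s} {gain:+.2f}")
```

Under these assumptions the tables reproduce five of the six abstract figures (0.34, 0.74 and 0.18 points on imbalanced datasets; 0.44 and 2.54 points for precision and recall on balanced datasets). For F-measure on balanced datasets, the table values give a mean gain of about 1.40 points, whereas the abstract states 0.14%.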

Fig.1

Precision of three feature systems on datasets of different sizes

Fig.2

Recall of three feature systems on datasets of different sizes

Fig.3

F-measure of three feature systems on datasets of different sizes

1 WANG M, CAO D, LI L, et al. Microblog sentiment analysis based on cross-media bag-of-words model[C]//International Conference on Internet Multimedia Computing and Service. Xiamen: ACM, 2014: 76.
2 JIANG F, LIU Y, LUAN H, et al. Microblog sentiment analysis with emoticon space model[C]//Chinese National Conference on Social Media Processing. Berlin, Germany: Springer, 2014: 76-87.
3 OU G, CHEN W, LI B, et al. Clusm: an unsupervised model for microblog sentiment analysis incorporating link information[C]//International Conference on Database Systems for Advanced Applications. Bali, Indonesia: Springer, 2014: 481-494.
4 KAROUI J, FARAH B, MORICEAU V, et al. Towards a contextual pragmatic model to detect irony in tweets[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing: ACL, 2015: 644-650.
5 BRUNTSCH R, RUCH W. Studying irony detection beyond ironic criticism: let's include ironic praise[J]. Frontiers in Psychology, 2017, 8: 606.
doi: 10.3389/fpsyg.2017.00606
6 TASLIOGLU H, KARAGOZ P. Irony detection on microposts with limited set of features[C]//Proceedings of the Symposium on Applied Computing. Marrakech, Morocco: ACM, 2017: 1076-1081.
7 SAVOV P, NIELEK R. Ridiculously expensive watches and surprisingly many reviewers: a study of irony[C]//International Conference on Web Intelligence. Omaha, USA: IEEE, 2017: 725-729.
8 LIU Zhengguang. A critique of irony theories[J]. Journal of PLA University of Foreign Languages, 2002, 22(4): 16-18.
doi: 10.3969/j.issn.1002-722X.2002.04.004
9 TANG Y J, CHEN H. Chinese irony corpus construction and ironic structure analysis[C]//Proceedings of the 25th International Conference on Computational Linguistics. Dublin, Ireland: ACL, 2014: 1269-1278.
10 DENG Zhao, JIA Xiuyi, CHEN Jiajun. A survey on Chinese ironic detection in microblog[J]. Computer Engineering and Science, 2015, 37(12): 2312-2317.
doi: 10.3969/j.issn.1007-130X.2015.12.018
11 WU C H, WU F Z, WU S X, et al. THU_NGN at SemEval-2018 task 3: tweet irony detection with densely connected LSTM and multi-task learning[C]//Proceedings of the 12th International Workshop on Semantic Evaluation. New Orleans, USA: ACL, 2018: 51-56.
12 XING Zhutian, XU Yang. A study of Chinese sarcasm detection methods on internet texts[J]. Journal of Shanxi University (Natural Science Edition), 2015, 38(3): 385-391.
13 CHARALAMPAKIS B, SPATHIS D, KOUSLIS E, et al. Detecting irony on Greek political tweets: a text mining approach[C]//Proceedings of the 16th International Conference on Engineering Applications of Neural Networks. New York, USA: ACM, 2015, 17: 1-5.
14 REYES A, ROSSO P. Mining subjective knowledge from customer reviews: a specific case of irony detection[C]//Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis. Portland, USA: ACL, 2011: 118-124.
15 BOUAZIZI M, OHTSUKI T. Sarcasm detection in twitter[C]//Global Communications Conference. San Diego, USA: IEEE, 2016: 1-6.
16 RAVI K, RAVI V. A novel automatic satire and irony detection using ensembled feature selection and data mining[J]. Knowledge-Based Systems, 2016, 120: 15-33.
17 CARVALHO P. Clues for detecting irony in user-generated contents: oh…!! it's "so easy" ;-)[C]//International CIKM Workshop on Topic-Sentiment Analysis for MASS Opinion. New York, USA: ACM, 2009: 53-56.
18 LÜ Shuxiang. Essentials of Chinese grammar (中国文法要略)[M]. Beijing: The Commercial Press, 1982.