山东大学学报 (工学版) (Journal of Shandong University, Engineering Science), 2020, Vol. 50, Issue 6: 68-75. doi: 10.6040/j.issn.1672-3961.0.2020.236
MA Changxia (马常霞)1, ZHANG Chen (张晨)2
Abstract: To address intent classification and semantic slot filling with pre-training and attention mechanisms, a joint model was proposed that combines bidirectional encoder representations from transformers (BERT) with a bidirectional long short-term memory (BiLSTM) network, a conditional random field (CRF) layer, and an attention mechanism. The model did not rely heavily on hand-labeled data or domain-specific knowledge and resources, thereby avoiding the weak generalization that is common in existing approaches. Experiments on the corpus of a self-developed bus information query system showed that the model reached an intent classification accuracy of 98% and a slot-filling F1 score of 96.3%, both effective improvements.
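To make the architecture described in the abstract concrete, the following is a minimal sketch (not the authors' released code) of how a joint intent-classification and slot-filling model combining BERT, a BiLSTM, attention pooling, and a CRF could be assembled in PyTorch. It assumes the HuggingFace transformers and pytorch-crf packages; the class name, hidden sizes, and the additive attention form are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a joint BERT + BiLSTM + attention + CRF model for intent
# classification and slot filling. All names and sizes are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel   # HuggingFace transformers
from torchcrf import CRF             # pytorch-crf package


class JointBertBiLstmCrf(nn.Module):
    def __init__(self, num_intents, num_slot_tags,
                 bert_name="bert-base-chinese", lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # BiLSTM over the BERT token representations.
        self.bilstm = nn.LSTM(hidden, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Additive self-attention that pools a sentence vector for the
        # intent branch (an illustrative choice of attention).
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        self.intent_head = nn.Linear(2 * lstm_hidden, num_intents)
        # Per-token emissions feeding a CRF layer for slot filling.
        self.slot_emissions = nn.Linear(2 * lstm_hidden, num_slot_tags)
        self.crf = CRF(num_slot_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, slot_tags=None):
        enc = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        seq, _ = self.bilstm(enc.last_hidden_state)        # (B, T, 2H)
        # Attention-weighted pooling for intent classification.
        scores = self.attn(seq).masked_fill(
            attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        intent_logits = self.intent_head((weights * seq).sum(dim=1))
        # CRF over per-token emissions for slot filling.
        emissions = self.slot_emissions(seq)
        mask = attention_mask.bool()
        if slot_tags is not None:
            # CRF returns a log-likelihood; negate it to use as a loss.
            slot_loss = -self.crf(emissions, slot_tags, mask=mask,
                                  reduction="mean")
            return intent_logits, slot_loss
        return intent_logits, self.crf.decode(emissions, mask=mask)
```

In such a setup the intent loss (cross-entropy on intent_logits) and the negated CRF log-likelihood would typically be summed into one joint training objective, which is what lets the two tasks share the BERT and BiLSTM parameters.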