Journal of Shandong University (Engineering Science), 2022, Vol. 52, Issue (2): 23-30. doi: 10.6040/j.issn.1672-3961.0.2021.287
宁春梅,孙博,肖敬先,陈廷伟
NING Chunmei, SUN Bo, XIAO Jingxian, CHEN Tingwei
Abstract: Traditional hybrid code networks have limitations in capturing user intent and performing semantic analysis when trained on small amounts of data, which makes them difficult to transfer to new domains. The time-aware attention hybrid code network (TAA-HCN) models the relationship between user intents and system actions through a time-aware attention mechanism and a user intent integration (UII) gating mechanism, capturing how user intent changes dynamically over time, and it draws on the idea of meta-learning to adapt model gradients so that the model converges quickly. TAA-HCN was evaluated on the WOZ and BABI datasets: when the target-domain data amounted to 5% of the total data, the F1 and BLEU metrics had almost fully converged and the accuracy reached 69.3%, which shows that the proposed model can achieve good performance with only a small amount of target-domain data.
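As a rough illustration of the two mechanisms named in the abstract, the following is a minimal sketch in a PyTorch setting. The layer name TimeAwareAttentionUII, the additive scoring function, the learnable exponential time-decay term, and the sigmoid gate are illustrative assumptions, not the authors' published formulation, and the meta-learning gradient adaptation step is not shown.

```python
# Hypothetical sketch (not the paper's released code): a time-aware attention
# layer combined with a user-intent-integration (UII) gate. The time-decay
# form and gate structure are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeAwareAttentionUII(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(2 * hidden_dim, 1)           # additive attention score
        self.time_decay = nn.Parameter(torch.tensor(0.1))   # learnable decay rate
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)   # UII gate

    def forward(self, history, intent, time_gaps):
        # history:   (batch, turns, hidden)  encoded dialogue turns
        # intent:    (batch, hidden)         current user-intent vector
        # time_gaps: (batch, turns)          elapsed time since each turn
        turns = history.size(1)
        query = intent.unsqueeze(1).expand(-1, turns, -1)
        logits = self.score(torch.cat([history, query], dim=-1)).squeeze(-1)
        # Discount older turns with an exponential-style time penalty.
        logits = logits - F.softplus(self.time_decay) * time_gaps
        weights = torch.softmax(logits, dim=-1)                      # (batch, turns)
        context = torch.bmm(weights.unsqueeze(1), history).squeeze(1)
        # UII gate: blend the attended history with the current intent.
        g = torch.sigmoid(self.gate(torch.cat([context, intent], dim=-1)))
        return g * context + (1.0 - g) * intent


# Toy usage with random tensors.
layer = TimeAwareAttentionUII(hidden_dim=64)
h = torch.randn(2, 5, 64)
u = torch.randn(2, 64)
dt = torch.rand(2, 5)
print(layer(h, u, dt).shape)  # torch.Size([2, 64])
```

The decay term here is only one plausible way to make attention weights time-aware; the paper's exact scoring and gating functions would need to be taken from the full text.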