
Journal of Shandong University (Engineering Science) ›› 2022, Vol. 52 ›› Issue (2): 23-30. doi: 10.6040/j.issn.1672-3961.0.2021.287


  • About the author: NING Chunmei (1995— ), female, born in Shangqiu, Henan, is a master's student whose main research interest is natural language processing. E-mail: chunmeiningl@gmail.com

A hybrid code networks method based on a time-aware attention mechanism

NING Chunmei, SUN Bo, XIAO Jingxian, CHEN Tingwei   

  1. College of Information, Liaoning University, Shenyang 110036, Liaoning, China
  • Published:2022-04-20

Abstract: Traditional hybrid code networks trained on small samples have difficulty capturing user intent and performing semantic analysis, and are hard to transfer to new domains. Time-aware attention hybrid code networks (TAA-HCN) model the relationship between user intents and system actions through a time-aware attention mechanism and a user intent integration (UII) gating mechanism, capturing how user intent changes dynamically over time, and combine a meta-learning approach for gradient self-adaptation so that the model converges quickly. TAA-HCN was evaluated on the WOZ and BABI datasets; when the target-domain data amounted to 5% of the total, the F1 and BLEU metrics had almost fully converged and accuracy reached 69.3%, showing that the model needs only a small amount of target-domain data to achieve good performance.
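The two mechanisms named in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the function names, the additive recency penalty, the scalar `decay` rate, and the single sigmoid gate are simplifying assumptions made for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def time_aware_attention(query, keys, timestamps, t_now, decay=0.1):
    """Dot-product attention with an additive recency bias: the score of
    each past dialogue turn is reduced in proportion to its age, so the
    attention distribution can track how user intent drifts over time."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        - decay * (t_now - t)  # penalise stale turns
        for key, t in zip(keys, timestamps)
    ]
    return softmax(scores)

def uii_gate(intent_vec, state_vec, w=0.5, b=0.0):
    """User-intent-integration gate (scalar sketch): a sigmoid gate decides
    how much of the current intent vector flows into the dialogue state."""
    g = 1.0 / (1.0 + math.exp(-(w * sum(intent_vec) + b)))
    return [g * i + (1.0 - g) * s for i, s in zip(intent_vec, state_vec)]
```

With two identical keys at timestamps 0 and 5, the recency bias makes the more recent turn receive the larger attention weight, which is the qualitative behaviour the abstract describes.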

Keywords: domain-specific dialogue systems, meta-learning, time-aware attention mechanism for user intent, hybrid code networks, time-aware recurrent unit
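The meta-learning-based gradient self-adaptation mentioned in the abstract can be sketched as a first-order MAML-style update. This is an illustrative toy, not the paper's setup: the scalar parameter, squared-loss "tasks", and learning rates are all assumptions chosen for clarity.

```python
def maml_step(theta, targets, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML outer update on a scalar parameter.
    Each 'task' fits theta to a target under squared loss; the outer
    update moves theta so that a single inner gradient step performs
    well on every task, enabling fast adaptation to a new domain."""
    grad = lambda th, tgt: 2.0 * (th - tgt)  # d/dtheta of (theta - tgt)^2
    outer_grad = 0.0
    for tgt in targets:
        adapted = theta - inner_lr * grad(theta, tgt)  # inner-loop adaptation
        outer_grad += grad(adapted, tgt)               # post-adaptation gradient
    return theta - outer_lr * outer_grad / len(targets)
```

Starting from symmetric tasks the update cancels out, while for tasks that all pull in one direction the parameter moves toward them, matching the intuition of adapting quickly from a good shared initialisation.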

CLC number: TP181