
Journal of Shandong University (Engineering Science) ›› 2022, Vol. 52 ›› Issue (2): 23-30. doi: 10.6040/j.issn.1672-3961.0.2021.287


  • About the author: NING Chunmei (1995— ), female, born in Shangqiu, Henan, is a master's student whose main research interest is natural language processing. E-mail: chunmeiningl@gmail.com

A hybrid code networks method based on a time-aware attention mechanism

NING Chunmei, SUN Bo, XIAO Jingxian, CHEN Tingwei   

  1. College of Information, Liaoning University, Shenyang 110036, Liaoning, China
  • Published:2022-04-20

Abstract: Traditional hybrid code networks trained on small samples have difficulty capturing user intent and performing semantic analysis, and are hard to transfer to new domains. Time-aware attention hybrid code networks (TAA-HCN) model the relationship between user intents and system actions through a time-aware attention mechanism and a user intent integration (UII) gating mechanism, capturing how user intent changes dynamically over time, and combine a meta-learning approach for gradient self-adaptation so that the model converges quickly. TAA-HCN was evaluated on the WOZ and BABI datasets; when the target-domain data amounted to 5% of the total, the F1 and BLEU metrics had almost fully converged and accuracy reached 69.3%, showing that the model needs only a small amount of target-domain data to achieve good performance.
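The two mechanisms named in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the function names, the additive recency penalty, the scalar `decay` rate, and the single sigmoid gate are simplifying assumptions made for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def time_aware_attention(query, keys, timestamps, t_now, decay=0.1):
    """Dot-product attention with an additive recency bias: the score of
    each past dialogue turn is reduced in proportion to its age, so the
    attention distribution can track how user intent drifts over time."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        - decay * (t_now - t)  # penalise stale turns
        for key, t in zip(keys, timestamps)
    ]
    return softmax(scores)

def uii_gate(intent_vec, state_vec, w=0.5, b=0.0):
    """User-intent-integration gate (scalar sketch): a sigmoid gate decides
    how much of the current intent vector flows into the dialogue state."""
    g = 1.0 / (1.0 + math.exp(-(w * sum(intent_vec) + b)))
    return [g * i + (1.0 - g) * s for i, s in zip(intent_vec, state_vec)]
```

With two identical keys at timestamps 0 and 5, the recency bias makes the more recent turn receive the larger attention weight, which is the qualitative behaviour the abstract describes.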

Keywords: domain-specific dialogue systems, meta-learning, time-aware attention mechanism for user intent, hybrid code networks, time-aware recurrent unit
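The meta-learning-based gradient self-adaptation mentioned in the abstract can be sketched as a first-order MAML-style update. This is an illustrative toy, not the paper's setup: the scalar parameter, squared-loss "tasks", and learning rates are all assumptions chosen for clarity.

```python
def maml_step(theta, targets, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML outer update on a scalar parameter.
    Each 'task' fits theta to a target under squared loss; the outer
    update moves theta so that a single inner gradient step performs
    well on every task, enabling fast adaptation to a new domain."""
    grad = lambda th, tgt: 2.0 * (th - tgt)  # d/dtheta of (theta - tgt)^2
    outer_grad = 0.0
    for tgt in targets:
        adapted = theta - inner_lr * grad(theta, tgt)  # inner-loop adaptation
        outer_grad += grad(adapted, tgt)               # post-adaptation gradient
    return theta - outer_lr * outer_grad / len(targets)
```

Starting from symmetric tasks the update cancels out, while for tasks that all pull in one direction the parameter moves toward them, matching the intuition of adapting quickly from a good shared initialisation.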

CLC number: TP181