Journal of Shandong University (Engineering Science) ›› 2024, Vol. 54 ›› Issue (5): 101-110. doi: 10.6040/j.issn.1672-3961.0.2023.271

• Machine Learning and Data Mining •

Federated long-term time series forecasting algorithm based on decomposed Transformer

LIU Donglan1,4*, LIU Xin1,4, LIU Jiale2, ZHAO Peng3, CHANG Yingxian3, WANG Rui1,4, YAO Honglei1,4, LUO Xin2

  1. State Grid Shandong Electric Power Research Institute, Jinan 250003, Shandong, China;
  2. School of Software, Shandong University, Jinan 250101, Shandong, China;
  3. State Grid Shandong Electric Power Company, Jinan 250001, Shandong, China;
  4. Shandong Smart Grid Technology Innovation Center, Jinan 250003, Shandong, China
  • Published: 2024-10-18
  • About the author: LIU Donglan (1987— ), female, born in Xuanwei, Yunnan, China; senior engineer with a master's degree; her research interests include network security, data security, privacy-preserving computation, and blockchain. E-mail: liudonglan2006@126.com
  • Supported by: Science and Technology Project of State Grid Shandong Electric Power Company (520626220018)

Abstract: To address the high computational cost of Transformer-based methods and their inability to capture the overall trend of a time series, the Transformer was combined with seasonal-trend decomposition, and a federated long-term time series forecasting algorithm based on a decomposed Transformer was proposed, in which the decomposition captured the global profile of the series. In practical settings, time series data come from multiple different clients. To account for data privacy, federated learning was used to obtain an overall optimal forecasting model from the clients, and an optimizer based on local sharpness-aware minimization was adopted to improve the generalization of the global model. Compared with state-of-the-art methods, the proposed method improved both multivariate and univariate forecasting on four benchmark datasets, with gains of up to 26.9% on the electricity consuming load (ECL) dataset. The experimental results demonstrated the effectiveness of seasonal-trend decomposition and the local sharpness-aware minimization optimizer for long-term time series forecasting.
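The three ingredients named in the abstract each have a standard form: a moving-average seasonal-trend decomposition (as popularized by Autoformer), sharpness-aware minimization (SAM) for the local client updates, and FedAvg-style weighted aggregation of client models. The sketch below illustrates these building blocks in PyTorch; it is a minimal illustration under assumed names, shapes, and default hyperparameters, not the authors' implementation.

# Illustrative sketch only (not the paper's code): the three ingredients in
# PyTorch. All names, shapes, and hyperparameters here are assumptions.
import copy

import torch
import torch.nn as nn


class SeriesDecomposition(nn.Module):
    """Split a series into a moving-average trend that captures the global
    profile and a seasonal residual, in the style of Autoformer."""

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.avg = nn.AvgPool1d(kernel_size, stride=1,
                                padding=kernel_size // 2,
                                count_include_pad=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, channels)
        trend = self.avg(x.transpose(1, 2)).transpose(1, 2)
        seasonal = x - trend
        return seasonal, trend


def sam_backward(model: nn.Module, loss_fn, x, y, rho: float = 0.05):
    """Sharpness-aware minimization: evaluate the gradient at the worst-case
    weight perturbation inside an L2 ball of radius rho, which biases training
    toward flat minima. The caller zeroes gradients before this call and runs
    optimizer.step() afterwards."""
    loss_fn(model(x), y).backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    scale = rho / (torch.norm(torch.stack([g.norm(2) for g in grads])) + 1e-12)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = p.grad * scale
            p.add_(e)                 # step to the sharpest nearby point
            perturbations.append((p, e))
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                   # gradient taken at the perturbed weights
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)                 # restore the original weights
    return loss


def fed_avg(client_states, client_sizes):
    """FedAvg-style aggregation: average client state_dicts, weighted by each
    client's number of local samples. Assumes floating-point tensors."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg

In one federated round under this sketch, each client would run a few sam_backward plus optimizer.step() updates on its private series, upload model.state_dict(), and receive fed_avg(states, sizes) back as the next global model; the decomposition block would sit inside the Transformer forecaster itself.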

Key words: privacy preservation, federated learning, long-term forecasting, model generalization, Transformer

CLC number: TP183