Bayesian optimization-based generalized fixed point approximation


doi:10.6040/j.issn.1672-3961.0.2023.148

CLC Number: TP311

CHEN Xingguo, LÜ Yongzhou, GONG Yu, CHEN Yaoxiong. Bayesian optimization-based generalized fixed point approximation[J]. Journal of Shandong University (Engineering Science), 2024, 54(4): 21-34.
