
Reinforcement Learning_Code_Policy Gradient

2023-04-10 23:35 Author: 別叫我小紅

The following results and code implement a policy gradient method, specifically REINFORCE, in Gymnasium's Cart Pole environment.
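For reference, the gradient estimate that REINFORCE is usually described with can be written as below, for a single sampled episode of length T. The notation (policy \pi_\theta, return G_t, discount factor \gamma, reward r_k received after action a_k) is assumed here for illustration and is not taken from the code files listed later.

```latex
\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t} \, r_k
```

Because G_t is computed from one complete episode rather than from a learned value estimate, the estimate is unbiased but has high variance.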

RESULTS:

Visualizations of (i) changes in scores and losses, and (ii) animation results.

Since REINFORCE relies on Monte Carlo estimation, it converges slowly and has not converged after 10,000 steps.

However, the result is reasonably good, and the agent will hopefully score more than 200 points if given more steps.
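The Monte Carlo estimation mentioned above can be sketched in plain Python. This is a minimal illustration, not the code from the files listed below: the function names `discounted_returns` and `reinforce_loss` and the default `gamma=0.98` are assumptions made for this example. It shows why an update has to wait for a finished episode, since every return depends on all later rewards.

```python
import math


def discounted_returns(rewards, gamma=0.98):
    """Return the Monte Carlo return G_t for each step of one finished episode.

    gamma=0.98 is an assumed default, not a value taken from the article.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so that G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_loss(log_probs, returns):
    """REINFORCE surrogate loss: -sum_t log pi(a_t | s_t) * G_t.

    log_probs holds the log-probability of each action actually taken.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # Cart Pole gives +1 per surviving step
    returns = discounted_returns(rewards, gamma=0.5)
    print(returns)  # [1.75, 1.5, 1.0]
    # Hypothetical policy that picked each action with probability 0.5.
    print(reinforce_loss([math.log(0.5)] * 3, returns))
```

In a full agent, the log-probabilities would come from the policy network and the loss would be minimized by an automatic-differentiation library. Only the return and loss arithmetic is shown here.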

Fig. 1. Changes in scores and losses.

Fig. 2. Animation results.


CODE:

NetWork.py


REINFORCEAgent.py


train_and_test.py


The above code is mainly based on Chapter 9 of Hands-on Reinforcement Learning [1] and my previous implementation of value function approximation with Monte Carlo [2].


Reference

[1] https://hrl.boyuai.com/

[2] https://www.bilibili.com/read/cv22924612


