
Reinforcement Learning_Code_Value Function Approximation

2023-04-08 11:30 | Author: 別叫我小紅

The following results and code implement value function approximation, including Monte Carlo, Sarsa, and deep Q-learning, in Gymnasium's CartPole environment.


RESULTS:

Visualizations of (i) changes in scores, losses and epsilons, and (ii) animation results.

1. Monte Carlo

Fig. 1.1. Changes in scores, losses and epsilons.
Fig. 1.2. Animation results.

2. Sarsa

Original Sarsa, which is exactly what is used here, may need a replay buffer just as Q-learning does.

In the original implementations of Sarsa and Q-learning, the Q-value is updated every time an action is taken, which makes the algorithm extremely unstable.

So, to get better results, we update the Q-value only after a number of steps have been collected, which means introducing experience replay.


Fig. 2.1. Changes in scores, losses and epsilons.
Fig. 2.2. Animation results.

3. Deep Q-learning

Here we use experience replay and fixed Q-targets.

Fig. 3.1. Changes in scores, losses and epsilons.
Fig. 3.2. Animation results.


CODE:

NetWork.py
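The original listing is not reproduced here. As an illustration only, a minimal numpy sketch of the kind of Q-network such a file defines (the real code, following [1], would use PyTorch; the class name, layer sizes, and initialization here are assumptions):

```python
import numpy as np

class Network:
    """Minimal MLP mapping a state to one Q-value per action (forward pass only)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        # He-style initialization for the ReLU hidden layer.
        self.w1 = rng.normal(0.0, np.sqrt(2.0 / state_dim), (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, action_dim))
        self.b2 = np.zeros(action_dim)

    def forward(self, state: np.ndarray) -> np.ndarray:
        h = np.maximum(state @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2                    # linear Q-value head

# CartPole: 4-dimensional observation, 2 discrete actions.
net = Network(state_dim=4, action_dim=2)
q = net.forward(np.zeros(4))
```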


MCAgent.py
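The core of any Monte Carlo agent is turning an episode's rewards into discounted returns, which serve as the regression targets for the Q-network. A minimal sketch of that backward pass (the function name is an assumption, not the actual MCAgent API):

```python
def discounted_returns(rewards, gamma=0.99):
    """Backward pass over one episode: G_t = r_t + gamma * G_{t+1}.

    Each G_t is the Monte Carlo target for Q(s_t, a_t).
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns
```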


SarsaAgent.py
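What distinguishes Sarsa from Q-learning is the TD target: Sarsa bootstraps from the next action the policy actually took, not the greedy max. A one-function sketch of that target (the signature is an assumption for illustration):

```python
def sarsa_target(reward, q_next, done, gamma=0.99):
    """On-policy TD target: r + gamma * Q(s', a').

    `q_next` is the Q-value of the action actually taken in s'
    (Q-learning would use max_a Q(s', a) instead). Terminal states
    contribute no bootstrap term.
    """
    return reward + gamma * q_next * (1.0 - float(done))
```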


ReplayBuffer.py
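The experience replay discussed in the Sarsa section above comes down to a fixed-capacity store of transitions sampled uniformly at random. A minimal pure-Python sketch (the real buffer in [1] stores numpy arrays; method names here are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 10000):
        # deque with maxlen silently evicts the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```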


DQNAgent.py
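The two ingredients named above, experience replay and fixed Q-targets, can be sketched as follows: the target is computed from a separate target network, whose parameters are only synced with the online network every so many steps. Function and parameter names here are illustrative assumptions, not the actual DQNAgent API:

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Off-policy target: r + gamma * max_a Q_target(s', a).

    `next_q_values` should come from the *target* network, not the online one.
    """
    return reward + gamma * np.max(next_q_values) * (1.0 - float(done))

def maybe_sync_target(step, online_params, target_params, period=100):
    """Fixed Q-targets: copy online parameters into the target network
    every `period` steps; otherwise leave the target network frozen."""
    if step % period == 0:
        for name, value in online_params.items():
            target_params[name] = value.copy()
    return target_params
```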


train_and_test.py
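The training script is what records the scores, losses and epsilons plotted in the figures above. One piece worth illustrating is the exploration schedule behind the epsilon curves; a common choice is linear annealing, sketched here with assumed parameter names and values:

```python
def linear_epsilon(step, start=1.0, end=0.1, decay_steps=10000):
    """Linearly anneal the exploration rate from `start` to `end`
    over `decay_steps` environment steps, then hold it at `end`."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```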


The above code is mainly based on rainbow-is-all-you-need [1], extended with solutions for Monte Carlo and Sarsa.


Reference

[1] https://github.com/Curt-Park/rainbow-is-all-you-need

