Introduction:
This experiment uses the ‘CartPole-v1’ environment from gym. Game introduction:
[Retrieved from https://gym.openai.com/envs/CartPole-v1/]
Test Environment:
Windows 10, Python 3
Experimental Procedure:
1) A random demo on ‘CartPole-v1’:
action = env.action_space.sample()
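The full random demo loop might look like the sketch below. It follows the classic gym API used at the time of the original article (env.reset() returns the observation, env.step() returns a 4-tuple); a minimal stub environment is included as an assumption so the sketch runs on its own — swap in gym.make('CartPole-v1') to run it against the real environment.

```python
import random

class StubCartPole:
    """Stand-in for gym's classic CartPole-v1 interface (assumption: real
    dynamics are replaced by a fixed-length episode, for illustration only)."""
    class _Space:
        def sample(self):
            return random.randrange(2)  # actions: 0 = push left, 1 = push right
    def __init__(self, episode_len=20):
        self.action_space = self._Space()
        self._t = 0
        self._len = episode_len
    def reset(self):
        self._t = 0
        return [0.0, 0.0, 0.0, 0.0]  # cart pos, cart vel, pole angle, pole angular vel
    def step(self, action):
        self._t += 1
        done = self._t >= self._len
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}

env = StubCartPole()                # replace with gym.make('CartPole-v1')
observation = env.reset()
timesteps = 0
done = False
while not done:
    action = env.action_space.sample()              # random action
    observation, reward, done, info = env.step(action)
    timesteps += 1
print("Episode finished after", timesteps, "timesteps")
```

With the real environment, the episode ends when the pole falls or the cart leaves the track, so the timestep count varies from run to run.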
2) Observations of the random agent:
2.1 Some important elements of the environment:
◆ Observation(object): an environment-specific object representing your observation of the environment.
◆ Reward(float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
◆ Done(boolean): whether it’s time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated.
◆ Info(dict): diagnostic information useful for debugging. It can sometimes be useful for learning.
2.2 Results of the random agent
Count the timesteps of the random agent over 20 episodes:
We can see the average result of the random agent: 20.25 timesteps.
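The 20.25 figure is just the mean episode length over the 20 runs. The bookkeeping can be sketched as follows; the episode lengths below are hypothetical (not the measured ones), chosen only so the mean comes out to the reported 20.25:

```python
# Hypothetical lengths of 20 random-agent episodes (illustrative values only)
episode_lengths = [14, 23, 18, 31, 12, 25, 17, 22, 16, 28,
                   19, 21, 15, 27, 13, 24, 20, 26, 18, 16]

average = sum(episode_lengths) / len(episode_lengths)
print("Average timesteps over", len(episode_lengths), "episodes:", average)
# -> Average timesteps over 20 episodes: 20.25
```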
- An agent based on greedy rules:
◆ A very simple naïve idea: the cart’s action changes frame by frame, based only on the last action it took: each new action is the opposite of the previous one.
ACTION(n+1) = ACTION(n) ^ 1    (bitwise XOR with 1 flips 0 and 1)
Again, count the timesteps of the greedy agent over 20 episodes. We can see the average result of the greedy agent: 34.9, better performance than the random agent.
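The XOR trick in code: starting from action 0 and flipping with ^ 1 every frame produces a strict left/right alternation.

```python
action = 0                      # initial action: 0 = push left
actions = []
for _ in range(6):
    actions.append(action)
    action = action ^ 1         # bitwise XOR with 1 flips 0 <-> 1
print(actions)                  # -> [0, 1, 0, 1, 0, 1]
```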
- An agent based on predefined rules:
Define the function next_move, which chooses the next action from the previous action pre and the last observation:
def next_move(observation, pre):
    # observation[1] is the cart velocity; returns 1 (push right) or 0 (push left)
    return int(observation[1] < -0.02 or (observation[1] <= 0 and pre == 0))
We can see the average result of the rule-based agent: 41.15, an improvement over greedy_agent and a clear improvement over random_agent.
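A few sample calls show how the rule behaves; next_move is repeated here so the snippet runs on its own, and the observation values are made-up examples (in CartPole’s observation vector, observation[1] is the cart velocity).

```python
def next_move(observation, pre):
    # Push right (1) when the cart moves left quickly, or when it is drifting
    # left/stationary and the previous push was to the left (0); else push left (0).
    return int(observation[1] < -0.02 or (observation[1] <= 0 and pre == 0))

# observation = [cart position, cart velocity, pole angle, pole angular velocity]
print(next_move([0.0, -0.05, 0.0, 0.0], 1))  # fast left drift -> 1 (push right)
print(next_move([0.0, -0.01, 0.0, 0.0], 0))  # slow left drift after a left push -> 1
print(next_move([0.0,  0.03, 0.0, 0.0], 0))  # moving right -> 0 (push left)
```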
# Original article: https://blog.csdn.net/ALPS233/article/details/102736708