Introduction:
This experiment uses the ‘CartPole-v1’ environment from gym. Game introduction:
[Retrieved from https://gym.openai.com/envs/CartPole-v1/]
Test Environment:
Windows 10, Python 3
Experimental Procedure:
1) A random demo on ‘CartPole-v1’:
action = env.action_space.sample()
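The full random demo loop might look like the sketch below. It follows the classic gym API used at the time of the original article (env.reset() returns the observation, env.step() returns a 4-tuple); a minimal stub environment is included as an assumption so the sketch runs on its own — swap in gym.make('CartPole-v1') to run it against the real environment.

```python
import random

class StubCartPole:
    """Stand-in for gym's classic CartPole-v1 interface (assumption: real
    dynamics are replaced by a fixed-length episode, for illustration only)."""
    class _Space:
        def sample(self):
            return random.randrange(2)  # actions: 0 = push left, 1 = push right
    def __init__(self, episode_len=20):
        self.action_space = self._Space()
        self._t = 0
        self._len = episode_len
    def reset(self):
        self._t = 0
        return [0.0, 0.0, 0.0, 0.0]  # cart pos, cart vel, pole angle, pole angular vel
    def step(self, action):
        self._t += 1
        done = self._t >= self._len
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}

env = StubCartPole()                # replace with gym.make('CartPole-v1')
observation = env.reset()
timesteps = 0
done = False
while not done:
    action = env.action_space.sample()              # random action
    observation, reward, done, info = env.step(action)
    timesteps += 1
print("Episode finished after", timesteps, "timesteps")
```

With the real environment, the episode ends when the pole falls or the cart leaves the track, so the timestep count varies from run to run.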
2) Observations of the random agent:
2.1 Some important elements of the environment:
◆ Observation(object): an environment-specific object representing your observation of the environment.
◆ Reward(float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
◆ Done(boolean): whether it’s time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated.
◆ Info(dict): diagnostic information useful for debugging. It can sometimes be useful for learning.
2.2 Results of the random agent
Count the timesteps of the random agent over 20 episodes:
We can see the average result of the random agent: 20.25 timesteps.
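The 20.25 figure is just the mean episode length over the 20 runs. The bookkeeping can be sketched as follows; the episode lengths below are hypothetical (not the measured ones), chosen only so the mean comes out to the reported 20.25:

```python
# Hypothetical lengths of 20 random-agent episodes (illustrative values only)
episode_lengths = [14, 23, 18, 31, 12, 25, 17, 22, 16, 28,
                   19, 21, 15, 27, 13, 24, 20, 26, 18, 16]

average = sum(episode_lengths) / len(episode_lengths)
print("Average timesteps over", len(episode_lengths), "episodes:", average)
# -> Average timesteps over 20 episodes: 20.25
```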
- An agent based on greedy rules:
◆ A very simple naïve idea: the cart’s action changes frame by frame, based only on the last action it took: each new action is the opposite of the previous one.
ACTION(n+1) = ACTION(n) ^ 1    (bitwise XOR with 1 flips 0 and 1)
Again, count the timesteps of the greedy agent over 20 episodes. We can see the average result of the greedy agent: 34.9, better performance than the random agent.
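The XOR trick in code: starting from action 0 and flipping with ^ 1 every frame produces a strict left/right alternation.

```python
action = 0                      # initial action: 0 = push left
actions = []
for _ in range(6):
    actions.append(action)
    action = action ^ 1         # bitwise XOR with 1 flips 0 <-> 1
print(actions)                  # -> [0, 1, 0, 1, 0, 1]
```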
- An agent based on predefined rules:
Define the function next_move, which chooses the next action from the previous action pre and the last observation:
def next_move(observation, pre):
    # observation[1] is the cart velocity; returns 1 (push right) or 0 (push left)
    return int(observation[1] < -0.02 or (observation[1] <= 0 and pre == 0))
We can see the average result of the rule-based agent: 41.15, an improvement over greedy_agent and a clear improvement over random_agent.
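A few sample calls show how the rule behaves; next_move is repeated here so the snippet runs on its own, and the observation values are made-up examples (in CartPole’s observation vector, observation[1] is the cart velocity).

```python
def next_move(observation, pre):
    # Push right (1) when the cart moves left quickly, or when it is drifting
    # left/stationary and the previous push was to the left (0); else push left (0).
    return int(observation[1] < -0.02 or (observation[1] <= 0 and pre == 0))

# observation = [cart position, cart velocity, pole angle, pole angular velocity]
print(next_move([0.0, -0.05, 0.0, 0.0], 1))  # fast left drift -> 1 (push right)
print(next_move([0.0, -0.01, 0.0, 0.0], 0))  # slow left drift after a left push -> 1
print(next_move([0.0,  0.03, 0.0, 0.0], 0))  # moving right -> 0 (push left)
```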
# Original article: https://blog.csdn.net/ALPS233/article/details/102736708