
Conundrum 11: Reinforcement Learning - Wall Jump Agent

Community Manager

Welcome to the new Community Conundrum! This week you’ll train your first deep reinforcement learning agent to jump over walls.

 


 

If you don’t know anything about reinforcement learning, don’t worry: this is a beginner-friendly conundrum. And good news, you won’t need a GPU.

 

Situation

The environment

Our goal is to train our agent (the blue cube) to reach the green tile.

 

However, there are three situations:

  • In the first, there is no wall: our agent just needs to walk to the green tile.


 

No Wall situation

 

  • In the second situation, the agent needs to learn to jump to reach the green tile.


Small Wall Situation

  • Finally, in the hardest situation, the wall is too high for our agent to jump over directly, so it needs to push the white block against the wall, jump onto the block, and jump over the wall from there.

 


Big Wall Situation

 

 

We’ll learn two different policies (behaviors) depending on the height of the wall:

  • The first, SmallWallJump, will be learned during the no-wall and small-wall situations.
  • The second, BigWallJump, will be learned during the big-wall situations.

The reward system is:

[Image: the reward values for each situation]

In terms of observation, we don’t use normal vision (camera frames) but 14 raycasts, each able to detect 4 possible objects. Think of raycasts as lasers that report which object, if any, they hit.

We also use the agent’s global position and whether or not it is grounded.

 


 

Source: Unity ML-Agents Documentation
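As a back-of-the-envelope check of the observation size, here is a sketch assuming ML-Agents’ usual ray encoding of (number of detectable tags + 2) floats per ray (a one-hot hit tag, a “missed” flag, and the normalized hit distance); the exact layout can differ by version, so treat these numbers as an assumption:

```python
# Hypothetical observation-size estimate for the Wall Jump agent.
# Assumption: each ray contributes (num detectable tags + 2) floats
# (one-hot hit tag, a "missed" flag, normalized hit distance), which is
# how ML-Agents ray sensors typically encode hits; versions may differ.
NUM_RAYS = 14
NUM_TAGS = 4                      # objects each ray can detect
per_ray = NUM_TAGS + 2
ray_obs = NUM_RAYS * per_ray      # 84 floats from the raycasts
vector_obs = 3 + 1                # global position (x, y, z) + grounded flag
print(ray_obs + vector_obs)       # 88 floats in total, under these assumptions
```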

The action space is discrete with 4 branches:

[Image: the four action branches and their options]

Our goal is to hit the benchmark with a mean reward of 0.8.
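To make the 4-branch discrete action space concrete, here is a toy sketch; the branch names and options are guesses based on how the Wall Jump agent moves (walk, strafe, rotate, jump), not the shipped spec:

```python
import random

# Hypothetical 4-branch discrete action space for the Wall Jump agent.
# The branch names and option lists below are illustrative assumptions.
BRANCHES = {
    "forward": ["noop", "forward", "backward"],
    "side":    ["noop", "left", "right"],
    "rotate":  ["noop", "rotate_left", "rotate_right"],
    "jump":    ["noop", "jump"],
}

def random_action(rng=random):
    """One action = one simultaneous choice per branch."""
    return {name: rng.choice(options) for name, options in BRANCHES.items()}

print(random_action())
```

An untrained agent effectively samples actions like this at random; training shifts those choices toward the rewarded ones.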

 

Deep Reinforcement Learning and PPO

 

We’ll use Deep Reinforcement Learning to solve this problem. 

What is Deep Reinforcement Learning? This article covers what you need for this conundrum: https://academy.dataiku.com/reinforcement-learning-open/513305

More precisely, we’ll use a deep reinforcement learning algorithm called PPO and your goal will be to tune the hyperparameters to build a smart agent.
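At its core, PPO limits how far each update can move the policy, using a clipped surrogate objective. A minimal sketch of that objective (the clip epsilon of 0.2 is a common default, not necessarily this conundrum’s setting):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO surrogate: take the more pessimistic of the raw and clipped terms.

    ratio     = pi_new(a|s) / pi_old(a|s)
    advantage = how much better the action was than expected
    """
    return float(np.minimum(ratio * advantage,
                            np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage))

# A policy that moved "too far" gains nothing beyond the clip boundary:
print(ppo_clip_objective(1.5, 1.0))   # capped at (1 + eps) * advantage = 1.2
print(ppo_clip_objective(0.5, -1.0))  # pessimistic side of the clip = -0.8
```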

PPO is an Actor-Critic algorithm. Actor-Critic is an intuitive idea: imagine you play a video game while a friend provides you some feedback. You’re the Actor and your friend is the Critic.

 


 

At the beginning, you don’t know how to play, so you try some actions at random. Your friend, the Critic, observes your actions and provides feedback.

Learning from this feedback, you’ll update your policy and get better at the game.

On the other hand, your friend (the Critic) will also update their own way of providing feedback so it is better next time.

As we can see, the idea of Actor-Critic is to have two neural networks. We estimate both:


 

ACTOR: a policy function that controls how our agent acts.

 


 

CRITIC: a value function that measures how good these actions are.

Both run in parallel.
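The feedback loop described above can be sketched with two tiny linear “networks” in plain NumPy (a toy illustration, not ML-Agents’ implementation): the Critic’s TD error serves as the Actor’s feedback signal.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, N_ACTIONS = 4, 2
actor_w = rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS))  # Actor: policy weights
critic_w = rng.normal(scale=0.1, size=OBS_DIM)              # Critic: value weights

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def actor(obs):
    """Policy function: probability of taking each action in this state."""
    return softmax(obs @ actor_w)

def critic(obs):
    """Value function: how good this state looks."""
    return float(obs @ critic_w)

def update(obs, action, reward, next_obs, gamma=0.9, lr=0.05):
    """One Actor-Critic step: the Critic's TD error is the Actor's feedback."""
    global actor_w, critic_w
    td_error = reward + gamma * critic(next_obs) - critic(obs)
    critic_w = critic_w + lr * td_error * obs       # Critic refines its feedback
    grad_logp = -actor(obs)
    grad_logp[action] += 1.0                        # d log pi(action|obs) / d logits
    actor_w = actor_w + lr * td_error * np.outer(obs, grad_logp)

# Toy episode: from the same state, action 0 always pays off, action 1 never does.
obs, terminal = np.array([1.0, 0.0, 0.0, 1.0]), np.zeros(4)
for _ in range(200):
    update(obs, 0, 1.0, terminal)
    update(obs, 1, 0.0, terminal)

print(actor(obs))  # the learned policy now strongly prefers action 0
```

Both functions update in parallel, just like the two networks in PPO.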

 

The steps

  1. If you’re new to reinforcement learning, you should read this article that will give you the fundamentals: https://academy.dataiku.com/reinforcement-learning-open/513305
  2. Install the Unity ML-Agents Python package by following the documentation: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md#install-the-mlagent...
  3. Download Unity ML-Agents conundrum zip file. 
  4. Open the config file ml-agents-conundrum/config/conundrum_config.yaml

 

The training step

Our config file looks like this:

 

[Image: the contents of conundrum_config.yaml]

 

 

For this exercise you’ll have to think about:

 

  • Is this a simple problem or a more complex one? This depends on the situation: think about the difference in complexity between the no-wall and big-wall situations. Answering this question will help you define the num_layers and hidden_units hyperparameters.
  • Is this a problem that can be solved quickly, or does it need a lot of training? Answering this question will help you define the max_steps and batch_size hyperparameters.
  • Finally, does our agent care more about the long-term reward or the short-term reward? Answering this question will help you define the gamma (reward discount rate) hyperparameter.
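For intuition on gamma: a reward n steps away is weighted by gamma**n, and 1/(1 - gamma) is a rough “effective horizon”. In the big-wall situation the payoff arrives many steps after pushing the block, which argues for a large gamma. A quick calculation:

```python
# How much does a reward 50 steps away still matter at different gammas?
for gamma in (0.8, 0.9, 0.99, 0.995):
    horizon = 1 / (1 - gamma)   # rough number of steps the agent "sees" ahead
    weight = gamma ** 50        # discount applied to a reward 50 steps out
    print(f"gamma={gamma}: horizon ~ {horizon:.0f} steps, "
          f"50-step reward weighted {weight:.4f}")
```

At gamma = 0.8 a reward 50 steps away is essentially invisible, while at 0.99 it keeps over 60% of its value.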

 

  • max_steps: The total number of steps (observations collected and actions taken) across the environments before training ends. Hint: we used 300k training steps to reach the 0.8 baseline.
  • batch_size: The number of experiences (observation, action, reward, next state) fed into each iteration of gradient descent. Hint: the typical range is 32 to 512 (powers of two: 32, 64, 128, 256, 512).
  • hidden_units: The number of units in each hidden layer of the neural network. This grows when the action is a very complex function of the observation variables. Hint: the typical range is 64 to 512.
  • num_layers: The number of hidden layers in the neural network, i.e. how many hidden layers come after the observation input (or after the CNN encoding of a visual observation). For simple problems, fewer layers train faster and more efficiently; more complex control problems may need more. Hint: the typical range is 2 to 4.
  • gamma: The discount factor for future rewards coming from the environment. Think of it as how far into the future the agent should care about possible rewards: it should be large when the agent must act in the present to prepare for rewards in the distant future, and it can be smaller when rewards are more immediate. Hint: the typical range is 0.8 to 0.995.
You need to modify the config file values based on your hypotheses. Remember that the best way to learn is to be active by experimenting, so make some hypotheses and verify them.
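Since the config file isn’t reproduced here, a rough sketch of what a 2020-era ML-Agents trainer config for the two behaviors could look like follows; every key and value below is an illustrative hypothesis to adapt, not the actual shipped file:

```yaml
SmallWallJump:
  trainer: ppo
  max_steps: 3.0e5       # the hint above: ~300k steps reached the 0.8 baseline
  batch_size: 128        # a power of two in the 32-512 range
  hidden_units: 128      # simpler situation, smaller network
  num_layers: 2
  gamma: 0.99

BigWallJump:
  trainer: ppo
  max_steps: 3.0e5
  batch_size: 256
  hidden_units: 256      # block-pushing is a more complex mapping
  num_layers: 2
  gamma: 0.995           # the payoff comes long after pushing the block
```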

You’re now ready to launch the training. You need to type in the terminal:

On Windows:

mlagents-learn ./config/conundrum_config.yaml --env=./WINDOWS/train/WallJump --run-id=run --train

On Mac:

mlagents-learn ./config/conundrum_config.yaml --env=./MAC/train/WallJump --run-id=run --train

This will launch the game so you can watch your agent perform; the terminal shows the training info.

When the mean reward reaches 0.8 you can stop the training; the saved models are written to ./models.



That’s all for today! You’ve just trained an agent that learns to jump over walls. Awesome!

If you would like to share your answer, please upload your saved models and the config file here! 

I hope you liked this introduction to deep reinforcement learning. If you want to dive deeper into Reinforcement Learning, you can check these 2 articles:

 

Keep learning, stay awesome!

5 Replies
Dataiker

Equal parts formidable and impressive! Looks like a fun one!

Level 6

Anyone want to get together and see if we can work on this as a group?

--Tom
Level 3

Would love to join your team Tom, but I wouldn't be much help, as this subset of machine learning is relatively new for me. 

Level 6

I was thinking about getting on a Zoom call and seeing what we can figure out together. I'm on USA Eastern Time (currently GMT-4). Is there a good time to work together? 

--Tom
Level 3

Hi Tom, I'm on GMT +7. Somewhere like 12 hours ahead of you in the USA. But no worries, drop me a message, keen to explore the opportunities further. Let me know.

Thanks,