Welcome to the new Community Conundrum! This week you’ll train your first deep reinforcement learning agent to jump over walls.
If you don’t know anything about reinforcement learning, don’t worry: this is a beginner-friendly conundrum. And good news, you won’t need a GPU.
Our goal is to train our agent (the blue cube) to reach the green tile.
However, there are three situations:
No Wall situation
Small Wall situation
Big Wall situation
We’ll learn two different policies (behaviors) depending on the height of the wall.
The reward system is:
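In the standard ML-Agents Wall Jump environment (take this as a reference; our conundrum setup should match it):
+1 if the agent reaches the green tile.
-1 if the agent falls off the platform.
-0.0005 existence penalty at every step, to encourage the agent to reach the goal quickly.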
In terms of observations, we don’t use normal vision (frames) but 14 raycasts that can each detect 4 possible object types. Think of raycasts as lasers that detect any object they pass through.
We also use the global position of the agent and whether or not it is grounded.
Source: Unity ML-Agents Documentation
The action space is discrete with 4 branches:
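In the standard ML-Agents Wall Jump environment, the four branches are (again, a reference that our setup should mirror):
Forward motion: forward, backward, or no action.
Rotation: rotate left, rotate right, or no action.
Side motion: strafe left, strafe right, or no action.
Jump: jump or no action.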
Our goal is to hit the benchmark with a mean reward of 0.8.
We’ll use Deep Reinforcement Learning to solve this problem.
What is Deep Reinforcement Learning? This article covers what you need to know to tackle this conundrum: https://academy.dataiku.com/reinforcement-learning-open/513305
More precisely, we’ll use a deep reinforcement learning algorithm called PPO (Proximal Policy Optimization), and your goal will be to tune its hyperparameters to build a smart agent.
PPO is an Actor-Critic algorithm. Actor-Critic is a clever method: imagine you play a video game with a friend who gives you feedback. You’re the Actor and your friend is the Critic.
At the beginning, you don’t know how to play, so you try some actions randomly. Your friend, the Critic, observes your actions and provides feedback.
Learning from this feedback, you’ll update your policy and get better at playing the game.
On the other hand, your friend (the Critic) will also update their own way of providing feedback so it can be better next time.
As we can see, the idea of Actor-Critic is to have two neural networks, so we estimate both:
ACTOR: a policy function, which controls how our agent acts.
CRITIC: a value function, which measures how good those actions are.
Both run in parallel.
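To make this concrete, here is a minimal sketch of an actor-critic pair in PyTorch. This is an illustration only: ML-Agents builds and trains these networks for you, and the sizes below are placeholders, not the real Wall Jump dimensions.

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    # A policy head (Actor) and a value head (Critic) sharing one body.
    def __init__(self, obs_size, num_actions, hidden_units=256, num_layers=2):
        super().__init__()
        # Shared body: num_layers hidden layers of hidden_units each,
        # mirroring the two hyperparameters discussed below.
        layers, in_size = [], obs_size
        for _ in range(num_layers):
            layers += [nn.Linear(in_size, hidden_units), nn.ReLU()]
            in_size = hidden_units
        self.body = nn.Sequential(*layers)
        # Actor: logits over discrete actions (ML-Agents actually keeps one
        # set of logits per action branch; we flatten that detail here).
        self.actor = nn.Linear(hidden_units, num_actions)
        # Critic: a single estimate of how good the current state is.
        self.critic = nn.Linear(hidden_units, 1)

    def forward(self, obs):
        h = self.body(obs)
        return self.actor(h), self.critic(h)

# Placeholder sizes: 74 observation values, 11 = 3 + 3 + 3 + 2 actions.
model = ActorCritic(obs_size=74, num_actions=11)
action_logits, state_value = model(torch.randn(1, 74))

The design point to notice is that both heads run in parallel off the same observations: the Actor is updated using the Critic’s feedback, and the Critic is updated from the rewards actually observed.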
Our config file looks like this:
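The exact values are for you to tune, but a representative ML-Agents PPO config looks like this (a sketch only: your conundrum_config.yaml may use a different behavior name and different defaults):

WallJump:
    trainer: ppo
    max_steps: 5.0e5      # total environment steps before training ends
    batch_size: 128       # experiences per gradient-descent iteration
    buffer_size: 2048     # experiences collected before each policy update
    hidden_units: 256     # units in each hidden layer
    num_layers: 2         # hidden layers after the observation input
    gamma: 0.99           # discount factor for future rewards
    learning_rate: 3.0e-4
    time_horizon: 128
    normalize: false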
For this exercise you’ll have to think about:
max_steps: the total number of steps (observations collected and actions taken) that must be taken in the environment before the training process ends.
Hint: we used 300k training steps to reach the 0.8 baseline.
batch_size: the number of experiences (observation collected, action taken, reward, and next state) fed into each iteration of gradient descent.
Hint: the typical range is 32 to 512, always a power of two (32, 64, 128, 256, 512).
hidden_units: the number of units in the hidden layers of the neural network.
This number should grow when the correct action is a complex interaction between the observation variables.
Hint: the typical range is 64 to 512.
num_layers: the number of hidden layers in the neural network, i.e. how many hidden layers are present after the observation input (or after the CNN encoding of a visual observation).
For simple problems, fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems.
Hint: the typical range is 2 to 4.
gamma: the discount factor for future rewards coming from the environment.
This can be thought of as how far into the future the agent should care about possible rewards.
When the agent should act in the present to prepare for rewards in the distant future, this value should be large; when rewards are more immediate, it can be smaller. For example, with gamma = 0.99, a reward received 100 steps from now is worth 0.99^100 ≈ 0.37 of an immediate one.
Typical range: 0.8 to 0.995.
You need to modify the config file values based on your hypotheses. Remember that the best way to learn is to be active by experimenting, so make some hypotheses and verify them.
You’re now ready to launch the training. Type in the terminal:
On Windows:
mlagents-learn ./config/conundrum_config.yaml --env=./WINDOWS/train/WallJump --run-id=run --train
On Mac:
mlagents-learn ./config/conundrum_config.yaml --env=./MAC/train/WallJump --run-id=run --train
This will launch the game so you can watch your agent perform; in the terminal you can follow the training info.
When the mean reward reaches 0.8 you can stop the training with Ctrl+C; ML-Agents will save the trained model to ./models.
That’s all for today! You’ve just trained an agent that learns to jump over walls. Awesome!
If you would like to share your answer, please upload your saved models and your config file here!
I hope you enjoyed this introductory deep reinforcement learning conundrum. If you want to dive deeper into Reinforcement Learning, you can check out these two articles:
Keep learning, stay awesome!