Resources in Reinforcement Learning
Introduction to Reinforcement Learning
Reinforcement Learning (RL) is a computational approach to learning from action: an agent interacts with its environment, performs actions, receives rewards as feedback for those actions, and learns from them.
Three Types of Reinforcement Learning Techniques
Value-Based RL
In value-based RL, the goal is to optimize a value function V(s) or an action-value function Q(s,a).
The value function tells us the maximum expected future reward that the agent will get at each state.
The value of a given state is the total amount of the reward that an agent can expect to accumulate over the future, starting at the state.
At each step, the agent uses this value function to pick the next state, choosing the one with the highest value.
In a maze example where each move costs -1 reward, the agent follows increasing values at each step: -7, then -6, then -5 (and so on) until it reaches the goal.
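To make the selection rule concrete, here is a minimal, hypothetical sketch of greedy value-based control on a toy corridor maze. The value table `V`, the `neighbors` helper, and the state numbering are illustrative assumptions, not any particular library's API:

```python
# Hypothetical value table for an 8-state corridor: V[s] is the maximum
# expected future reward from state s, with a -1 reward per step.
V = {0: -7.0, 1: -6.0, 2: -5.0, 3: -4.0, 4: -3.0, 5: -2.0, 6: -1.0, 7: 0.0}

GOAL = 7

def neighbors(state):
    """States reachable in one step (left/right along the corridor)."""
    return [s for s in (state - 1, state + 1) if s in V]

def greedy_step(state):
    """Value-based selection: move to the reachable state with the biggest value."""
    return max(neighbors(state), key=lambda s: V[s])

state = 0
path = [state]
while state != GOAL:
    state = greedy_step(state)
    path.append(state)

print(path)  # [0, 1, 2, 3, 4, 5, 6, 7]: the values climb -7, -6, ..., 0
```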
Policy-Based RL
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function.
The policy is what defines the agent's behavior at a given time.
`action = policy(state)`
We learn a policy function that lets us map each state to the best corresponding action.
We have two types of policies:
- Deterministic: a policy at a given state will always return the same action.
- Stochastic: outputs a probability distribution over actions, π(a|s) = P[Aₜ = a | Sₜ = s], the probability of taking action a at state s.
In this setting, the policy directly indicates the best action to take at each step.
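Here is a minimal sketch of both policy types over a small discrete state/action space, assuming the policy is represented by a table of per-state action preferences (`prefs` below is a hypothetical stand-in for learned parameters):

```python
import numpy as np

ACTIONS = ["left", "right", "up", "down"]

# Hypothetical learned preferences (logits) per state.
prefs = {
    "s0": np.array([0.1, 2.0, 0.5, -1.0]),
    "s1": np.array([1.5, 0.0, -0.5, 0.3]),
}

def deterministic_policy(state):
    """Always returns the same action for a given state (argmax of preferences)."""
    return ACTIONS[int(np.argmax(prefs[state]))]

rng = np.random.default_rng(0)

def stochastic_policy(state):
    """Samples an action from a softmax distribution over the preferences."""
    logits = prefs[state]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(ACTIONS, p=probs)

print(deterministic_policy("s0"))  # 'right' every time
print(stochastic_policy("s0"))     # usually 'right', sometimes another action
```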
Model-Based RL
In model-based RL, we model the environment. That is, we create a model that describes the behavior of the environment.
The problem with this approach is that each environment needs its own model representation, which makes it hard to build a single general agent this way.
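As a minimal illustration, the sketch below learns a tabular model of a hypothetical 5-state chain environment from experience, then queries it; `toy_env_step` stands in for a real environment and is not a real library call:

```python
import random
from collections import defaultdict

def toy_env_step(state, action):
    """Hypothetical environment: action 1 moves right, action 0 moves left."""
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

# Learn the model: count observed transitions and accumulate rewards.
counts = defaultdict(lambda: defaultdict(int))
reward_sum = defaultdict(float)
for _ in range(1000):
    s, a = random.randrange(5), random.randrange(2)
    s2, r = toy_env_step(s, a)
    counts[(s, a)][s2] += 1
    reward_sum[(s, a)] += r

def learned_model(s, a):
    """Estimated next-state distribution and mean reward for (s, a)."""
    total = sum(counts[(s, a)].values())
    probs = {s2: n / total for s2, n in counts[(s, a)].items()}
    return probs, reward_sum[(s, a)] / total

print(learned_model(2, 1))  # e.g. ({3: 1.0}, 0.0): the dynamics were recovered
```

A planner can then simulate against `learned_model` instead of the real environment, which is the core idea of model-based RL.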
Now that you understand the basics of Reinforcement Learning, you can dive deeper with the resources below.
RL Algorithms
| Algorithm | Type | Difficulty Level | Explanation | Implementation |
|---|---|---|---|---|
| Q-Learning | Value-based | 1 | | |
| Deep Q-Learning | Value-based | 2 | | |
| Double Dueling Deep Q-Learning | Value-based | 3 | | |
| Policy Gradients | Policy-based | 2 | | |
| Advantage Actor-Critic (A2C) | Actor-Critic | 4 | | |
| Asynchronous Advantage Actor-Critic (A3C) | Actor-Critic | 4 | | |
| Proximal Policy Optimization (PPO) | Actor-Critic | 5 | | |
Advanced Topics
| Article | Topic | Difficulty Level |
|---|---|---|
| Curiosity through next-state prediction | Curiosity | 5 |
| Curiosity through random network distillation | Curiosity | 5 |
| Episodic Curiosity through Reachability | Curiosity | 6 |
| An Introduction to Unity ML-Agents | ML-Agents | 1 |
| Diving Deeper into Unity ML-Agents | ML-Agents | 2 |
| Unity ML-Agents: The Mayan Adventure | ML-Agents | 3 |
Learning Resources
| Resource | Topic | Link |
|---|---|---|
| OpenAI Spinning Up | Introduction to RL | |
| Deep Reinforcement Learning Course | Introduction to RL, hands-on | |
| Reinforcement Learning: An Introduction, Richard Sutton | Book | |
| WildML Reinforcement Learning | Hands-on | |
| DeepMind Advanced Deep Learning and Reinforcement Learning | Advanced topics | https://github.com/enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning |
| Unity ML-Agents Course | ML-Agents | |
Environments
| Environment | Description | Link |
|---|---|---|
| OpenAI Gym | A toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. | |
| OpenAI Retro | Gym Retro lets you turn classic video games (NES, SNES, Genesis, and more) into Gym environments for reinforcement learning. | |
| ML-Agents | An open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. | |
| ViZDoom | A Doom-based AI research platform for reinforcement learning from raw visual information. | |
| MameRL | A Python toolkit for training reinforcement learning algorithms against arcade games. | |
| TradingGym | A trading and backtesting environment for training reinforcement learning agents or simple rule-based algorithms. | |