Reinforcement Learning (RL) is a computational approach to learning from action: an agent interacts with its environment, performs actions, receives rewards as feedback for those actions, and learns from that experience.
Three types of Reinforcement Learning techniques
In value-based RL, the goal is to optimize the value function V(s) or an action value function Q(s,a).
The value function tells us the maximum expected future reward the agent can get from each state: the total amount of reward the agent can expect to accumulate over the future, starting from that state.
The agent uses this value function to select which state to move to at each step, always choosing the state with the highest value.
In the maze example, at each step we move to the state with the highest value: -7, then -6, then -5, and so on, until we reach the goal.
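The greedy, value-based selection described above can be sketched in a few lines. This is a minimal illustration, not the article's actual maze: it assumes a 1-D corridor of 8 states where each state's value is minus the number of steps remaining to the goal (state 7, value 0).

```python
# Hypothetical 1-D "maze": states 0..7, goal at state 7.
# V(s) = -(steps remaining to the goal), so V = {0: -7, 1: -6, ..., 7: 0}.
V = {s: s - 7 for s in range(8)}

def neighbors(state):
    """States reachable in one step (move left or right, staying in bounds)."""
    return [s for s in (state - 1, state + 1) if 0 <= s <= 7]

def select_next_state(state):
    """Greedy value-based selection: move to the neighbor with the highest value."""
    return max(neighbors(state), key=lambda s: V[s])

# Starting from state 0 (value -7), the agent climbs the values to the goal:
path = [0]
while path[-1] != 7:
    path.append(select_next_state(path[-1]))
print(path)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Following the highest neighboring value at every step traces the -7, -6, -5, ... sequence described above.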
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function.
The policy is what defines the agent's behavior at a given time.
action = policy(state)
We learn a policy function that lets us map each state to the best corresponding action.
We have two types of policies:
Deterministic: for a given state, the policy always returns the same action.
Stochastic: the policy outputs a probability distribution over actions, i.e., the probability of taking each possible action in that state.
Either way, the policy directly indicates the action to take at each step.
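The two policy types can be sketched as follows. The states, actions, and probabilities here are illustrative assumptions, not part of the article's example; the point is only the contrast between `action = policy(state)` and a distribution over actions.

```python
import random

ACTIONS = ["left", "right"]

def deterministic_policy(state):
    """Deterministic: a given state always maps to the same action."""
    return "right" if state < 4 else "left"

def stochastic_policy(state):
    """Stochastic: returns a probability distribution pi(a|s) over actions."""
    p_right = 0.8 if state < 4 else 0.2
    return {"right": p_right, "left": 1.0 - p_right}

def sample_action(state):
    """Draw an action according to the stochastic policy's probabilities."""
    dist = stochastic_policy(state)
    return random.choices(list(dist), weights=list(dist.values()))[0]

action = deterministic_policy(2)  # always "right" for state 2
dist = stochastic_policy(2)       # {"right": 0.8, "left": 0.2}
```

Note that the stochastic policy still has the shape `action = policy(state)`; it just interposes a sampling step between the distribution and the chosen action.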
In model-based RL, we model the environment. That is, we create a model that describes the behavior of the environment.
The problem with this approach is that each environment needs its own model representation, so it does not lend itself to building a general agent.
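One simple way to "model the environment" is to learn its transition dynamics from experience. This is a minimal sketch under an assumed setup where interactions are logged as (state, action, next_state) tuples; real model-based methods typically learn reward dynamics too.

```python
from collections import Counter, defaultdict

# Count how often each (state, action) pair led to each next state.
transitions = defaultdict(Counter)

def record(state, action, next_state):
    """Update the model with one observed transition."""
    transitions[(state, action)][next_state] += 1

def predicted_next_state(state, action):
    """The model's prediction: the most frequently observed outcome."""
    return transitions[(state, action)].most_common(1)[0][0]

# Hypothetical logged experience, including an occasional "slip":
record(0, "right", 1)
record(0, "right", 1)
record(0, "right", 0)

predicted_next_state(0, "right")  # -> 1
```

With such a model, the agent can plan by simulating actions internally instead of trying them in the real environment, but the counts above are specific to this one environment, which is exactly the generality problem described in the paragraph above.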
Now that you understand the basics of Reinforcement Learning, you can dive deeper with this article.