Reinforcement Learning
This weekend was SantaCon 2017. Woo! It was also DLSG XV (Deep Learning Study Group 15), put together by Jon Krohn, where we learned about reinforcement learning. I'm summarizing it here.
High Level
Reinforcement learning is an approach to machine learning problems that has become popular. It is often applied to Atari games like Pac-Man or Pong. The framework is different from other kinds of deep learning. Instead of simply mapping an input to an output through some neural network architecture, you consider the problem from the point of view of the player, or Agent. The agent interacts with its world through decisions that can be discrete or continuous. These decisions, or Actions, lead to new States and cause the agent to succeed or fail at its objective. That success is a measurable Reward, which is then used to update the policy the agent uses to make future decisions. Actions can be as simple as moving up, down, left, or right; states are things like the position of the agent; and the reward is the score of the game, or whether or not the agent dies or loses.
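To make the agent / action / state / reward loop concrete, here is a minimal sketch in plain Python. The Corridor environment, its reward of 1.0 at the goal, and the two policies are all made up for illustration; they are not from the study group material or from any library.

```python
import random


class Corridor:
    """Hypothetical toy world: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: 1 for reaching the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: the agent picks actions, the world returns states and rewards."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent decides
        state, reward, done = env.step(action)  # world responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # An untrained agent: ignores the state and moves at random.
    return random.choice([-1, 1])


def always_right(state):
    # A hand-written policy that happens to be optimal in this corridor.
    return 1


if __name__ == "__main__":
    print("random policy reward:", run_episode(Corridor(), random_policy))
    print("always-right reward:", run_episode(Corridor(), always_right))
```

A learning algorithm would sit where the hand-written policies are, using the rewards it collects to change how it picks actions.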
Deeper
Reinforcement learning can be done in different ways, and there are several algorithms for it. Many of them are built around a Q-function, Q(s, a), which estimates how good it is, in terms of expected future reward, to take action a in state s. The agent's policy, the rule it uses to pick actions in the game or environment, can then be read off the Q-function. In deep reinforcement learning, a neural network is used to approximate the Q-function. A perfect Q-function means the agent knows how to act optimally: π(s) = argmax_a Q(s, a). That is, the policy π, as a function of state, picks the action that maximizes the Q-function for that state. A Q-function answers "how good is this state-action pair?" while a value function answers "how good is this state?"
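As a rough illustration of reading a policy off a Q-function, here is a small Python sketch. It uses a lookup table as a stand-in for the neural network, and the state names, action names, and numbers are invented for the example.

```python
# Hypothetical Q-table: Q[state][action] = estimated future reward.
# In deep RL a neural network would produce these numbers instead.
Q = {
    "left_cell": {"left": 0.1, "right": 0.7},
    "right_cell": {"left": 0.4, "right": 0.2},
}


def greedy_policy(q_table, state):
    """pi(s): pick the action with the highest Q-value in this state."""
    actions = q_table[state]
    return max(actions, key=actions.get)


def state_value(q_table, state):
    """Value of a state when the agent acts greedily: the best Q-value available."""
    return max(q_table[state].values())


if __name__ == "__main__":
    for s in Q:
        print(s, "->", greedy_policy(Q, s), "value:", state_value(Q, s))
```

The policy returns an action (the argmax), while the value of the state under that policy is the number attached to it (the max).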
A(s, a) = Q(s, a) - V(s)
Advantage (how much better the action is than expected) = Q-function (expected reward for taking this action in this state) - value function (expected reward for this state)
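Here is a small worked example of the advantage calculation in Python. The Q-values and the 50/50 policy are made-up numbers, and the value of the state is taken to be the policy-weighted average of the Q-values, which is the usual definition but is an assumption on my part rather than something covered above.

```python
# Hypothetical Q-values for one state with two possible actions.
q_values = {"up": 1.0, "down": 3.0}

# Hypothetical policy: probability of choosing each action in this state.
policy = {"up": 0.5, "down": 0.5}


def state_value(q_values, policy):
    """V(s): expected Q-value when actions are drawn from the policy."""
    return sum(policy[a] * q_values[a] for a in q_values)


def advantages(q_values, policy):
    """A(s, a) = Q(s, a) - V(s) for every action available in the state."""
    v = state_value(q_values, policy)
    return {a: q - v for a, q in q_values.items()}


if __name__ == "__main__":
    print("V(s) =", state_value(q_values, policy))
    print("A(s, a) =", advantages(q_values, policy))
```

With these numbers V(s) is 2.0, so "down" has an advantage of +1.0 (better than average for this state) and "up" has -1.0 (worse than average).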















