Following Wikipedia:
Reinforcement learning is an area of machine learning inspired by behavioral psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Behavior is primarily shaped by reinforcement rather than free will.
- behaviors that result in praise/pleasure tend to repeat
- behaviors that result in punishment/pain tend to become extinct
Agent: an entity (learner and decision maker) that is equipped with sensors, end-effectors, and goals
- Actions are used by the agent to interact with the environment.
- May have many different temporal granularities and abstractions
A reward R_t is a scalar feedback signal
It indicates how well the agent is doing at step t
The agent’s job is to maximize cumulative reward
Hypothesis: all goals can be described by the maximization of expected cumulative reward
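
To make "cumulative reward" concrete, here is a minimal Python sketch that sums the per-step rewards of a single episode. The function name `cumulative_reward`, the sample reward list, and the optional discount factor `gamma` are assumptions introduced for illustration; the notes above only speak of cumulative reward, which corresponds to `gamma=1.0`.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum the scalar rewards of one episode.

    rewards: list of per-step rewards, in time order.
    gamma:   hypothetical discount factor; 1.0 means a plain sum.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        # A reward received t steps into the episode is weighted by gamma**t.
        total += (gamma ** t) * reward
    return total


# Made-up rewards for a five-step episode (illustrative data only).
episode_rewards = [0.0, 0.0, 1.0, -0.5, 2.0]

print(cumulative_reward(episode_rewards))             # 2.5
print(cumulative_reward(episode_rewards, gamma=0.5))  # 0.3125
```

The hypothesis refers to the expected cumulative reward, so in practice this quantity would be averaged over many episodes rather than read off a single one.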
Main Topics of Reinforcement Learning
Learning: by trial and error
Planning: search, reason, thought, cognition
Prediction: evaluation functions, knowledge
Control: action selection, decision making
Dynamics: how the state changes given the actions of the agent
- Model-based: dynamics are known or are estimated; RL problems are solved using models and planning
- Model-free: dynamics are unknown; methods are explicitly trial-and-error learners
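
The contrast between known dynamics with planning and unknown dynamics with trial and error (commonly called model-based and model-free methods) can be sketched on a toy problem. Everything below is an illustrative assumption rather than part of these notes: the four-state chain environment, the `step` function, the discount factor, and the choice of value iteration and tabular Q-learning as representatives of each family.

```python
import random

N_STATES = 4              # hypothetical chain: states 0..3, state 3 is terminal
GOAL = N_STATES - 1
ACTIONS = (0, 1)          # 0 = move left, 1 = move right
GAMMA = 0.9               # discount factor (an assumption of this sketch)


def step(state, action):
    """Toy dynamics: returns (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def value_iteration(sweeps=50):
    """Known dynamics: plan by querying `step` for every state and action."""
    values = [0.0] * N_STATES
    for _sweep in range(sweeps):
        for s in range(GOAL):                 # the terminal state keeps value 0
            backups = []
            for a in ACTIONS:
                s2, r, done = step(s, a)
                backups.append(r + (0.0 if done else GAMMA * values[s2]))
            values[s] = max(backups)
    return values


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, max_steps=100, seed=0):
    """Unknown dynamics: learn only from sampled transitions (trial and error)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        s = 0
        for _t in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)       # explore
            else:
                best = max(q[s])              # exploit, breaking ties at random
                a = rng.choice([b for b in ACTIONS if q[s][b] == best])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


planned_values = value_iteration()
learned_q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: learned_q[s][a]) for s in range(GOAL)]
print(planned_values)
print(greedy_policy)
```

In this sketch the planner evaluates every state-action pair through the known `step` function, while the learner only sees the transitions it happens to experience. On this small deterministic chain both should end up preferring to move right in every non-terminal state.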
Data are not necessarily i.i.d. (independent and identically distributed)