According to Wikipedia:
Reinforcement learning is an area of machine learning inspired by behavioral psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Behavioral Psychology
Behavior is shaped primarily by reinforcement rather than by free will.
- behaviors that result in praise/pleasure tend to repeat
- behaviors that result in punishment/pain tend to become extinct
Agent
An entity (learner and decision maker) that is equipped with sensors, end-effectors, and goals.
Action
- Used by the agent to interact with the environment.
- May have many different temporal granularities and abstractions.
Reward
A reward R_t is a scalar feedback signal
It indicates how well the agent is doing at step t
The agent’s job is to maximize cumulative reward
Reward hypothesis: all goals can be described by the maximization of expected cumulative reward
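
To make the agent, action, and reward definitions above concrete, here is a minimal sketch of an interaction loop. The environment CoinFlipEnv, the helper names, and the discount factor gamma are all hypothetical and are not taken from these notes. With gamma = 1.0 the function returns the plain cumulative reward described above.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses the outcome of a biased coin."""

    def __init__(self, p_heads=0.8, seed=0):
        self.p_heads = p_heads
        self.rng = random.Random(seed)

    def step(self, action):
        # action: 0 = guess tails, 1 = guess heads
        outcome = 1 if self.rng.random() < self.p_heads else 0
        # Scalar feedback signal R_t: 1 for a correct guess, 0 otherwise.
        return 1.0 if action == outcome else 0.0


def run_episode(env, policy, num_steps):
    """Let the agent act for num_steps steps and collect the rewards R_1..R_T."""
    rewards = []
    for t in range(num_steps):
        action = policy(t)               # the agent chooses an action
        rewards.append(env.step(action)) # the environment answers with a reward
    return rewards


def cumulative_reward(rewards, gamma=1.0):
    """Sum of rewards. gamma < 1 gives a discounted sum (an assumption, not in the notes)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


if __name__ == "__main__":
    env = CoinFlipEnv()
    always_heads = lambda t: 1
    print(cumulative_reward(run_episode(env, always_heads, 100)))
```

An agent that always guesses heads is a fixed policy. A learning agent would change its guesses based on the rewards it has seen so far.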
Main Topics of Reinforcement Learning
Learning: by trial and error
Planning: search, reason, thought, cognition
Prediction: evaluation functions, knowledge
Control: action selection, decision making
Dynamics: how the state changes given the actions of the agent
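
As a rough illustration of dynamics, prediction, and control, the sketch below stores the dynamics of a made-up two-state problem in a table, evaluates a fixed policy (prediction), and then picks the greedy action in each state (control). The states, actions, rewards, the discount factor GAMMA, and all function names are assumptions chosen for the example.

```python
GAMMA = 0.5  # discount factor (assumed for this example)

# Hypothetical dynamics table:
# P[state][action] -> list of (probability, next_state, reward)
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go":   [(0.5, "A", 0.0), (0.5, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "go":   [(1.0, "A", 0.0)]},
}


def q_value(P, v, state, action, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[state][action])


def evaluate_policy(P, policy, gamma, sweeps=200):
    """Prediction: estimate the value of each state under a fixed policy."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s in P:
            v[s] = q_value(P, v, s, policy[s], gamma)
    return v


def greedy_policy(P, v, gamma):
    """Control: in each state, select the action with the highest value."""
    return {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}


if __name__ == "__main__":
    policy = {"A": "go", "B": "stay"}
    v = evaluate_policy(P, policy, GAMMA)
    print(v, greedy_policy(P, v, GAMMA))
```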
Model-based RL
- dynamics are known or are estimated
- solves RL problems using models and planning
Model-free RL
- unknown dynamics
- explicitly trial-and-error learners
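
The contrast between the two families can be sketched on a tiny, made-up chain problem. Value iteration stands in for the model-based side (it reads the dynamics directly), and tabular Q-learning stands in for the model-free side (it only sees sampled transitions). Both algorithms, the three-state chain, and the constants below are illustrative choices, not something these notes prescribe.

```python
import random

GAMMA = 0.9
STATES = [0, 1, 2]   # state 2 is terminal
ACTIONS = [0, 1]     # 0 = move left, 1 = move right


def dynamics(state, action):
    """Deterministic toy dynamics: returns (next_state, reward)."""
    next_state = state + 1 if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def value_iteration(sweeps=100):
    """Model-based: plan with the known dynamics, without interacting."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in (0, 1):  # the terminal state keeps value 0
            outcomes = [dynamics(s, a) for a in ACTIONS]
            v[s] = max(r + GAMMA * v[s2] for s2, r in outcomes)
    return v


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: learn by trial and error from sampled transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != 2:
            if rng.random() < epsilon:   # explore
                a = rng.choice(ACTIONS)
            else:                        # exploit the current estimates
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            # The learner treats dynamics() as a black box it can only sample.
            s2, r = dynamics(s, a)
            future = 0.0 if s2 == 2 else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + GAMMA * future - q[(s, a)])
            s = s2
    return q


if __name__ == "__main__":
    print(value_iteration())
    print(q_learning())
```

On this problem both approaches should prefer moving right in every state. The difference is that value_iteration needs the dynamics function as a model, while q_learning only needs to be able to act and observe the result.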
Note: the data are not necessarily i.i.d. (independent and identically distributed)
P.S. Inverse reinforcement learning.