[RL] RL Category

Following Wikipedia:

Reinforcement learning is an area of machine learning inspired by behavioral psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.


Behavioral Psychology

Behavior is primarily shaped by reinforcement rather than free will.

  • behaviors that result in praise/pleasure tend to repeat
  • behaviors that result in punishment/pain tend to become extinct


An agent is an entity (learner and decision maker) that is equipped with sensors, end-effectors, and goals.



  • Actions are used by the agent to interact with the environment.
  • Actions may have many different temporal granularities and abstractions.



A reward R_t is a scalar feedback signal.

It indicates how well the agent is doing at step t.

The agent's job is to maximize cumulative reward.

Reward hypothesis: all goals can be described by the maximization of expected cumulative reward.
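To make "cumulative reward" concrete, here is a minimal sketch of an agent interacting with an environment and summing the scalar rewards it receives. Everything in it is an assumption made for illustration: the `ToyEnv` class, its reward rule, the `run_episode` helper, and the optional discount factor `gamma` (the notes above only speak of cumulative reward, so `gamma` defaults to 1.0, meaning no discounting).

```python
import random


class ToyEnv:
    """Hypothetical environment: action 1 pays 1.0, action 0 pays 0.0."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step index

    def step(self, action):
        reward = 1.0 if action == 1 else 0.0  # scalar feedback signal R_t
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and return the list of rewards the agent received."""
    obs = env.reset()
    rewards = []
    done = False
    while not done:
        action = policy(obs)                   # the agent acts
        obs, reward, done = env.step(action)   # the environment responds
        rewards.append(reward)
    return rewards


def cumulative_reward(rewards, gamma=1.0):
    """Sum of rewards, discounted by gamma per step (gamma=1.0: plain sum)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


rng = random.Random(0)
random_rewards = run_episode(ToyEnv(), lambda obs: rng.choice([0, 1]))
best_rewards = run_episode(ToyEnv(), lambda obs: 1)
print(cumulative_reward(random_rewards), cumulative_reward(best_rewards))
```

In this toy setting the policy that always picks action 1 collects the largest possible cumulative reward, which is the sense in which the agent's goal is expressed purely through the reward signal.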


Main Topics of Reinforcement Learning

Learning: by trial and error

Planning: search, reason, thought, cognition

Prediction: evaluation functions, knowledge

Control: action selection, decision making

Dynamics: how the state changes given the actions of the agent

Model-based RL

  • dynamics are known or are estimated
  • solving RL problems that use models and planning
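As one possible illustration of planning with known dynamics, the sketch below runs value iteration on a hypothetical 4-state chain. The chain, the `model` function, the reward of 1.0 for reaching the last state, and the discount `GAMMA = 0.9` are all assumptions chosen for the example; value iteration is only one of several planning methods that fit this category.

```python
N_STATES = 4                  # hypothetical chain; the last state is terminal
ACTIONS = ("left", "right")
GAMMA = 0.9                   # assumed discount factor


def model(state, action):
    """Known dynamics: deterministic move, reward 1.0 on reaching the end."""
    if action == "right":
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def backup(values, state, action):
    """One-step lookahead through the model."""
    nxt, reward = model(state, action)
    return reward + GAMMA * values[nxt]


def value_iteration(tol=1e-8):
    values = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):
            best = max(backup(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best lookahead."""
    return [max(ACTIONS, key=lambda a: backup(values, s, a))
            for s in range(N_STATES - 1)]


values = value_iteration()
print(values, greedy_policy(values))
```

No interaction with an environment happens here: the agent plans entirely by querying `model`, which is what distinguishes this from trial-and-error learning.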

Model-free RL

  • unknown dynamics
  • explicitly trial-and-error learners

Note: the data are not necessarily i.i.d. (independent and identically distributed).
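A trial-and-error learner can be sketched with tabular Q-learning, shown here as one example of a model-free method rather than the only one. The chain environment, `env_step`, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are assumptions for illustration. The learner never reads the rules inside `env_step`; it only observes sampled transitions, and consecutive samples come from the same trajectory, so they are correlated rather than i.i.d.

```python
import random

N_STATES = 4       # hypothetical chain; state 3 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def env_step(state, action):
    """Environment the learner can only sample from."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                # Break ties at random so early episodes do not get stuck.
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = env_step(state, action)
            # Q-learning target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

After enough episodes the greedy policy read off the Q-table moves right in every non-terminal state, which matches what planning with the true dynamics would give on this chain.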


P.S. Inverse reinforcement learning.