[RL] Probability Review

Basics

The set

general definition of Probability

Sample space


Physical meaning of probability

frequentist view: a long-run frequency over a large number of repetitions of an experiment.

Bayesian view: a degree of belief about the event in question.
Under this view we can assign probabilities to hypotheses like "the candidate will win the election" or "the defendant is guilty", even though such events can't be repeated.

Markov chain Monte Carlo + computing power + better algorithms have let the Bayesian view thrive.

role


Conditional probability

Everything happens under some condition, and conditioning is what gives rise to probability.
e.g. conditioning -> divide & conquer -> apply recursively to multi-stage problems.

\(P(A|B) = \frac{P(A \cap B)}{P(B)}\), defined when \(P(B) > 0\)
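To make the definition concrete, here is a minimal sketch that computes a conditional probability by enumerating a finite sample space. The two-dice setup and the helper names `prob` and `cond_prob` are assumptions chosen for illustration, not part of the original notes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice (assumed example).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes), all outcomes equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

p_unconditional = prob(sum_is_8)                # 5 of 36 outcomes
p_conditional = cond_prob(sum_is_8, first_is_even)  # 3 of the 18 outcomes in B
print(p_unconditional, p_conditional)
```

Conditioning on "first die is even" shrinks the sample space from 36 outcomes to 18, which is why the answer changes.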

Chain rule

Well suited to distributed computation
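For reference, the standard statement of the chain rule, which factors a joint probability into a product of conditionals so that a multi-stage problem can be handled one stage at a time (the events \(A_1,\dots,A_n\) are generic placeholders, not from the notes):

\(P(A_1,A_2,\dots,A_n)=P(A_1)\,P(A_2|A_1)\,P(A_3|A_1,A_2)\cdots P(A_n|A_1,\dots,A_{n-1})\)

With \(n=2\) this is just the definition of conditional probability rearranged: \(P(A_1,A_2)=P(A_1)\,P(A_2|A_1)\).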


Inference & Bayes' Rule

Probability distributions and limit theorems

PDF (probability density function)

Mixed type


PDF

valid PDF

  1. non-negative: \(f(x)\geq0\)
  2. integrates to 1:
    \(\int^{\infty}_{-\infty}f(x)dx=1\)
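The two conditions can be checked numerically for a candidate density. This is only a rough sketch: it samples non-negativity on a finite grid and integrates over a truncated interval, so it can suggest validity but not prove it. The helper names and the choice of the standard normal as the test density are assumptions.

```python
import math

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def std_normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def looks_like_valid_pdf(f, lo, hi, tol=1e-6, grid=1000):
    """Heuristic check of both conditions on the truncated range [lo, hi]."""
    h = (hi - lo) / grid
    non_negative = all(f(lo + i * h) >= 0 for i in range(grid + 1))
    integrates_to_one = abs(integrate(f, lo, hi) - 1.0) < tol
    return non_negative and integrates_to_one

ok = looks_like_valid_pdf(std_normal_pdf, -10.0, 10.0)
# Doubling the density keeps it non-negative but breaks the second condition.
not_ok = looks_like_valid_pdf(lambda x: 2.0 * std_normal_pdf(x), -10.0, 10.0)
print(ok, not_ok)
```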

probability distribution

summary of probability distribution


Three distance measures in ML, DL, AI

Total variation distance

often seen in GANs
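For discrete distributions the total variation distance is half the sum of absolute differences between the two PMFs. A minimal sketch, assuming distributions are given as dictionaries from outcome to probability (the function name and the coin example are illustrative):

```python
def total_variation(p, q):
    """TV distance between two discrete distributions (dicts: outcome -> probability)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.8, "T": 0.2}

d_same = total_variation(fair, fair)          # identical distributions
d_coins = total_variation(fair, biased)       # differ by 0.3 on each outcome
d_disjoint = total_variation({"a": 1.0}, {"b": 1.0})  # no shared support
print(d_same, d_coins, d_disjoint)
```

The distance ranges from 0 (identical distributions) to 1 (disjoint supports).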

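Law of small numbers (rare events) and the Poisson distribution


The number of people eating at the canteen can be described by a Poisson distribution.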
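One way to read the canteen example: if each of \(n\) people independently shows up with small probability \(\lambda/n\), the head count is Binomial\((n,\lambda/n)\), which approaches Poisson\((\lambda)\) as \(n\) grows. The sketch below compares the two PMFs; the value \(\lambda=3\) and the population sizes are made-up numbers for illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # assumed average number of diners

def max_gap(n, kmax=20):
    """Largest pointwise PMF difference between Binomial(n, lam/n) and Poisson(lam)."""
    return max(
        abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(kmax + 1)
    )

gaps = {n: max_gap(n) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)  # the gap shrinks as n grows
```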

Sample mean

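Strong law of large numbers (SLLN)


The sample mean converges to the true value with probability 1 (almost surely).

Weak law of large numbers (WLLN)


The sample mean converges to the true value in probability.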
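A simulation cannot distinguish the strong law from the weak law, but it does show the behaviour both describe: the sample mean settling near the true value. A minimal sketch with Bernoulli trials, where the success probability 0.3, the seed, and the checkpoints are arbitrary choices:

```python
import random

def running_means(p, n_trials, checkpoints, seed=0):
    """Sample mean of Bernoulli(p) trials, recorded at the given trial counts."""
    rng = random.Random(seed)
    successes = 0
    recorded = {}
    for i in range(1, n_trials + 1):
        successes += rng.random() < p  # True counts as 1
        if i in checkpoints:
            recorded[i] = successes / i
    return recorded

p_true = 0.3
means = running_means(p_true, 100_000, {10, 100, 1_000, 10_000, 100_000})
for n in sorted(means):
    print(n, means[n], abs(means[n] - p_true))
```

This is also the frequentist reading of probability: a long-run frequency over many repetitions.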

Central limit theorem
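A quick empirical check of the central limit theorem: standardized means of Uniform(0, 1) samples should behave roughly like N(0, 1), for which about 68.27% of the mass lies within one standard deviation. The sample size 30, the number of trials, and the seed are arbitrary assumptions.

```python
import math
import random
import statistics

def standardized_means(n, trials, seed=0):
    """Standardized sample means of n i.i.d. Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and std of Uniform(0, 1)
    result = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        result.append((xbar - mu) / (sigma / math.sqrt(n)))
    return result

z = standardized_means(n=30, trials=20_000)
frac_within_1 = sum(abs(v) <= 1.0 for v in z) / len(z)
print(statistics.mean(z), statistics.pstdev(z), frac_within_1)
```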

Generating function

  1. PGF - Z-transform
  2. MGF - Laplace transform
  3. CF - Fourier transform
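For reference, the standard definitions behind the three abbreviations, for a random variable \(X\) (the symbols \(G_X\), \(M_X\), \(\varphi_X\) are conventional names, not taken from these notes):

\(G_X(z)=E[z^X]=\sum_{k\geq0}P(X=k)\,z^k\) (PGF, for \(X\) taking values in the non-negative integers)
\(M_X(t)=E[e^{tX}]\) (MGF, when the expectation is finite for \(t\) in an open interval around 0)
\(\varphi_X(t)=E[e^{itX}]\) (CF, always exists)

Up to sign and variable conventions, these correspond to the Z-transform of the PMF, the Laplace transform of the PDF, and the Fourier transform of the PDF.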

APPLICATION

  1. branching process
  2. bridge complex analysis and probability
  3. play a role in large deviation theory

## Multiple variables
A joint distribution provides complete information about how multiple r.v.s interact in high-dimensional space.

joint CDF & PDF



marginal PMF

conditional PMF

joint PDF




techniques

general Bayes' rule

general LOTP

change of variables


summary

Order Statistics

CDF of order statistic


proof

PDF of order statistic


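two methods to find the PDF

  1. differentiate the CDF to get the PDF (ugly)
  2. compute PDF * dx directly, i.e. the probability that the order statistic lands in a tiny interval of length dx

### proof

## joint PDF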

e.g. order statistics of Uniforms
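A simulation sketch for this example. It relies on two standard facts that are stated here as assumptions since the notes' own derivation was in a lost screenshot: for \(n\) i.i.d. Uniform(0, 1) draws, the CDF of the \(j\)-th order statistic is \(\sum_{k=j}^{n}\binom{n}{k}x^k(1-x)^{n-k}\), and its distribution is Beta\((j, n-j+1)\) with mean \(j/(n+1)\). The values \(n=5\), \(j=2\) are arbitrary.

```python
import math
import random
import statistics

def order_statistic_cdf(x, n, j):
    """P(X_(j) <= x): at least j of the n uniforms fall at or below x."""
    return sum(math.comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(j, n + 1))

def order_statistic_samples(n, j, trials, seed=0):
    """Simulated draws of the j-th smallest of n i.i.d. Uniform(0, 1) values."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n))[j - 1] for _ in range(trials)]

n, j = 5, 2
samples = order_statistic_samples(n, j, trials=50_000)

empirical_mean = statistics.mean(samples)
theoretical_mean = j / (n + 1)  # mean of Beta(j, n - j + 1)

empirical_cdf_half = sum(s <= 0.5 for s in samples) / len(samples)
theoretical_cdf_half = order_statistic_cdf(0.5, n, j)

print(empirical_mean, theoretical_mean)
print(empirical_cdf_half, theoretical_cdf_half)
```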

story: Beta-Binomial conjugacy
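Conjugacy means the update is pure bookkeeping: with a Beta\((a,b)\) prior on the success probability and Binomial data, the posterior is again a Beta, with the successes added to \(a\) and the failures added to \(b\). A minimal sketch with made-up counts (function names are illustrative):

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, failures):
    """Posterior Beta parameters after Binomial data under a Beta(a, b) prior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Uniform prior Beta(1, 1), then 7 successes and 3 failures (assumed data).
a_post, b_post = beta_binomial_update(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(a_post, b_post)
print(a_post, b_post, posterior_mean)
```

Note that the posterior mean sits between the prior mean (1/2) and the observed frequency (7/10).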


Mean vs Bayes'


deduction

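e.g. Laplace's problem

A problem posed by the renowned Laplace: given a historical record in which the sun has risen every day, what is the probability that the sun will rise again tomorrow?

Laplace's own solution:
Assume sunrise follows a Bernoulli process with an unknown parameter A, where A is uniformly distributed on [0,1]. Using the given history, the posterior probability that the sun rises tomorrow is
\(P(X_{n+1}=1|X_n=1,X_{n-1}=1,...,X_1=1)=\frac{P(X_{n+1}=1,X_n=1,X_{n-1}=1,...,X_1=1)}{P(X_n=1,X_{n-1}=1,...,X_1=1)}=\frac{\int_0^1 A^{n+1}\,dA}{\int_0^1 A^{n}\,dA}=\frac{n+1}{n+2}\). That is, given that the sun rose on every day from day 1 to day n, the probability that it rises on day n+1 approaches 1 as n grows.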
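The ratio of integrals in Laplace's solution can be checked with exact arithmetic, using \(\int_0^1 A^k\,dA=\frac{1}{k+1}\). A small sketch (the function names are illustrative):

```python
from fractions import Fraction

def integral_of_power(k):
    """Exact integral of A**k over [0, 1], which equals 1 / (k + 1)."""
    return Fraction(1, k + 1)

def prob_sunrise_tomorrow(n):
    """P(sun rises on day n+1 | it rose on days 1..n), under a uniform prior on A."""
    return integral_of_power(n + 1) / integral_of_power(n)

for n in (1, 10, 100, 10_000):
    p = prob_sunrise_tomorrow(n)
    print(n, p, float(p))  # equals (n + 1) / (n + 2) and climbs toward 1
```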

Monte Carlo

importance sampling

reduce the variance
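A sketch of why importance sampling reduces variance, using a rare-event problem chosen for illustration (not from the notes): estimating \(P(Z>4)\) for \(Z\sim N(0,1)\). Plain Monte Carlo almost never sees the event, while sampling from the shifted proposal \(N(4,1)\) and reweighting by target density over proposal density sees it about half the time. The threshold, proposal, sample size, and seed are all assumptions.

```python
import math
import random

def exact_tail(c):
    """P(Z > c) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))

def naive_estimate(c, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) draws that exceed c."""
    return sum(rng.gauss(0.0, 1.0) > c for _ in range(n)) / n

def importance_estimate(c, n, rng):
    """Importance sampling with proposal N(c, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)  # draw from the proposal
        if x > c:
            # weight = N(0,1) density / N(c,1) density = exp(-c*x + c^2/2)
            total += math.exp(-c * x + 0.5 * c * c)
    return total / n

c, n = 4.0, 100_000
rng = random.Random(0)
naive = naive_estimate(c, n, rng)
weighted = importance_estimate(c, n, rng)
exact = exact_tail(c)
print(exact, naive, weighted)
```

With the same number of samples, the naive estimate is built from only a handful of hits (often zero), whereas the weighted estimate is typically within a few percent of the exact value.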

importance sampling

example

What does Multi-Armed Bandit mean?

credit:https://iosband.github.io/2015/07/19/Efficient-experimentation-and-multi-armed-bandits.html

At first, multi-armed bandit means using
\(f^* : \mathcal{X} \rightarrow \mathbb{R}\)

  1. Each arm \(i\) pays out 1 dollar with probability \(p_i\) if it is played; otherwise it pays out nothing.
  2. While the \(p_1,…,p_k\) are fixed, we don’t know any of their values.
  3. Each timestep \(t\) we pick a single arm \(a_t\) to play.
  4. Based on our choice, we receive a return of \(r_t \sim Ber(p_{a_t})\).
  5. How should we choose arms so as to maximize total expected return?
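One common answer to the question in the list above is Thompson sampling: keep a Beta posterior over each \(p_i\), draw one sample per arm each timestep, and play the arm with the largest draw. The sketch below is a minimal illustration under assumed payout probabilities; it is not necessarily the strategy these notes or the credited post have in mind, and the names `thompson_sampling`, `true_probs`, and `horizon` are hypothetical.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Play a Bernoulli bandit with Thompson sampling under Beta(1, 1) priors.

    true_probs is hidden from the decision rule; it is used only to
    simulate the payouts r_t ~ Ber(p_{a_t}).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    wins = [0] * k
    losses = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # One posterior draw per arm from Beta(1 + wins, 1 + losses).
        draws = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
        total_reward += reward
    pulls = [wins[i] + losses[i] for i in range(k)]
    return total_reward, pulls

true_probs = [0.2, 0.5, 0.8]  # assumed values, unknown to the player
total_reward, pulls = thompson_sampling(true_probs, horizon=5_000)
print(total_reward, pulls)
```

Early on the posteriors are wide, so every arm gets tried; as evidence accumulates the draws concentrate and most plays go to the best arm, which is the exploration versus exploitation trade-off in miniature.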