What does Multi-Armed Bandit mean?


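In its simplest form, a multi-armed bandit problem involves an unknown function \(f^* : \mathcal{X} \rightarrow \mathbb{R}\) that gives the expected payout of each option in \(\mathcal{X}\). In the basic version there are \(k\) options (arms), set up as follows: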

  1. Each arm \(i\) pays out 1 dollar with probability \(p_i\) if it is played; otherwise it pays out nothing.
  2. While the \(p_1,…,p_k\) are fixed, we don’t know any of their values.
  3. Each timestep \(t\) we pick a single arm \(a_t\) to play.
  4. Based on our choice, we receive a return of \(r_t \sim \mathrm{Ber}(p_{a_t})\).
  5. How should we choose arms so as to maximize total expected return?
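
The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BernoulliBandit`, `run_uniform_random`, and `run_epsilon_greedy` are hypothetical, and epsilon-greedy is used here as one common baseline strategy, not as the answer to the question in step 5.

```python
import random


class BernoulliBandit:
    """k arms; arm i pays 1 with probability probs[i], otherwise 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)          # the p_i, hidden from the player
        self.rng = random.Random(seed)

    @property
    def k(self):
        return len(self.probs)

    def pull(self, arm):
        # r_t ~ Ber(p_{a_t})
        return 1 if self.rng.random() < self.probs[arm] else 0


def run_uniform_random(bandit, steps, seed=None):
    """Baseline: pick an arm uniformly at random at every timestep."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        total += bandit.pull(rng.randrange(bandit.k))
    return total


def run_epsilon_greedy(bandit, steps, epsilon=0.1, seed=None):
    """Assumed baseline: explore with probability epsilon, else play the
    arm with the highest empirical mean payout so far."""
    rng = random.Random(seed)
    counts = [0] * bandit.k
    means = [0.0] * bandit.k
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(bandit.k)                     # explore
        else:
            arm = max(range(bandit.k), key=lambda i: means[i])  # exploit
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean for the played arm.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return total


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # illustrative values, unknown to the player
    steps = 1000
    print("random:        ", run_uniform_random(BernoulliBandit(probs, seed=0), steps, seed=1))
    print("epsilon-greedy:", run_epsilon_greedy(BernoulliBandit(probs, seed=0), steps, seed=1))
    print("best possible expected return:", max(probs) * steps)
```

The uniform-random player earns roughly the average of the \(p_i\) per step, while a strategy that learns from observed payouts can get closer to \(\max_i p_i\) per step. The gap between the two is what a good arm-selection rule tries to close.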