Online Learning and Online Convex Optimization
1. Introduction
- $\mathcal{X}$,
- instance domain
- usually $\mathcal{X} := \mathbb{R}^{n}$,
- all possible inputs
- $\mathcal{Y}$,
- target domain
- answer/feedback/reward
- $D$,
- prediction domain
- sometimes $\mathcal{Y} \subseteq D$,
- $x_{t} \in \mathcal{X}$,
- a question/input at round $t$
- $p_{t} \in D$,
- our prediction at round $t$, based on the inputs $x_{1}, \ldots, x_{t}$ and the past answers $y_{1}, \ldots, y_{t-1}$,
- $y_{t}$,
- answer at round $t$,
- observed after predicting at $t$,
- $l: D \times \mathcal{Y} \rightarrow \mathbb{R}$,
- loss function
- $l(p_{t}, y_{t})$ is the loss of the prediction at round $t$,
With this notation, the online learning problem is conceptually stated as follows.
Online Learning
Minimize the sum of the losses
\[\sum_{t=1}^{T} l(p_{t}, y_{t})\]through the following steps.
- for $t = 1, 2, \ldots, T$,
- receive question $x_{t} \in \mathcal{X}$
- predict $p_{t} \in D$
- receive true answer $y_{t} \in \mathcal{Y}$
- suffer loss $l(p_{t}, y_{t})$
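As a concrete illustration, here is a minimal sketch of this protocol in Python; the `learner` and `env` objects and their `predict`/`update`/`question`/`answer` methods are assumed names, not part of the original text.

```python
# A minimal sketch of the online learning protocol.
# `learner` and `env` are hypothetical objects assumed to expose
# predict/update and question/answer methods, respectively.

def run_online_learning(learner, env, loss, T):
    total_loss = 0.0
    for t in range(T):
        x_t = env.question(t)          # receive question x_t in X
        p_t = learner.predict(x_t)     # predict p_t in D
        y_t = env.answer(t)            # receive true answer y_t in Y
        total_loss += loss(p_t, y_t)   # suffer loss l(p_t, y_t)
        learner.update(x_t, y_t)       # learner adapts to the feedback
    return total_loss
```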
- $\mathcal{H}$,
- subset of functions from $\mathcal{X}$ to $\mathcal{Y}$,
- called hypothesis class
- known to the learner
- $h^{*}: \mathcal{X} \rightarrow \mathcal{Y}$,
- a fixed hypothesis in $\mathcal{H}$ that labels the answers, i.e. $y_{t} = h^{*}(x_{t})$,
Definition. Mistake bound
- $A$,
- an online learning algorithm
- $p_{t, A}$,
- the prediction of $A$ at round $t$,
- $M_{A}(\mathcal{H})$,
- the maximal number of mistakes ($p_{t, A} \neq y_{t}$) that $A$ can make on any sequence $(x_{1}, h^{*}(x_{1})), \ldots, (x_{T}, h^{*}(x_{T}))$ labeled by some $h^{*} \in \mathcal{H}$; a bound of the form $M_{A}(\mathcal{H}) \le B < \infty$ is called a mistake bound.
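As a small illustration, the sketch below counts the mistakes of an algorithm on a realizable sequence labeled by a fixed $h^{*}$; the `algorithm` object and its `predict`/`update` methods are assumed names, mirroring the protocol above.

```python
# A minimal sketch counting the mistakes of an online algorithm on a
# realizable sequence labeled by h_star.

def count_mistakes(algorithm, h_star, xs):
    mistakes = 0
    for x_t in xs:
        p_t = algorithm.predict(x_t)  # p_{t, A}
        y_t = h_star(x_t)             # realizable answers: y_t = h*(x_t)
        if p_t != y_t:
            mistakes += 1             # a mistake at round t
        algorithm.update(x_t, y_t)    # feedback arrives after predicting
    return mistakes
```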
Definition. Regret
- $h^{*} \in \mathcal{H}$,
- a fixed competing hypothesis
The regret of the algorithm relative to the hypothesis $h^{*}$ is
\[\begin{equation} \mathrm{Regret}_{T}(h^{*}) := \sum_{t=1}^{T} l(p_{t}, y_{t}) - \sum_{t=1}^{T} l(h^{*}(x_{t}), y_{t}) \label{equation_01_01} \end{equation} .\]The regret of the algorithm relative to the hypothesis class $\mathcal{H}$ is
\[\begin{equation} \mathrm{Regret}_{T}(\mathcal{H}) := \max_{h \in \mathcal{H}} \mathrm{Regret}_{T}(h) \label{equation_01_02} \end{equation} .\]The learner's goal is to have the lowest possible regret relative to $\mathcal{H}$.
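For a finite hypothesis class, both quantities can be computed directly from recorded losses; a minimal sketch, where the array names `learner_losses` and `hyp_losses` are assumptions:

```python
import numpy as np

# A sketch of equations (1.1) and (1.2) for a finite hypothesis class:
# learner_losses[t] = l(p_t, y_t), and hyp_losses[h][t] = l(h(x_t), y_t)
# for each hypothesis h in H.

def regret_vs_class(learner_losses, hyp_losses):
    learner_total = np.sum(learner_losses)
    # Regret_T(h) for each h in H, then the maximum over the class.
    return max(learner_total - np.sum(h_losses) for h_losses in hyp_losses)
```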
Remark
- $\mathcal{H}$ is called a finite hypothesis class if $|\mathcal{H}| < \infty$,
- Discrete problems usually have a finite hypothesis class. Such a class is not convex, which motivates the convexification techniques of Section 2.1.
Assumption. Realizability
The online learning problem is said to satisfy the realizability condition if
\[\exists h^{*} \in \mathcal{H} \text{ s.t. } y_{t} = h^{*}(x_{t}) \ \text{ for all } t .\]That is, the answers are generated by some function in the hypothesis class $\mathcal{H}$.
2. Online Convex Optimization
Online Convex Optimization
- $S$,
- convex set
Online Convex Optimization (OCO)
- for $t = 1, 2, \ldots, T$
- predict a vector $w_{t} \in S$
- receive a convex loss function $f_{t}:S \rightarrow \mathbb{R}$
- suffer loss $f_{t}(w_{t})$
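The text fixes no particular learner here; as one illustrative (assumed) choice, the sketch below runs this protocol with online projected gradient descent on the Euclidean unit ball, against linear losses $f_{t}(w) = \langle g_{t}, w \rangle$.

```python
import numpy as np

# A sketch of the OCO protocol with online projected gradient descent
# as one illustrative learner on S = {w : ||w||_2 <= 1}.

rng = np.random.default_rng(0)
d, T, eta = 5, 100, 0.1

def project_to_ball(w):
    # Euclidean projection onto S.
    norm = np.linalg.norm(w)
    return w if norm <= 1.0 else w / norm

w = np.zeros(d)                          # w_1 in S
total_loss = 0.0
for t in range(T):
    g = rng.uniform(-1.0, 1.0, size=d)   # adversary picks f_t(w) = <g, w>
    total_loss += float(g @ w)           # suffer loss f_t(w_t)
    w = project_to_ball(w - eta * g)     # gradient step, then project
print(total_loss)
```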
The regret of online convex optimization with respect to a fixed $u \in S$ is
\[\mathrm{Regret}_{T}(u) := \sum_{t=1}^{T} f_{t}(w_{t}) - \sum_{t=1}^{T} f_{t}(u) .\]The regret relative to a set $U \subseteq S$ is
\[\mathrm{Regret}_{T}(U) := \max_{u \in U} \mathrm{Regret}_{T}(u) .\]Remark
In the OCO problem, the general online learning notation specializes as follows.
- $D = S$,
- \(\mathcal{X} = \{1\}\),
- a dummy instance domain; the questions carry no information,
- $p_{t} := w_{t}$,
- $y_{t} := f_{t}$,
- so $\mathcal{Y}$ is a set of convex functions on $S$, and $l(p_{t}, y_{t}) := f_{t}(w_{t})$,
- $\mathcal{H} = U$,
- each $u \in U$ is identified with the constant prediction $u$,
2.1 Convexification
2.1.1 Convexification by Randomization
We consider the following problem of prediction with expert advice.
- $d \in \mathbb{N}$,
- the number of advices (experts)
- \(p_{t} \in \{1, \ldots, d\}\),
- chosen advice
- $y_{t} := (y_{t, 1}, \ldots, y_{t, d}) \in [0, 1]^{d}$,
- where $y_{t, i}$ is the cost of following the $i$-th advice
- $l(p_{t}, y_{t}) := y_{t, p_{t}}$,
- the loss the learner pays at round $t$,
Since the predictions range over a discrete set, this is a non-convex problem.
Now we reinterpret the above problem as an OCO problem. Let
\[S := \{ w \in \mathbb{R}^{d} \mid w \ge 0, \|w\|_{1} = 1 \} .\]$S$ is called the probability simplex; each $w \in S$ is interpreted as a probability distribution over the $d$ advices. We define the loss function
\[\begin{eqnarray} f_{t}(w_{t}) & := & \langle w_{t}, y_{t} \rangle . \nonumber \end{eqnarray}\]Since $\sum_{i=1}^{d} w_{t, i} = 1$, we can let $P_{t}: \Omega \rightarrow \{1, \ldots, d\}$ be the probabilistic choice at round $t$, a random variable with distribution
\[P(P_{t} = i) = w_{t, i} .\]Then the expected loss at round $t$ can be written
\[\begin{eqnarray} \mathrm{E} \left[ y_{t, P_{t}} \right] & = & \sum_{i=1}^{d} y_{t, i} P(P_{t} = i) \nonumber \\ & = & \sum_{i=1}^{d} y_{t, i} w_{t, i} \nonumber \\ & = & f_{t}(w_{t}) \nonumber \end{eqnarray}\]where $y_{t, i}$ is the deterministic loss of advice $i$ at round $t$.
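A quick Monte Carlo sanity check of this identity; the sampled $w_{t}$ and $y_{t}$ below are arbitrary assumptions for illustration.

```python
import numpy as np

# Check that E[y_{t, P_t}] equals f_t(w_t) = <w_t, y_t>.

rng = np.random.default_rng(0)
d = 4
w_t = rng.dirichlet(np.ones(d))        # a point of the probability simplex S
y_t = rng.uniform(0.0, 1.0, size=d)    # cost vector in [0, 1]^d

samples = rng.choice(d, size=100_000, p=w_t)  # draws of P_t
print(y_t[samples].mean())   # empirical E[y_{t, P_t}]
print(float(w_t @ y_t))      # f_t(w_t); agrees up to sampling error
```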
In expectation, the randomized discrete problem is therefore an instance of the following OCO problem.
- for $t = 1, \ldots, T$
- choose $w_{t} \in S$, the probability distribution over the advices,
- receive the convex function $f_{t} = \langle \cdot \,, y_{t} \rangle$,
- suffer the loss $f_{t}(w_{t})$,
The comparison set $U$ is the collection of the standard unit vectors
\[U := \{ e_{1}, \ldots, e_{d} \} .\]
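Since $f_{t}(e_{i}) = y_{t, i}$, each $e_{i}$ corresponds to always following advice $i$, so $\mathrm{Regret}_{T}(U)$ compares the learner's total expected loss with the best single advice in hindsight. A minimal sketch, where the recorded arrays `ws` and `ys` are assumed names:

```python
import numpy as np

# A sketch of Regret_T(U) for U = {e_1, ..., e_d}. `ws` holds the w_t
# and `ys` the y_t, both of shape (T, d).

def experts_regret(ws, ys):
    ws, ys = np.asarray(ws), np.asarray(ys)
    learner_loss = np.sum(ws * ys)            # sum_t <w_t, y_t>
    best_advice = np.sum(ys, axis=0).min()    # min_i sum_t y_{t, i}
    return learner_loss - best_advice         # max_{u in U} Regret_T(u)
```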