Online Learning and Online Convex Optimization
1. Introduction
- $\mathcal{X}$,
- instance domain
- usually $\mathcal{X} := \mathbb{R}^{n}$,
- all possible inputs
- $\mathcal{Y}$,
- target domain
- answer/feedback/reward
- $D$,
- prediction domain
- sometimes $\mathcal{Y} \subseteq D$,
- $x_{t} \in \mathcal{X}$,
- a question/input at round $t$
- $p_{t} \in D$,
- our prediction at round $t$, based on the inputs $x_{1}, \ldots, x_{t}$ and the past answers $y_{1}, \ldots, y_{t-1}$,
- $y_{t}$,
- answer at round $t$,
- observed after predicting at $t$,
- $l: D \times \mathcal{Y} \rightarrow \mathbb{R}$,
- loss function
- $l(p_{t}, y_{t})$ is the loss of the prediction at round $t$,
With this notation, the online learning problem is conceptually stated as follows.
Online Learning
Minimize the sum of the losses
\[\sum_{t=1}^{T} l(p_{t}, y_{t})\]through the following steps.
- for $t = 1, 2, \ldots, T$,
- receive question $x_{t} \in \mathcal{X}$
- predict $p_{t} \in D$
- receive true answer $y_{t} \in \mathcal{Y}$
- suffer loss $l(p_{t}, y_{t})$
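As a concrete illustration, here is a minimal sketch of this protocol in Python; the `learner` and `env` objects and their `predict`/`update`/`question`/`answer` methods are assumed names, not part of the original text.

```python
# A minimal sketch of the online learning protocol.
# `learner` and `env` are hypothetical objects assumed to expose
# predict/update and question/answer methods, respectively.

def run_online_learning(learner, env, loss, T):
    total_loss = 0.0
    for t in range(T):
        x_t = env.question(t)          # receive question x_t in X
        p_t = learner.predict(x_t)     # predict p_t in D
        y_t = env.answer(t)            # receive true answer y_t in Y
        total_loss += loss(p_t, y_t)   # suffer loss l(p_t, y_t)
        learner.update(x_t, y_t)       # learner adapts to the feedback
    return total_loss
```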
- $\mathcal{H}$,
- subset of functions from $\mathcal{X}$ to $\mathcal{Y}$,
- called hypothesis class
- known to the learner
- $h^{*}: \mathcal{X} \rightarrow \mathcal{Y}$,
- a fixed hypothesis in $\mathcal{H}$ that labels the answers, i.e. $y_{t} = h^{*}(x_{t})$,
Definition. Mistake bound
- $A$,
- an online learning algorithm
- $p_{t, A}$,
- the prediction of $A$ at round $t$,
- $M_{A}(\mathcal{H})$,
- the maximal number of mistakes ($p_{t, A} \neq y_{t}$) that $A$ can make on any sequence $(x_{1}, h^{*}(x_{1})), \ldots, (x_{T}, h^{*}(x_{T}))$ labeled by some $h^{*} \in \mathcal{H}$; a bound of the form $M_{A}(\mathcal{H}) \le B < \infty$ is called a mistake bound.
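As a small illustration, the sketch below counts the mistakes of an algorithm on a realizable sequence labeled by a fixed $h^{*}$; the `algorithm` object and its `predict`/`update` methods are assumed names, mirroring the protocol above.

```python
# A minimal sketch counting the mistakes of an online algorithm on a
# realizable sequence labeled by h_star.

def count_mistakes(algorithm, h_star, xs):
    mistakes = 0
    for x_t in xs:
        p_t = algorithm.predict(x_t)  # p_{t, A}
        y_t = h_star(x_t)             # realizable answers: y_t = h*(x_t)
        if p_t != y_t:
            mistakes += 1             # a mistake at round t
        algorithm.update(x_t, y_t)    # feedback arrives after predicting
    return mistakes
```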
Definition. Regret
- $h^{*} \in \mathcal{H}$,
- a fixed competing hypothesis
The regret of the algorithm relative to the hypothesis $h^{*}$ is
\[\begin{equation} \mathrm{Regret}_{T}(h^{*}) := \sum_{t=1}^{T} l(p_{t}, y_{t}) - \sum_{t=1}^{T} l(h^{*}(x_{t}), y_{t}) \label{equation_01_01} \end{equation} .\]The regret of the algorithm relative to the hypothesis class $\mathcal{H}$ is
\[\begin{equation} \mathrm{Regret}_{T}(\mathcal{H}) := \max_{h \in \mathcal{H}} \mathrm{Regret}_{T}(h) \label{equation_01_02} \end{equation} .\]The learner's goal is to have the lowest possible regret relative to $\mathcal{H}$.
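For a finite hypothesis class, both quantities can be computed directly from recorded losses; a minimal sketch, where the array names `learner_losses` and `hyp_losses` are assumptions:

```python
import numpy as np

# A sketch of equations (1.1) and (1.2) for a finite hypothesis class:
# learner_losses[t] = l(p_t, y_t), and hyp_losses[h][t] = l(h(x_t), y_t)
# for each hypothesis h in H.

def regret_vs_class(learner_losses, hyp_losses):
    learner_total = np.sum(learner_losses)
    # Regret_T(h) for each h in H, then the maximum over the class.
    return max(learner_total - np.sum(h_losses) for h_losses in hyp_losses)
```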
Remark
- $\mathcal{H}$ is called a finite hypothesis class if $|\mathcal{H}| < \infty$,
- Discrete problems usually have a finite hypothesis class. Such a class is not convex, which motivates the convexification techniques of Section 2.1.
Assumption. Realizability
The online learning problem is said to satisfy the realizability condition if
\[\exists h^{*} \in \mathcal{H} \text{ s.t. } y_{t} = h^{*}(x_{t}) \ \text{ for all } t .\]That is, the answers are generated by some function in the hypothesis class $\mathcal{H}$.
2. Online Convex Optimization
Online Convex Optimization
- $S$,
- convex set
Online Convex Optimization (OCO)
- for $t = 1, 2, \ldots, T$
- predict a vector $w_{t} \in S$
- receive a convex loss function $f_{t}:S \rightarrow \mathbb{R}$
- suffer loss $f_{t}(w_{t})$
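The text fixes no particular learner here; as one illustrative (assumed) choice, the sketch below runs this protocol with online projected gradient descent on the Euclidean unit ball, against linear losses $f_{t}(w) = \langle g_{t}, w \rangle$.

```python
import numpy as np

# A sketch of the OCO protocol with online projected gradient descent
# as one illustrative learner on S = {w : ||w||_2 <= 1}.

rng = np.random.default_rng(0)
d, T, eta = 5, 100, 0.1

def project_to_ball(w):
    # Euclidean projection onto S.
    norm = np.linalg.norm(w)
    return w if norm <= 1.0 else w / norm

w = np.zeros(d)                          # w_1 in S
total_loss = 0.0
for t in range(T):
    g = rng.uniform(-1.0, 1.0, size=d)   # adversary picks f_t(w) = <g, w>
    total_loss += float(g @ w)           # suffer loss f_t(w_t)
    w = project_to_ball(w - eta * g)     # gradient step, then project
print(total_loss)
```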
The regret of online convex optimization with respect to a fixed $u \in S$ is
\[\mathrm{Regret}_{T}(u) := \sum_{t=1}^{T} f_{t}(w_{t}) - \sum_{t=1}^{T} f_{t}(u) .\]The regret relative to a set $U \subseteq S$ is
\[\mathrm{Regret}_{T}(U) := \max_{u \in U} \mathrm{Regret}_{T}(u) .\]Remark
In the OCO problem, the general online learning notation specializes as follows.
- $D = S$,
- \(\mathcal{X} = \{1\}\),
- a dummy instance domain; the questions carry no information,
- $p_{t} := w_{t}$,
- $y_{t} := f_{t}$,
- so $\mathcal{Y}$ is a set of convex functions on $S$, and $l(p_{t}, y_{t}) := f_{t}(w_{t})$,
- $\mathcal{H} = U$,
- each $u \in U$ is identified with the constant prediction $u$,
2.1 Convexification
2.1.1 Convexification by Randomization
We consider the following problem of prediction with expert advice.
- $d \in \mathbb{N}$,
- the number of advices (experts)
- \(p_{t} \in \{1, \ldots, d\}\),
- chosen advice
- $y_{t} := (y_{t, 1}, \ldots, y_{t, d}) \in [0, 1]^{d}$,
- where $y_{t, i}$ is the cost of following the $i$-th advice
- $l(p_{t}, y_{t}) := y_{t, p_{t}}$,
- the loss the learner pays at round $t$,
Since the predictions range over a discrete set, this is a non-convex problem.
Now we reinterpret the above problem as an OCO problem. Let
\[S := \{ w \in \mathbb{R}^{d} \mid w \ge 0, \|w\|_{1} = 1 \} .\]$S$ is called the probability simplex; each $w \in S$ is interpreted as a probability distribution over the $d$ advices. We define the loss function
\[\begin{eqnarray} f_{t}(w_{t}) & := & \langle w_{t}, y_{t} \rangle . \nonumber \end{eqnarray}\]Since $\sum_{i=1}^{d} w_{t, i} = 1$, we can let $P_{t}: \Omega \rightarrow \{1, \ldots, d\}$ be the probabilistic choice at round $t$, a random variable with distribution
\[P(P_{t} = i) = w_{t, i} .\]Then the expected loss at round $t$ can be written
\[\begin{eqnarray} \mathrm{E} \left[ y_{t, P_{t}} \right] & = & \sum_{i=1}^{d} y_{t, i} P(P_{t} = i) \nonumber \\ & = & \sum_{i=1}^{d} y_{t, i} w_{t, i} \nonumber \\ & = & f_{t}(w_{t}) \nonumber \end{eqnarray}\]where $y_{t, i}$ is the deterministic loss of advice $i$ at round $t$.
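A quick Monte Carlo sanity check of this identity; the sampled $w_{t}$ and $y_{t}$ below are arbitrary assumptions for illustration.

```python
import numpy as np

# Check that E[y_{t, P_t}] equals f_t(w_t) = <w_t, y_t>.

rng = np.random.default_rng(0)
d = 4
w_t = rng.dirichlet(np.ones(d))        # a point of the probability simplex S
y_t = rng.uniform(0.0, 1.0, size=d)    # cost vector in [0, 1]^d

samples = rng.choice(d, size=100_000, p=w_t)  # draws of P_t
print(y_t[samples].mean())   # empirical E[y_{t, P_t}]
print(float(w_t @ y_t))      # f_t(w_t); agrees up to sampling error
```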
In expectation, the randomized discrete problem is therefore an instance of the following OCO problem.
- for $t = 1, \ldots, T$
- choose $w_{t} \in S$, the probability distribution over the advices,
- receive the convex function $f_{t} = \langle \cdot \,, y_{t} \rangle$,
- suffer the loss $f_{t}(w_{t})$,
The comparison set $U$ is the collection of the standard unit vectors
\[U := \{ e_{1}, \ldots, e_{d} \} .\]
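Since $f_{t}(e_{i}) = y_{t, i}$, each $e_{i}$ corresponds to always following advice $i$, so $\mathrm{Regret}_{T}(U)$ compares the learner's total expected loss with the best single advice in hindsight. A minimal sketch, where the recorded arrays `ws` and `ys` are assumed names:

```python
import numpy as np

# A sketch of Regret_T(U) for U = {e_1, ..., e_d}. `ws` holds the w_t
# and `ys` the y_t, both of shape (T, d).

def experts_regret(ws, ys):
    ws, ys = np.asarray(ws), np.asarray(ys)
    learner_loss = np.sum(ws * ys)            # sum_t <w_t, y_t>
    best_advice = np.sum(ys, axis=0).min()    # min_i sum_t y_{t, i}
    return learner_loss - best_advice         # max_{u in U} Regret_T(u)
```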