
Online Learning and Online Convex Optimization

1. Introduction

With this notation, the online learning problem is conceptually stated as follows.

Online Learning

Minimize the sum of the losses

\[\sum_{t=1}^{T} l(p_{t}, y_{t})\]

incurred through the following steps: for each round $t = 1, \ldots, T$, the learner receives a question $x_{t}$, predicts an answer $p_{t}$, the environment then reveals the true answer $y_{t}$, and the learner suffers the loss $l(p_{t}, y_{t})$.
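As a concrete illustration, this protocol is just a short loop. Below is a minimal Python sketch, assuming a binary prediction setting with the 0-1 loss; the `MajorityLearner` and the data stream are illustrative placeholders, not part of the notes.

```python
# Minimal sketch of the online learning protocol (illustrative assumptions:
# binary labels, 0-1 loss, a toy learner).

def zero_one_loss(p, y):
    return 0.0 if p == y else 1.0

def run_online_learning(learner, instances, labels, loss=zero_one_loss):
    """Run T rounds: predict p_t, observe y_t, suffer l(p_t, y_t)."""
    total_loss = 0.0
    for x_t, y_t in zip(instances, labels):
        p_t = learner.predict(x_t)      # learner commits to a prediction
        total_loss += loss(p_t, y_t)    # the true answer y_t is revealed
        learner.update(x_t, y_t)        # learner may update its state
    return total_loss

class MajorityLearner:
    """Toy learner: always predicts the majority label seen so far."""
    def __init__(self):
        self.counts = {0: 0, 1: 0}
    def predict(self, x_t):
        return 1 if self.counts[1] >= self.counts[0] else 0
    def update(self, x_t, y_t):
        self.counts[y_t] += 1

if __name__ == "__main__":
    xs = list(range(10))
    ys = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
    print(run_online_learning(MajorityLearner(), xs, ys))
```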

Definition. mistake-bound

The mistake bound of an online learning algorithm $A$ relative to $\mathcal{H}$ is

\[M_{A}(\mathcal{H}) := \max_{h \in \mathcal{H}} \mathrm{card}\left( \{ t \in \{1, \ldots, T\} \mid h(x_{t}) \neq p_{t, A} \} \right) ,\]

where $p_{t, A}$ denotes the prediction of $A$ at round $t$.

Definition. regret

The regret of the algorithm relative to a fixed hypothesis $h^{*}$ is

\[\begin{equation} \mathrm{Regret}_{T}(h^{*}) := \sum_{t=1}^{T} l(p_{t}, y_{t}) - \sum_{t=1}^{T} l(h^{*}(x_{t}), y_{t}) \label{equation_01_01} \end{equation} .\]

The regret of the algorithm relative to a hypothesis class $\mathcal{H}$ is

\[\begin{equation} \mathrm{Regret}_{T}(\mathcal{H}) := \max_{h \in \mathcal{H}} \mathrm{Regret}_{T}(h) \label{equation_01_02} \end{equation} .\]
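Both regret quantities can be computed directly once the learner's predictions and a (here assumed finite) hypothesis class are fixed. A minimal sketch, with an illustrative 0-1 loss and threshold hypotheses:

```python
# Sketch of the regret definitions for a finite hypothesis class.
# The hypotheses, predictions, and loss below are illustrative assumptions.

def regret_of(h, predictions, xs, ys, loss):
    """Regret_T(h): learner's cumulative loss minus h's cumulative loss."""
    learner_loss = sum(loss(p, y) for p, y in zip(predictions, ys))
    h_loss = sum(loss(h(x), y) for x, y in zip(xs, ys))
    return learner_loss - h_loss

def regret_of_class(H, predictions, xs, ys, loss):
    """Regret_T(H): maximum regret over h in H."""
    return max(regret_of(h, predictions, xs, ys, loss) for h in H)

if __name__ == "__main__":
    loss = lambda p, y: float(p != y)                       # 0-1 loss
    xs = [-2, -1, 0, 1, 2]
    ys = [0, 0, 0, 1, 1]
    predictions = [1, 0, 1, 1, 0]                           # some learner's outputs
    H = [lambda x, c=c: int(x >= c) for c in (-1, 0, 1)]    # threshold hypotheses
    print(regret_of_class(H, predictions, xs, ys, loss))
```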

The learner's goal is to achieve the lowest possible regret relative to the hypothesis class $\mathcal{H}$.

Remark

Assumption. Realizability

The online learning problem is said to satisfy the realizability condition if

\[\exists h \in \mathcal{H} \text{ s.t. } \forall t \in \{1, \ldots, T\}, \ y_{t} = h(x_{t}) .\]

That is, the answers are generated by some function in the hypothesis class $\mathcal{H}$.
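For a finite hypothesis class and a finite horizon, the realizability condition can be checked exhaustively. A minimal sketch under these illustrative assumptions:

```python
# Sketch: check whether some h in a finite H labels every example correctly.
# H, xs, ys are illustrative; the notes only assume such an h exists.

def is_realizable(H, xs, ys):
    return any(all(h(x) == y for x, y in zip(xs, ys)) for h in H)

if __name__ == "__main__":
    H = [lambda x, c=c: int(x >= c) for c in (-1, 0, 1)]
    xs = [-2, -1, 0, 1, 2]
    ys = [0, 0, 1, 1, 1]              # generated by the threshold at 0
    print(is_realizable(H, xs, ys))   # True
```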

2. Online Convex Optimization


Online Convex Optimization (OCO)

For each round $t = 1, \ldots, T$: the learner predicts a vector $w_{t}$ from a convex set $S$; the environment reveals a convex loss function $f_{t} : S \rightarrow \mathbb{R}$; and the learner suffers the loss $f_{t}(w_{t})$.

The regret of online convex optimization relative to a fixed competitor $u \in S$ is

\[\mathrm{Regret}_{T}(u) := \sum_{t=1}^{T} f_{t}(w_{t}) - \sum_{t=1}^{T} f_{t}(u) .\]

The regret of online convex optimization relative to a set of competitors $U$ is

\[\mathrm{Regret}_{T}(U) := \max_{u \in U} \mathrm{Regret}_{T}(u) .\]
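These regret quantities can be evaluated directly once the losses and predictions are fixed. A minimal sketch, assuming linear losses $f_{t}(w) = \langle w, y_{t} \rangle$ and a finite competing set $U$; both choices are made only for illustration and anticipate the expert-advice example of Section 2.1.1.

```python
# Sketch of the OCO regret for linear losses f_t(w) = <w, y_t>
# relative to a finite competing set U. All numbers are illustrative.
import numpy as np

def oco_regret(ws, ys, u):
    """Regret_T(u) = sum_t f_t(w_t) - sum_t f_t(u) for f_t(w) = <w, y_t>."""
    learner = sum(float(np.dot(w, y)) for w, y in zip(ws, ys))
    competitor = sum(float(np.dot(u, y)) for y in ys)
    return learner - competitor

def oco_regret_set(ws, ys, U):
    """Regret_T(U) = max_{u in U} Regret_T(u)."""
    return max(oco_regret(ws, ys, u) for u in U)

if __name__ == "__main__":
    d, T = 3, 5
    rng = np.random.default_rng(0)
    ys = rng.uniform(0, 1, size=(T, d))   # loss vectors revealed each round
    ws = np.full((T, d), 1.0 / d)         # uniform weights played each round
    U = np.eye(d)                         # the d unit vectors as competitors
    print(oco_regret_set(ws, ys, U))
```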

Remark

When the online learning problem of Section 1 is cast as an OCO problem, the prediction $p_{t}$ is determined by the learner's vector $w_{t}$ and the competing hypothesis $h$ corresponds to a vector $u$, so that the per-round regret terms match:

\[l(p_{t}, y_{t}) - l(h(x_{t}), y_{t}) = f_{t}(w_{t}) - f_{t}(u) .\]

In particular, the regret \eqref{equation_01_01} coincides with the OCO regret $\mathrm{Regret}_{T}(u)$.

2.1 Convexification

2.1.1 Convexification by Randomization

We consider the following problem of prediction with expert advice: at each round $t$, the learner chooses one of $d$ advices, and then each advice $i$ incurs a loss $y_{t, i}$; the learner suffers the loss of the chosen advice.

Since the decision space $\{1, \ldots, d\}$ is discrete, this is a non-convex problem.

Now we interpret the above problem as an OCO problem. Let

\[S := \{ w \in \mathbb{R}^{d} \mid w \ge 0, \|w\|_{1} = 1 \} .\]

$S$ is called the probability simplex; a point $w \in S$ is interpreted as a probability distribution over the $d$ choices. We define the loss function

\[\begin{eqnarray} f_{t}(w) & := & \langle w, y_{t} \rangle , \nonumber \end{eqnarray}\]

where $y_{t} = (y_{t, 1}, \ldots, y_{t, d})$ is the vector of losses of the $d$ advices at round $t$.

Since $\sum_{i=1}^{d} w_{t, i} = 1$, let \(P_{t}: \Omega \rightarrow \{1, \ldots, d\}\) be the probabilistic choice at round $t$, i.e., a random variable with distribution

\[P(P_{t} = i) = w_{t, i} .\]

Then the expected loss at round $t$ is

\[\begin{eqnarray} \mathrm{E} \left[ y_{t, P_{t}} \right] & = & \sum_{i=1}^{d} y_{t, i} P(P_{t} = i) \nonumber \\ & = & \langle w_{t}, y_{t} \rangle \nonumber \\ & = & f_{t}(w_{t}) , \nonumber \end{eqnarray}\]

where $y_{t, i}$ is the deterministic loss of advice $i$ at round $t$. Thus, choosing an advice at random according to $w_{t}$ turns the discrete problem into one with the linear (hence convex) losses $f_{t}$.
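The identity $\mathrm{E}[y_{t, P_{t}}] = f_{t}(w_{t})$ can be checked numerically by sampling $P_{t}$ from $w_{t}$; the numbers below are illustrative.

```python
# Sketch: sample the probabilistic choice P_t ~ w_t and compare the
# Monte Carlo average of y_{t, P_t} with f_t(w_t) = <w_t, y_t>.
import numpy as np

rng = np.random.default_rng(0)
d = 4
w_t = np.array([0.1, 0.2, 0.3, 0.4])   # a point of the probability simplex S
y_t = np.array([0.5, 0.1, 0.9, 0.3])   # deterministic losses of the d advices

samples = rng.choice(d, size=100_000, p=w_t)   # P_t = i with probability w_{t, i}
print(y_t[samples].mean())                     # ~ E[y_{t, P_t}]
print(float(np.dot(w_t, y_t)))                 # f_t(w_t)
```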

We take the competing set $U$ to be the collection of unit vectors

\[U := \{ e_{1}, \ldots, e_{d} \} .\]
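With this choice of $U$, $\mathrm{Regret}_{T}(U)$ equals the learner's cumulative (expected) loss minus the cumulative loss of the best single advice in hindsight, $\min_{i} \sum_{t=1}^{T} y_{t, i}$. A minimal numerical sketch, with illustrative loss vectors and uniform weights:

```python
# Sketch: with U = {e_1, ..., e_d}, the OCO regret reduces to the cumulative
# expected loss minus the cumulative loss of the best single advice in hindsight.
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 50
ys = rng.uniform(0, 1, size=(T, d))   # y_{t, i}: loss of advice i at round t
ws = np.full((T, d), 1.0 / d)         # illustrative uniform weights each round

learner_loss = float(np.sum(ws * ys))            # sum_t <w_t, y_t>
best_advice_loss = float(ys.sum(axis=0).min())   # min_i sum_t y_{t, i}
print(learner_loss - best_advice_loss)           # Regret_T(U)
```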

2.1.2 Convexification by Surrogate Loss Functions
