Introduction to Online Convex Optimization: Chapter 4
We introduce a new class of convex functions: exp-concave functions. For this class, there is a second-order algorithm that achieves logarithmic regret.
4.1 Motivation: universal portfolio selection
- \(\{Z_{t}\}\),
- i.i.d.
- $Z_{t} \sim \mathrm{N}(\mu, \sigma^{2})$,
- $l \in \mathbb{R}$,
- the initial logarithmic price
We define the logarithm of the price $l_{t}$ at time $t$ by
\[\begin{eqnarray} l_{0} & := & l \nonumber \\ l_{t + 1} & := & l_{t} + Z_{t} \nonumber \end{eqnarray} .\]Now we consider the following portfolio selection problem.
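As a sanity check, this model is easy to simulate. The following minimal sketch (the values of $\mu$, $\sigma$, $l$, and the horizon are made up for illustration) generates one sample path:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0005, 0.01     # hypothetical drift and volatility of Z_t
l0, T = np.log(100.0), 250   # hypothetical initial log price and horizon

Z = rng.normal(mu, sigma, size=T)              # Z_t ~ N(mu, sigma^2), i.i.d.
l = np.concatenate(([l0], l0 + np.cumsum(Z)))  # l_{t+1} = l_t + Z_t
prices = np.exp(l)                             # geometric random walk in price space
```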
- $n \in \mathbb{N}$,
- the number of assets
- $x_{t} \in \Delta_{n}$,
- the portfolio at time $t$: $x_{t,i}$ is the fraction of wealth invested in asset $i$,
- we invest our wealth according to $x_{t}$ between $t$ and $t+1$,
- \(\Delta_{n} := \{x \in \mathbb{R}_{\ge 0}^{n} \mid \sum_{i=1}^{n}x_{i} = 1\}\),
- convex
- $r_{t} \in \mathbb{R}_{\ge 0}^{n}$,
- $r_{t, i}$ is the price ratio for the $i$-th asset
- $W_{t}$,
- total wealth at iteration $t$,
Since $W_{t+1} = W_{t} \cdot r_{t}^{\mathrm{T}} x_{t}$, our goal is to maximize the overall wealth gain
\[\log \frac{ W_{T+1} }{ W_{1} } = \sum_{t=1}^{T} \log r_{t}^{\mathrm{T}} x_{t} .\]Let
\[\begin{eqnarray} \mathcal{F}^{\prime} & := & \{ f: \mathcal{K} \rightarrow \mathbb{R} \mid f(x) := -\log( r^{\mathrm{T}} x ), \quad r \in \mathbb{R}_{\ge 0}^{n} \} \nonumber \\ \mathcal{K} & := & \Delta_{n} \nonumber \end{eqnarray} .\]We define the regret as
\[\begin{eqnarray} \mathrm{Regret}_{T} & := & \max_{x \in \mathcal{K}} \sum_{t=1}^{T} \log r_{t}^{\mathrm{T}} x - \sum_{t=1}^{T} \log r_{t}^{\mathrm{T}} x_{t} \nonumber \\ & = & \sum_{t=1}^{T} f_{t}(x_{t}) + \max_{x \in \mathcal{K}} \left( - \sum_{t=1}^{T} f_{t}(x) \right) \nonumber \\ & = & \sum_{t=1}^{T} f_{t}(x_{t}) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_{t}(x) \nonumber . \end{eqnarray}\]We could use the online gradient descent algorithm, which ensures $O(\sqrt{T})$ regret; the rest of this chapter develops a second-order algorithm with $O(\log T)$ regret.
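To make the correspondence between wealth and the losses $f_{t}$ concrete, here is a minimal sketch (with hypothetical price ratios and the uniform constant portfolio) checking that the accumulated log-wealth equals $-\sum_{t} f_{t}(x_{t})$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 3

# hypothetical price ratios r_t (entrywise nonnegative)
R = rng.lognormal(mean=0.0, sigma=0.02, size=(T, n))

# uniform constant rebalanced portfolio x_t = (1/n, ..., 1/n)
X = np.full((T, n), 1.0 / n)

log_wealth = np.sum(np.log(np.einsum("ti,ti->t", R, X)))  # sum_t log(r_t^T x_t)
losses = -np.log(R @ np.full(n, 1.0 / n))                 # f_t(x_t) = -log(r_t^T x_t)

assert np.isclose(log_wealth, -losses.sum())
```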
4.1.3 Constant rebalancing portfolios
4.2 Exp-concave functions
Proposition (properties of $f \in \mathcal{F}^{\prime}$)
(1)
\[f \in \mathcal{F}^{\prime}, \ \nabla^{2} f(x) = \frac{ r r^{\mathrm{T}} }{ (r^{\mathrm{T}}x)^{2} } .\](2)
$f \in \mathcal{F}^{\prime}$ is not strongly convex.
proof
Direct computation gives $\nabla f(x) = -\frac{r}{r^{\mathrm{T}}x}$, and differentiating once more yields (1). Hence the Hessian of $f$ is a rank-one matrix: every direction orthogonal to $r$ is an eigenvector with eigenvalue $0$, so $f$ is not strongly convex.
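A quick finite-difference check of the Hessian formula in (1); the vectors $r$ and $x$ here are made up for illustration:

```python
import numpy as np

r = np.array([1.2, 0.8, 1.0])   # made-up price-ratio vector
x = np.array([0.5, 0.3, 0.2])   # a point in the simplex

f = lambda v: -np.log(r @ v)
H_formula = np.outer(r, r) / (r @ x) ** 2

# central finite differences for the Hessian
eps, n = 1e-4, len(x)
E = np.eye(n)
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        H_num[i, j] = (
            f(x + eps * E[i] + eps * E[j]) - f(x + eps * E[i] - eps * E[j])
            - f(x - eps * E[i] + eps * E[j]) + f(x - eps * E[i] - eps * E[j])
        ) / (4 * eps**2)

assert np.allclose(H_num, H_formula, atol=1e-4)
assert np.linalg.matrix_rank(H_formula) == 1  # the Hessian is rank one
```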
Definition 4.1
- $\alpha > 0$,
- $\mathcal{K} \subseteq \mathbb{R}^{n}$,
- $f: \mathcal{K} \rightarrow \mathbb{R}$,
$f$ is said to be $\alpha$-exp-concave over $\mathcal{K}$ if
\[g(x) := e^{-\alpha f(x)}\]is a concave function over $\mathcal{K}$.
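For example, every loss $f(x) = -\log(r^{\mathrm{T}} x)$ in $\mathcal{F}^{\prime}$ is $1$-exp-concave over $\Delta_{n}$, since
\[e^{-f(x)} = e^{\log(r^{\mathrm{T}} x)} = r^{\mathrm{T}} x\]is linear, hence concave, in $x$.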
Remark
\[f(x) = - \frac{1}{\alpha} \log g(x) .\]Lemma 4.1
- $\alpha > 0$,
- $f: \mathcal{K} \rightarrow \mathbb{R}$,
- twice-differentiable
Then the following statements are equivalent:
- (1) $f$ is $\alpha$-exp-concave
- (2) for all $x \in \mathcal{K}$, $\nabla^{2} f(x) \succcurlyeq \alpha \nabla f(x) \nabla f(x)^{\mathrm{T}}$
proof
Since $g(x) := e^{-\alpha f(x)}$ satisfies
\[\nabla^{2} g(x) = \alpha e^{-\alpha f(x)} \left( \alpha \nabla f(x) \nabla f(x)^{\mathrm{T}} - \nabla^{2} f(x) \right) ,\]$g$ is concave, i.e. $\nabla^{2} g(x) \preccurlyeq 0$ for all $x \in \mathcal{K}$, if and only if $\nabla^{2} f(x) \succcurlyeq \alpha \nabla f(x) \nabla f(x)^{\mathrm{T}}$ for all $x \in \mathcal{K}$.
Lemma 4.2
- $D, G > 0$,
- $\mathcal{K}$,
- $D$-bounded
- convex
- $f: \mathcal{K} \rightarrow \mathbb{R}$,
- $\alpha$-exp-concave function
- gradients is bounded by $G$,
- twice-differentiable
Then, for any $\gamma \le \frac{1}{2}\min\left(\frac{1}{4GD}, \alpha\right)$ and all $x, y \in \mathcal{K}$,
\[f(x) \ge f(y) + \nabla f(y)^{\mathrm{T}} (x - y) + \frac{\gamma}{2} (x - y)^{\mathrm{T}} \nabla f(y) \nabla f(y)^{\mathrm{T}} (x - y) .\]
proof
Since $\nabla f(x) \nabla f(x)^{\mathrm{T}}$ is positive semidefinite and $2\gamma \le \alpha$, Lemma 4.1 gives
\[2 \gamma \nabla f(x) \nabla f(x)^{\mathrm{T}} \preccurlyeq \alpha \nabla f(x) \nabla f(x)^{\mathrm{T}} \preccurlyeq \nabla^{2}f(x) .\]Hence, again by Lemma 4.1, $f$ is also a $2\gamma$-exp-concave function. Let $h(x) := \exp(-2\gamma f(x))$. We can easily show that
\[\begin{eqnarray} \nabla h(x) & = & -2\gamma \exp(- 2\gamma f(x)) \nabla f(x) . \nonumber \end{eqnarray}\]By concavity of $h$,
\[\begin{eqnarray} & & h(x) \le h(y) + \nabla h(y)^{\mathrm{T}} (x - y) \nonumber \\ & \Leftrightarrow & \exp(-2\gamma f(x)) \le \exp(-2\gamma f(y)) - 2\gamma \exp(- 2\gamma f(y)) \nabla f(y)^{\mathrm{T}} (x - y) \nonumber \\ & \Leftrightarrow & \exp(-2\gamma f(x)) \le \exp(-2\gamma f(y)) \left( 1 - 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) \right) \nonumber \\ & \Leftrightarrow & -2\gamma f(x) \le -2\gamma f(y) + \log \left( 1 - 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) \right) \nonumber \\ & \Leftrightarrow & f(x) \ge f(y) - \frac{1}{2 \gamma} \log \left( 1 - 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) \right) \nonumber \end{eqnarray} .\]Note that
\[\left| 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) \right| \le 2\gamma GD \le \frac{1}{4} ,\]since $\|\nabla f(y)\| \le G$, $\|x - y\| \le D$, and $2\gamma \le \frac{1}{4GD}$. It is easy to confirm that
\[\forall z \in \left[ -\frac{1}{4}, \frac{1}{4} \right], \quad -\log(1 - z) \ge z + \frac{1}{4} z^{2} .\]Applying this inequality with $z = 2\gamma \nabla f(y)^{\mathrm{T}}(x - y)$,
\[\begin{eqnarray} f(x) & \ge & f(y) - \frac{1}{2 \gamma} \log \left( 1 - 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) \right) \nonumber \\ & \ge & f(y) + \frac{1}{2 \gamma} \left( 2\gamma \nabla f(y)^{\mathrm{T}} (x - y) + \gamma^{2} (x - y)^{\mathrm{T}} \nabla f(y) \nabla f(y)^{\mathrm{T}} (x - y) \right) \nonumber \\ & = & f(y) + \nabla f(y)^{\mathrm{T}} (x - y) + \frac{\gamma}{2} (x - y)^{\mathrm{T}} \nabla f(y) \nabla f(y)^{\mathrm{T}} (x - y) , \nonumber \end{eqnarray}\]which is the claimed inequality.
4.3 The online Newton step algorithm
The online Newton step algorithm is in fact a quasi-Newton method for online problems.
Algorithm 9: online Newton step
- $f:\mathcal{K} \rightarrow \mathbb{R}$,
- function
- $T \in \mathbb{N}$,
- $x_{1} \in \mathcal{K}$,
- $\tilde{\alpha} > 0$,
- parameter
- $\gamma > 0$,
- $\epsilon > 0$,
- $A_{0} := \epsilon E_{n}$,
- where $E_{n}$ is the $n \times n$ identity matrix
Step1. for $t=1$ to $T$
Step2. Choose $x_{t} \in \mathcal{K}$ and observe cost $f_{t} \in \mathcal{F}$,
Step3. Rank-1 Update
\[A_{t} := A_{t - 1} + \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} .\]Step4. Newton step and projections
\[\begin{eqnarray} y_{t + 1} & := & x_{t} - \frac{1}{\gamma} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ x_{t + 1} & := & \Pi_{\mathcal{K}, A_{t}} (y_{t + 1}) \end{eqnarray}\]where $\Pi_{\mathcal{K}, A_{t}}$ is the projection onto $\mathcal{K}$ in the norm induced by $A_{t}$.
Step5. end for
Step6. Return $x_{T+1}$.
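A minimal NumPy/SciPy sketch of Algorithm 9 is given below. The loss $f_{t}(x) = -\log(r_{t}^{\mathrm{T}} x)$, the simplex domain, and the generic SLSQP solver used for the generalized projection are illustrative assumptions, not part of the algorithm itself:

```python
import numpy as np
from scipy.optimize import minimize


def project(y, A, n):
    """Generalized projection onto the simplex Delta_n in the A-norm:
    argmin_{x in Delta_n} (y - x)^T A (y - x), solved here as a small QP."""
    res = minimize(
        lambda x: (y - x) @ A @ (y - x),
        x0=np.full(n, 1.0 / n),
        jac=lambda x: -2.0 * A @ (y - x),  # gradient for symmetric A
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x


def online_newton_step(R, gamma, eps):
    """Algorithm 9 for the portfolio losses f_t(x) = -log(r_t^T x).
    R: (T, n) array of price ratios r_t. Returns the iterates x_1, ..., x_T."""
    T, n = R.shape
    A = eps * np.eye(n)          # A_0 = eps * E_n
    A_inv = np.eye(n) / eps
    x = np.full(n, 1.0 / n)      # x_1: uniform portfolio
    iterates = []
    for t in range(T):
        iterates.append(x.copy())
        r = R[t]
        grad = -r / (r @ x)                            # gradient of f_t at x_t
        A += np.outer(grad, grad)                      # rank-1 update: A_t
        Ag = A_inv @ grad                              # Sherman-Morrison update of A_t^{-1}
        A_inv -= np.outer(Ag, Ag) / (1.0 + grad @ Ag)
        y = x - A_inv @ grad / gamma                   # Newton step
        x = project(y, A, n)                           # generalized projection
    return np.array(iterates)
```

In practice $\gamma$ and $\epsilon$ would be chosen as in Theorem 4.3; the rank-one inverse update keeps each iteration at $O(n^{2})$ plus the cost of the projection (see the implementation notes at the end of this chapter).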
Remark
The outer product $\nabla f(x) \nabla f(x)^{\mathrm{T}}$ is positive semidefinite, and $E_{n}$ is positive definite. It is easy to confirm that the sum of a positive definite matrix and positive semidefinite matrices is positive definite. Hence $A_{t}$ is positive definite for all $t$.
\[\begin{eqnarray} \Pi_{\mathcal{K}, A_{t}} (y_{t + 1}) & := & \arg \min_{x \in \mathcal{K}} \|y_{t + 1} - x\|_{A_{t}}^{2} \nonumber \\ & = & \arg \min_{x \in \mathcal{K}} (y_{t + 1} - x)^{\mathrm{T}} A_{t} (y_{t + 1} - x) \nonumber \end{eqnarray} .\]Since $\nabla f(x) \nabla f(x)^{\mathrm{T}}$ is symmetric, $A_{t}$ is symmetric for all $t$. Hence
\[(A_{t}^{-1})^{\mathrm{T}} = A_{t}^{-1} .\]Theorem 4.3
- $\gamma :=\frac{1}{2}\min(\frac{1}{4GD}, \alpha)$,
- $\epsilon := \frac{1}{\gamma^{2} D^{2}}$,
- $T > 4$,
In algorithm 9,
\[\mathrm{Regret}_{T} \le 4 \left( \frac{1}{\alpha} + GD \right) \left( n \log T + 1 \right) = O \left( \left( \frac{1}{\alpha} + GD \right) n \log T \right) .\]proof
We first show an upper bound on \(\sum_{t=1}^{T} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t})\).
\[\begin{eqnarray} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) & = & A_{t}^{-1} \bullet \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} \nonumber \\ & = & A_{t}^{-1} \bullet \left( A_{t} - A_{t - 1} \right) . \nonumber \end{eqnarray}\]Summing over $t$ and telescoping,
\[\begin{eqnarray} \sum_{t=1}^{T} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) & = & \sum_{t=1}^{T} A_{t}^{-1} \bullet \left( A_{t} - A_{t-1} \right) \nonumber \\ & \le & \sum_{t=1}^{T} \log \frac{ \mathrm{det}(A_{t}) }{ \mathrm{det}(A_{t-1}) } \quad (\because \text{lemma 4.5}) \nonumber \\ & = & \log \frac{ \mathrm{det}(A_{T}) }{ \mathrm{det}(A_{0}) } \nonumber \\ & = & \log \frac{ \mathrm{det}( \sum_{t=1}^{T} \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} + \epsilon E_{n} ) }{ \mathrm{det}(A_{0}) } . \end{eqnarray}\]Since $\nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}}$ is a rank-one matrix, its only nonzero eigenvalue is $\| \nabla f(x_{t}) \|^{2} \le G^{2}$. Hence
\[\lambda_{1}(A_{T}) \le \sum_{t=1}^{T} \lambda_{1} \left( \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} \right) + \epsilon \le T G^{2} + \epsilon ,\]and therefore
\[\mathrm{det}(A_{T}) \le \left( T G^{2} + \epsilon \right)^{n} .\]Thus,
\[\begin{eqnarray} \sum_{t=1}^{T} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) & \le & \log \frac{ \mathrm{det}( \sum_{t=1}^{T} \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} + \epsilon E_{n} ) }{ \mathrm{det}(A_{0}) } \nonumber \\ & \le & \log \frac{ (TG^{2} + \epsilon)^{n} }{ \epsilon^{n} } \nonumber \\ & = & n \log \left( \frac{ TG^{2} }{ \epsilon } + 1 \right) \nonumber \\ & = & n \log \left( T G^{2} \gamma^{2} D^{2} + 1 \right) \nonumber \\ & \le & n \log T \quad (\because \gamma \le \tfrac{1}{8GD}, \ T > 4) \end{eqnarray}\]Combining this bound with Lemma 4.4 proves the theorem.
Lemma 4.4
- $\gamma :=\frac{1}{2}\min(\frac{1}{4GD}, \alpha)$,
- $\epsilon := \frac{1}{\gamma^{2} D^{2}}$,
- $T > 4$,
In algorithm 9,
\[\begin{eqnarray} \mathrm{Regret}_{T} \le 4 \left( \frac{1}{\alpha} + GD \right) \left( \sum_{t=1}^{T} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) + 1 \right) . \end{eqnarray}\]proof
Let
\[x^{*} := \arg \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_{t}(x) .\]By lemma 4.2, we have
\[\begin{eqnarray} f_{t}(x_{t}) - f_{t}(x^{*}) & \le & R_{t} \nonumber \\ R_{t} & := & \nabla f(x_{t})^{\mathrm{T}} (x_{t} - x^{*}) - \frac{\gamma}{2} (x^{*} - x_{t})^{\mathrm{T}} \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} (x^{*} - x_{t}) . \nonumber \end{eqnarray}\]By definition of $y_{t+1}$,
\[\begin{eqnarray} & & y_{t+1} - x^{*} = x_{t} - x^{*} - \frac{1}{\gamma} A_{t}^{-1} \nabla f(x_{t}) \label{equation_04_01} \\ & \Leftrightarrow & A_{t} (y_{t+1} - x^{*}) = A_{t}( x_{t} - x^{*} ) - \frac{1}{\gamma} \nabla f(x_{t}) \label{equation_04_02} \end{eqnarray}\]Multiplying the transpose of \(\eqref{equation_04_01}\) by \(\eqref{equation_04_02}\),
\[\begin{eqnarray} (y_{t+1} - x^{*})^{\mathrm{T}} A_{t} (y_{t+1} - x^{*}) & = & ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \frac{1}{\gamma} \left( A_{t}^{-1} \nabla f(x_{t}) \right)^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - (x_{t} - x^{*})^{\mathrm{T}} \frac{1}{\gamma} \nabla f(x_{t}) + \frac{1}{\gamma} \left( A_{t}^{-1} \nabla f(x_{t}) \right)^{\mathrm{T}} \frac{1}{\gamma} \nabla f(x_{t}) \nonumber \\ & = & ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \frac{2}{\gamma} \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) + \frac{1}{\gamma^{2}} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \label{equation_04_03} . \end{eqnarray}\]Thus,
\[\begin{eqnarray} (y_{t+1} - x^{*})^{\mathrm{T}} A_{t} (y_{t+1} - x^{*}) & = & \| y_{t+1} - x^{*} \|_{A_{t}}^{2} \nonumber \\ & \ge & \| x_{t+1} - x^{*} \|_{A_{t}}^{2} \quad (\because \text{theorem 2.1}) \nonumber \\ & = & (x_{t+1} - x^{*})^{\mathrm{T}} A_{t} (x_{t+1} - x^{*}) \nonumber \end{eqnarray}\]From \(\eqref{equation_04_03}\),
\[\begin{eqnarray} & & ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \frac{2}{\gamma} \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) + \frac{1}{\gamma^{2}} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \ge (x_{t+1} - x^{*})^{\mathrm{T}} A_{t} (x_{t+1} - x^{*}) \nonumber \\ & \Leftrightarrow & \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \frac{\gamma}{2} (x_{t+1} - x^{*})^{\mathrm{T}} A_{t} (x_{t+1} - x^{*}) + \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \ge \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) \end{eqnarray}\]Summing up over $t=1$ to $T$,
\[\begin{eqnarray} \sum_{t=1}^{T} \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) & \le & \sum_{t=1}^{T} \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \sum_{t=1}^{T} \frac{\gamma}{2} (x_{t+1} - x^{*})^{\mathrm{T}} A_{t} (x_{t+1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & = & \sum_{t=1}^{T} \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} A_{t}( x_{t} - x^{*} ) - \sum_{t=2}^{T} \frac{\gamma}{2} (x_{t} - x^{*})^{\mathrm{T}} A_{t-1} (x_{t} - x^{*}) - \frac{\gamma}{2} (x_{T+1} - x^{*})^{\mathrm{T}} A_{T} (x_{T+1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & = & \sum_{t=2}^{T} \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} \left( A_{t} - A_{t-1} \right) ( x_{t} - x^{*} ) + \frac{\gamma}{2} (x_{1} - x^{*})^{\mathrm{T}} A_{1} ( x_{1} - x^{*}) - \frac{\gamma}{2} (x_{T+1} - x^{*})^{\mathrm{T}} A_{T} (x_{T+1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & = & \sum_{t=1}^{T} \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) + \frac{\gamma}{2} (x_{1} - x^{*})^{\mathrm{T}} \left( A_{1} - \nabla f(x_{1}) \nabla f(x_{1})^{\mathrm{T}} \right) ( x_{1} - x^{*}) - \frac{\gamma}{2} (x_{T+1} - x^{*})^{\mathrm{T}} A_{T} (x_{T+1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & \le & \sum_{t=1}^{T} \frac{\gamma}{2} ( x_{t} - x^{*} )^{\mathrm{T}} \nabla f(x_{t}) \nabla f(x_{t})^{\mathrm{T}} ( x_{t} - x^{*} ) + \frac{\gamma}{2} (x_{1} - x^{*})^{\mathrm{T}} \left( A_{1} - \nabla f(x_{1}) \nabla f(x_{1})^{\mathrm{T}} \right) ( x_{1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \quad (\because A_{T} \text{ is positive definite}) \end{eqnarray}\]Thus,
\[\begin{eqnarray} \sum_{t=1}^{T} R_{t} & \le & \frac{\gamma}{2} (x_{1} - x^{*})^{\mathrm{T}} \left( A_{1} - \nabla f(x_{1}) \nabla f(x_{1})^{\mathrm{T}} \right) ( x_{1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & = & \frac{\gamma}{2} \epsilon (x_{1} - x^{*})^{\mathrm{T}} ( x_{1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \quad (\because A_{1} - \nabla f(x_{1}) \nabla f(x_{1})^{\mathrm{T}} = \epsilon E_{n}) \nonumber \\ & = & \frac{1}{2D^{2} \gamma} (x_{1} - x^{*})^{\mathrm{T}} ( x_{1} - x^{*}) + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) \nonumber \\ & \le & \frac{1}{2\gamma} + \sum_{t=1}^{T} \frac{1}{2\gamma} \nabla f(x_{t})^{\mathrm{T}} A_{t}^{-1} \nabla f(x_{t}) . \nonumber \end{eqnarray}\]Since $\gamma := \frac{1}{2}\min(\frac{1}{4GD}, \alpha)$,
\[\frac{1}{\gamma} \le 2 \left( \frac{1}{\alpha} + 4GD \right) \le 8 \left( \frac{1}{\alpha} + GD \right) .\]This gives the lemma.
Lemma 4.5
- $A \in \mathbb{R}^{n \times n}$,
- symmetric
- $B \in \mathbb{R}^{n \times n}$,
- symmetric
- $0 \prec B \preccurlyeq A$,
- $A$ is positive definite
- $B$ is positive definite
- $A - B$ is positive semidefinite
Then
\[A^{-1} \bullet (A - B) \le \log \frac{ \mathrm{det}(A) }{ \mathrm{det}(B) } .\]
proof
\[\begin{eqnarray} A^{-1} \bullet (A - B) & = & \mathrm{tr} \left( A^{-1}(A - B)^{\mathrm{T}} \right) \nonumber \\ & = & \mathrm{tr} \left( A^{-1}(A - B) \right) \quad (\because A - B \text{ is symmetric}) \nonumber \\ & = & \mathrm{tr} \left( (A^{1/2}A^{1/2})^{-1}(A - B) \right) \nonumber \\ & = & \mathrm{tr} \left( A^{-1/2} (A - B) A^{-1/2} \right) \quad (\because \text{cyclic property of the trace}) \nonumber \\ & = & \mathrm{tr} \left( E_{n} - A^{-1/2} B A^{-1/2} \right) \nonumber \\ & = & \sum_{i=1}^{n} \left( 1 - \lambda_{i} \left( A^{-1/2} B A^{-1/2} \right) \right) \nonumber \\ & \le & - \sum_{i=1}^{n} \log \left( \lambda_{i} \left( A^{-1/2} B A^{-1/2} \right) \right) \quad (\because 1 - x \le -\log(x)) \nonumber \\ & = & - \log \left( \prod_{i=1}^{n} \lambda_{i} \left( A^{-1/2} B A^{-1/2} \right) \right) \nonumber \\ & = & - \log \left( \mathrm{det} \left( A^{-1/2} B A^{-1/2} \right) \right) \nonumber \\ & = & - \log \left( \mathrm{det} \left( A^{-1} \right) \mathrm{det} \left( B \right) \right) \quad (\because \mathrm{det}(A^{-1/2})^{2} = \mathrm{det}(A^{-1})) \nonumber \\ & = & \log \left( \frac{ \mathrm{det} \left( A \right) }{ \mathrm{det} \left( B \right) } \right) \nonumber \end{eqnarray}\]Remark
Every symmetric matrix $A$ can be orthogonally diagonalized as
\[A = V D V^{-1} = V D V^{\mathrm{T}}\]where $V$ is an orthogonal matrix and $D$ is a diagonal matrix. If $A$ is moreover positive definite, then
\[\begin{eqnarray} A^{1/2} & = & V D^{1/2} V^{-1} \nonumber \\ A^{-1/2} & = & V D^{-1/2} V^{-1} \nonumber \\ A^{-1} & = & V D^{-1} V^{-1} \nonumber \end{eqnarray} .\]Note that $A^{1/2}$, $A^{-1/2}$, and $A^{-1}$ are all symmetric matrices.
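A small numerical check of this remark, with a randomly generated symmetric positive definite $A$ (`np.linalg.eigh` returns an orthogonal $V$, so $V^{-1} = V^{\mathrm{T}}$):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)   # random symmetric positive definite matrix

# A = V D V^T with orthogonal V and diagonal D
d, V = np.linalg.eigh(A)

A_half = V @ np.diag(d**0.5) @ V.T
A_inv_half = V @ np.diag(d**-0.5) @ V.T

assert np.allclose(A_half @ A_half, A)
assert np.allclose(A_inv_half @ A_inv_half, np.linalg.inv(A))
assert np.allclose(A_half, A_half.T) and np.allclose(A_inv_half, A_inv_half.T)
```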
Implementation and running time
- Storage complexity
- $O(n^{2})$,
- we have to store $A_{t}$ (or $A_{t}^{-1}$) in each iteration
- computational complexity
- naive inversion of an $n \times n$ matrix takes $O(n^{3})$,
- we do not have to explicitly invert the matrix: the Woodbury formula updates $A_{t}^{-1}$ in $O(n^{2})$ time
- computation of the projection onto $\mathcal{K}$ takes polynomial time
By the Woodbury formula, we have
\[\begin{equation} (A + xx^{\mathrm{T}})^{-1} = A^{-1} - \frac{ A^{-1}xx^{\mathrm{T}}A^{-1} }{ 1 + x^{\mathrm{T}}A^{-1}x } \end{equation} .\]Writing $\nabla_{t} := \nabla f(x_{t})$, the inverse of $A_{t}$ can be maintained by
\[\begin{eqnarray} A_{0}^{-1} & = & \frac{1}{\epsilon} E_{n} \nonumber \\ A_{t}^{-1} & = & ( A_{t-1} + \nabla_{t} \nabla_{t}^{\mathrm{T}} )^{-1} \nonumber \\ & = & A_{t-1}^{-1} - \frac{ A_{t-1}^{-1} \nabla_{t} \nabla_{t}^{\mathrm{T}} A_{t-1}^{-1} }{ 1 + \nabla_{t}^{\mathrm{T}} A_{t-1}^{-1} \nabla_{t} } . \nonumber \end{eqnarray}\]
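A short sketch confirming that this $O(n^{2})$ rank-one update stays consistent with direct inversion (the gradients are random stand-ins, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 5, 0.5
A = eps * np.eye(n)              # A_0 = eps * E_n
A_inv = np.eye(n) / eps

for _ in range(10):
    g = rng.standard_normal(n)   # stands in for nabla_t
    A += np.outer(g, g)
    Ag = A_inv @ g               # O(n^2) Woodbury / Sherman-Morrison update
    A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)

assert np.allclose(A_inv, np.linalg.inv(A))
```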