
memo

Chapter 2

2.2 Fundamental theory

2.2.1 Basic concepts

Definition 6 log loss function

The average log loss function is defined as

\[\begin{equation} L(w) := -\int_{\mathcal{R}(X)} \log p(x \mid w) \ P_{X}(dx) \label{equation_02_07} \end{equation}\]

The empirical log loss function is defined as

\[\begin{equation} L_{n}(w) := - \frac{1}{n} \sum_{i=1}^{n} \log p(X_{i} \mid w) . \label{equation_02_08} \end{equation}\]
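The two loss functions can be compared numerically. The following is a minimal sketch under assumptions that are not part of the text: the true distribution is taken to be $N(0, 1)$ and the model $p(x \mid w)$ to be $N(w, 1)$, so that $L(w)$ has the closed form $\frac{1}{2} \log(2 \pi) + \frac{1}{2}(1 + w^{2})$. The function names are hypothetical.

```python
import math
import random

def log_p(x, w):
    """Log density of the assumed model p(x | w) = N(w, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - w) ** 2

def empirical_log_loss(xs, w):
    """L_n(w) = -(1/n) * sum_i log p(X_i | w)."""
    return -sum(log_p(x, w) for x in xs) / len(xs)

def average_log_loss(w):
    """L(w) for X ~ N(0, 1) (assumed): 0.5*log(2*pi) + 0.5*(1 + w^2)."""
    return 0.5 * math.log(2.0 * math.pi) + 0.5 * (1.0 + w * w)

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
for w in (0.0, 0.5, 1.0):
    # The empirical loss should be close to the average loss for large n.
    print(w, empirical_log_loss(xs, w), average_log_loss(w))
```

For a large sample the two printed columns should roughly agree, which is the law of large numbers applied to $-\log p(X_{i} \mid w)$.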

Definition

Here we define three quantities, the free energy $F_{n}(\beta)$, the generalization loss $G$, and the training loss $T_{n}$:

\[\begin{eqnarray} F_{n}(\beta) & := & -\frac{1}{\beta} \log Z_{n}(\beta; X_{1:n}) \nonumber \\ G & := & - \int_{\mathcal{R}(X)} \log \left( \int_{W} p(x \mid w) \phi(w) \ dw \right) \ P_{X}(dx) \nonumber \\ T_{n} & := & - \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w) \phi(w) \ dw . \nonumber \end{eqnarray}\]

2.2.2 Normalized observables

We assume that the log density ratio function has relatively finite variance. Under this assumption, the log density ratio function does not depend on the choice of $w_{0} \in W_{0}$. Hence we denote the log density ratio function by $f(x, w)$.

\[\begin{eqnarray} w_{0} \in W_{0}, \ f(x, w) & = & \log \frac{ p(x \mid w_{0}) }{ p(x \mid w) } \nonumber \\ & = & \log \frac{ q(x) }{ p(x \mid w) } \nonumber \end{eqnarray} .\]

With this notation, the model density can be rewritten as

\[\begin{equation} p(x \mid w) = p(x \mid w_{0}) e^{-f(x, w)} \label{equaiton_relation_non_logarithmic_likehood} \end{equation} .\]

Definition 7

The normalized average log loss function is defined as

\[\begin{equation} K(w) := \int_{\mathcal{R}(X)} f(x, w) \ P_{X}(dx) \nonumber \end{equation} .\]

The normalized empirical log loss function is defined as

\[\begin{equation} K_{n}(w; X_{1:n}) := \frac{1}{n} \sum_{i=1}^{n} f(X_{i}, w) \nonumber \end{equation} .\]

An immediate consequence of the definition of the normalized log loss functions is

\[\begin{eqnarray} L(w) & = & \int_{\mathcal{R}(X)} \left( - \log p(x \mid w) + \log p(x \mid w_{0}) - \log p(x \mid w_{0}) \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( f(x, w) - \log p(x \mid w_{0}) \right) \ P_{X}(dx) \nonumber \\ & = & L(w_{0}) + K(w) \nonumber \\ L_{n}(w) & = & \frac{1}{n} \sum_{i=1}^{n} \left( - \log p(X_{i} \mid w) + \log p(X_{i} \mid w_{0}) - \log p(X_{i} \mid w_{0}) \right) \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \left( f(X_{i}, w) - \log p(X_{i} \mid w_{0}) \right) \nonumber \\ & = & L_{n}(w_{0}) + K_{n}(w; X_{1:n}) . \nonumber \end{eqnarray}\]

Next, $K(w)$ is bounded from below as follows.

\[\begin{eqnarray} -K(w) & = & - \mathrm{E} \left[ \log \frac{ q(X) }{ p(X \mid w) } \right] \nonumber \\ & = & \mathrm{E} \left[ \log \frac{ p(X \mid w) }{ q(X) } \right] \nonumber \\ & \le & \log \mathrm{E} \left[ \frac{ p(X \mid w) }{ q(X) } \right] \quad (\because \text{Jensen's inequality}) \nonumber \\ & = & \log \left( \int_{\mathcal{R}(X)} p(x \mid w) \ dx \right) \nonumber \\ & = & 0 \nonumber \end{eqnarray}\]

Hence $K(w) \ge 0$. Moreover,

\[K(w) = 0 \Leftrightarrow w \in W_{0} .\]

Indeed, it is easy to confirm

\[w \in W_{0} \Rightarrow K(w) = 0 .\]

Conversely, suppose that $K(w) = 0$ but

\[P( q(X) \neq p(X \mid w) ) > 0 .\]

Since $\log t \le t - 1$ for all $t > 0$, with equality only at $t = 1$, the strict inequality $\log \frac{ p(X \mid w) }{ q(X) } < \frac{ p(X \mid w) }{ q(X) } - 1$ holds on an event of positive probability. Hence

\[\begin{eqnarray} -K(w) & = & \mathrm{E} \left[ \log \frac{ p(X \mid w) }{ q(X) } \right] \nonumber \\ & < & \mathrm{E} \left[ \frac{ p(X \mid w) }{ q(X) } \right] - 1 \nonumber \\ & = & \int_{\mathcal{R}(X)} p(x \mid w) \ dx - 1 \nonumber \\ & = & 0 , \nonumber \end{eqnarray}\]

that is, $K(w) > 0$. This contradicts the assumption $K(w) = 0$. Therefore $q(X) = p(X \mid w)$ almost surely, that is, $w \in W_{0}$.
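The decomposition $L_{n}(w) = L_{n}(w_{0}) + K_{n}(w; X_{1:n})$ and the properties of $K$ can be checked on a toy model. This is only a sketch under assumptions that do not come from the text: $q = N(0, 1)$, $p(x \mid w) = N(w, 1)$, hence $W_{0} = \{0\}$, $f(x, w) = w^{2}/2 - x w$ and $K(w) = w^{2}/2$.

```python
import math
import random

def log_q(x):
    """Log density of the assumed true distribution q = N(0, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def log_p(x, w):
    """Log density of the assumed model p(x | w) = N(w, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - w) ** 2

def f(x, w):
    """Log density ratio f(x, w) = log q(x) - log p(x | w)."""
    return log_q(x) - log_p(x, w)

def L_n(xs, w):
    """Empirical log loss."""
    return -sum(log_p(x, w) for x in xs) / len(xs)

def K_n(xs, w):
    """Normalized empirical log loss."""
    return sum(f(x, w) for x in xs) / len(xs)

def K(w):
    """Normalized average log loss; equals w^2 / 2 for this assumed model."""
    return 0.5 * w * w

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
w0 = 0.0
for w in (-1.0, 0.0, 0.7):
    # gap should vanish up to rounding: L_n(w) = L_n(w0) + K_n(w)
    gap = L_n(xs, w) - (L_n(xs, w0) + K_n(xs, w))
    print(w, K(w), K_n(xs, w), gap)
```

In this model $K(w)$ vanishes only at $w = 0$, while $K_{n}(w; X_{1:n})$ fluctuates around it and may be negative for a finite sample.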

Definition the normalized marginal likelihood

The normalized marginal likelihood is defined as

\[Z_{n}^{(0)}(\beta; X_{1:n}) := \int_{W} \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \phi(w) \ dw .\]

With the normalized marginal likelihood, the partition function can be written as follows.

\[\begin{eqnarray} Z_{n}(\beta; X_{1:n}) & = & \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w)^{\beta} \ dw \nonumber \\ & = & \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \exp(- \beta f(X_{i}, w)) \ dw \quad (\because \eqref{equaiton_relation_non_logarithmic_likehood}) \nonumber \\ & = & \int_{W} \phi(w) \left( \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \right) \exp \left( - \beta \sum_{j=1}^{n} f(X_{j}, w) \right) \ dw \nonumber \\ & = & \int_{W} \phi(w) \left( \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \right) \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \ dw \nonumber \\ & = & Z_{n}^{(0)}(\beta; X_{1:n}) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} . \nonumber \end{eqnarray}\]

The relation between the posterior distribution and the normalized marginal likelihood is given by

\[\begin{eqnarray} p(w \mid X_{1:n}) & = & \frac{ 1 }{ Z_{n}(\beta; X_{1:n}) } \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w)^{\beta} \quad (\because \text{by definition of the posterior distribution}) \nonumber \\ & = & \frac{ 1 }{ Z_{n}^{(0)}(\beta; X_{1:n}) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} } \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w)^{\beta} \nonumber \\ & = & \frac{ \phi(w) }{ Z_{n}^{(0)}(\beta; X_{1:n}) } \prod_{i=1}^{n} \frac{ p(X_{i} \mid w)^{\beta} }{ p(X_{i} \mid w_{0})^{\beta} } \nonumber \\ & = & \frac{ \phi(w) }{ Z_{n}^{(0)}(\beta; X_{1:n}) } \prod_{i=1}^{n} \exp \left( \log \frac{ p(X_{i} \mid w)^{\beta} }{ p(X_{i} \mid w_{0})^{\beta} } \right) \nonumber \\ & = & \frac{ \phi(w) }{ Z_{n}^{(0)}(\beta; X_{1:n}) } \exp \left( - \sum_{i=1}^{n} \log \frac{ p(X_{i} \mid w_{0})^{\beta} }{ p(X_{i} \mid w)^{\beta} } \right) \nonumber \\ & = & \frac{ \phi(w) }{ Z_{n}^{(0)}(\beta; X_{1:n}) } \exp \left( - \beta \sum_{i=1}^{n} f(X_{i}, w) \right) \nonumber \\ & = & \frac{ \phi(w) }{ Z_{n}^{(0)}(\beta; X_{1:n}) } \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \end{eqnarray}\]

Definition 8

The normalized free energy is defined as

\[F_{n}^{(0)}(\beta) := -\frac{1}{\beta} \log \int_{W} \exp \left( - n \beta K_{n}(w, X_{1:n}) \right) \phi(w) \ dw .\]

The normalized generalization loss is defined as

\[G^{(0)} := - \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( -f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) .\]

The normalized training loss is defined as

\[T_{n}^{(0)}(X_{1:n}) := - \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} \exp \left( - f(X_{i}, w) \right) \phi(w) \ dw .\]

Note that these definitions are equivalent to

\[\begin{eqnarray} G^{(0)} & = & \int_{\mathcal{R}(X)} \log \frac{ q(x) }{ \int_{W} p(x \mid w) \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ T_{n}^{(0)} & = & \frac{1}{n} \sum_{i=1}^{n} \log \frac{ q(X_{i}) }{ \int_{W} p(X_{i} \mid w) \phi(w) \ dw } . \nonumber \end{eqnarray}\]

Lemma 5

(1)

\[\begin{eqnarray} F_{n}(\beta) & = & n L_{n}(w_{0}) + F_{n}^{(0)}(\beta) \nonumber \end{eqnarray}\]

(2)

\[\begin{eqnarray} G & = & L(w_{0}) + G^{(0)} \nonumber \end{eqnarray}\]

(3)

\[\begin{eqnarray} T_{n} & = & L_{n}(w_{0}) + T_{n}^{(0)} . \nonumber \end{eqnarray}\]

proof

(1)

\[\begin{eqnarray} F_{n}(\beta) & = & - \frac{1}{\beta} \log Z_{n}(\beta; X^{1:n}) \nonumber \\ & = & - \frac{1}{\beta} \log \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w)^{\beta} \ dw \nonumber \\ & = & - \frac{1}{\beta} \log \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w)^{\beta} \frac{ p(X_{i} \mid w_{0})^{\beta} }{ p(X_{i} \mid w_{0})^{\beta} } \ dw \nonumber \\ & = & - \frac{1}{\beta} \log \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \prod_{i=1}^{n} \exp \left( \log \frac{ p(X_{i} \mid w)^{\beta} }{ p(X_{i} \mid w_{0})^{\beta} } \right) \ dw \nonumber \\ & = & - \frac{1}{\beta} \log \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \exp \left( - \sum_{i=1}^{n} \beta \log \frac{ p(X_{i} \mid w_{0}) }{ p(X_{i} \mid w) } \right) \ dw \nonumber \\ & = & - \frac{1}{\beta} \log \int_{W} \phi(w) \prod_{i=1}^{n} p(X_{i} \mid w_{0})^{\beta} \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \ dw \nonumber \\ & = & - \frac{1}{\beta} \sum_{i=1}^{n} \log p(X_{i} \mid w_{0})^{\beta} - \frac{1}{\beta} \log \int_{W} \phi(w) \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \ dw \nonumber \\ & = & - \frac{n}{n} \sum_{i=1}^{n} \log p(X_{i} \mid w_{0}) - \frac{1}{\beta} \log \int_{W} \phi(w) \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \ dw \nonumber \\ & = & n L_{n}(w_{0}) - \frac{1}{\beta} \log \int_{W} \phi(w) \exp \left( - n \beta K_{n}(w; X_{1:n}) \right) \ dw \nonumber \\ & = & n L_{n}(w_{0}) + F_{n}^{(0)}(\beta) \end{eqnarray}\]

(2)

\[\begin{eqnarray} G & = & - \int_{\mathcal{R}(X)} \log \left( \int_{W} p(x \mid w) \phi(w) \ dw \right) \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \log \left( \int_{W} p(x \mid w) \frac{ p(x \mid w_{0}) }{ p(x \mid w_{0}) } \phi(w) \ dw \right) \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \log \left( p(x \mid w_{0}) \int_{W} \exp \left( \log \frac{ p(x \mid w) }{ p(x \mid w_{0}) } \right) \phi(w) \ dw \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( - \log p(x \mid w_{0}) - \log \left( \int_{W} \exp \left( - f(x, w) \right) \phi(w) \ dw \right) \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} - \log p(x \mid w_{0}) \ P_{X}(dx) + G^{(0)} \nonumber \\ & = & L(w_{0}) + G^{(0)} \nonumber \end{eqnarray}\]

(3)

\[\begin{eqnarray} T_{n} & = & - \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w) \phi(w) \ dw \nonumber \\ & = & - \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} \frac{ p(X_{i} \mid w_{0}) }{ p(X_{i} \mid w_{0}) } p(X_{i} \mid w) \phi(w) \ dw \nonumber \\ & = & - \frac{1}{n} \sum_{i=1}^{n} \left( \log p(X_{i} \mid w_{0}) + \log \int_{W} \frac{ p(X_{i} \mid w) }{ p(X_{i} \mid w_{0}) } \phi(w) \ dw \right) \nonumber \\ & = & - \frac{1}{n} \sum_{i=1}^{n} \left( \log p(X_{i} \mid w_{0}) + \log \int_{W} \exp \left( \log \frac{ p(X_{i} \mid w) }{ p(X_{i} \mid w_{0}) } \right) \phi(w) \ dw \right) \nonumber \\ & = & - \frac{1}{n} \sum_{i=1}^{n} \left( \log p(X_{i} \mid w_{0}) + \log \int_{W} \exp \left( - f(X_{i}, w) \right) \phi(w) \ dw \right) \nonumber \\ & = & L_{n}(w_{0}) + T_{n}^{(0)} \nonumber \end{eqnarray}\]
$\Box$
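Parts (1) and (3) of Lemma 5 can be checked numerically. The sketch below is not from the text: it assumes a one-dimensional model $p(x \mid w) = N(w, 1)$ with $w_{0} = 0$, a standard normal prior $\phi$, and a trapezoidal rule on the truncated parameter range $[-8, 8]$; all function names are hypothetical. Part (2) could be checked in the same way with an additional integral over $x$.

```python
import math
import random

W0 = 0.0  # assumed optimal parameter: the data are drawn from p(x | W0)

def log_p(x, w):
    """Log density of the assumed model p(x | w) = N(w, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - w) ** 2

def f(x, w):
    """Log density ratio f(x, w) = log p(x | W0) - log p(x | w)."""
    return log_p(x, W0) - log_p(x, w)

def prior(w):
    """Assumed prior phi(w): standard normal density."""
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def integrate(g, lo=-8.0, hi=8.0, steps=2000):
    """Trapezoidal rule for the integral of g over W, truncated to [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for j in range(1, steps):
        total += g(lo + j * h)
    return h * total

def L_n(xs, w):
    return -sum(log_p(x, w) for x in xs) / len(xs)

def F_n(xs, beta):
    """Free energy -(1/beta) log Z_n(beta)."""
    z = integrate(lambda w: prior(w) * math.exp(beta * sum(log_p(x, w) for x in xs)))
    return -math.log(z) / beta

def F_n0(xs, beta):
    """Normalized free energy -(1/beta) log Z_n^(0)(beta)."""
    z0 = integrate(lambda w: prior(w) * math.exp(-beta * sum(f(x, w) for x in xs)))
    return -math.log(z0) / beta

def T_n(xs):
    """Training loss."""
    return -sum(math.log(integrate(lambda w: math.exp(log_p(x, w)) * prior(w))) for x in xs) / len(xs)

def T_n0(xs):
    """Normalized training loss."""
    return -sum(math.log(integrate(lambda w: math.exp(-f(x, w)) * prior(w))) for x in xs) / len(xs)

random.seed(2)
xs = [random.gauss(W0, 1.0) for _ in range(8)]
for beta in (0.5, 1.0):
    # Lemma 5 (1): the two printed values should agree.
    print(beta, F_n(xs, beta), len(xs) * L_n(xs, W0) + F_n0(xs, beta))
# Lemma 5 (3): the two printed values should agree.
print(T_n(xs), L_n(xs, W0) + T_n0(xs))
```

Because both sides are evaluated on the same grid, the agreement holds up to floating-point rounding rather than up to quadrature error.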

Remark 16

In a later chapter, we will show that if the log density ratio function $f(x, w)$ has relatively finite variance, there exist $\lambda > 0$ and $m \ge 1$ such that

\[\hat{Z}_{n}^{(0)}(\beta; X_{1:n}) := \frac{ n^{\lambda} }{ (\log n)^{m-1} } Z_{n}^{(0)}(\beta; X_{1:n})\]

converges in distribution as $n \rightarrow \infty$. If the posterior distribution can be approximated by a normal distribution, $\lambda = d/2$ and $m = 1$. On the other hand, since the normalized marginal likelihood at $\beta = 1$ can be written as

\[\begin{eqnarray} Z_{n}^{(0)}(1; X_{1:n}) & = & \int_{W} \prod_{i=1}^{n} \frac{ p(X_{i} \mid w) }{ q(X_{i}) } \phi(w) \ dw , \end{eqnarray}\]

for all $n \in \mathbb{N}$ we have

\[\begin{eqnarray} \int_{\mathcal{R}(X_{1})} \cdots \int_{\mathcal{R}(X_{n})} Z_{n}^{(0)}(1; x_{1:n}) \ P_{X_{1}}(d x_{1}) \cdots \ P_{X_{n}}(d x_{n}) & = & \int_{\mathcal{R}(X_{1})} \cdots \int_{\mathcal{R}(X_{n})} \left( \int_{W} \prod_{i=1}^{n} \frac{ p(x_{i} \mid w) }{ q(x_{i}) } \phi(w) \ dw \right) q(x_{1}) \cdots q(x_{n}) \ d x_{1} \cdots d x_{n} \nonumber \\ & = & \int_{\mathcal{R}(X_{1})} \cdots \int_{\mathcal{R}(X_{n})} \int_{W} \prod_{i=1}^{n} p(x_{i} \mid w) \phi(w) \ dw \ d x_{1} \cdots d x_{n} \nonumber \\ & = & \int_{\mathcal{R}(X_{1})} \cdots \int_{\mathcal{R}(X_{n})} \int_{W} p_{X_{1:n}, \Theta}(x_{1:n}, w) \ dw \ d x_{1} \cdots d x_{n} \nonumber \\ & = & 1 \nonumber \end{eqnarray}\]

where $\Theta: \Omega \rightarrow W$ is a random variable taking values in the parameter space $W$ with density $\phi$. Hence

\[\begin{eqnarray} \int_{\mathcal{R}(X_{1})} \cdots \int_{\mathcal{R}(X_{n})} \hat{Z}_{n}^{(0)}(1; x_{1:n}) \ P_{X_{1}}(d x_{1}) \cdots \ P_{X_{n}}(d x_{n}) & = & \frac{ n^{\lambda} }{ (\log n)^{m-1} } \rightarrow \infty \quad (n \rightarrow \infty) . \nonumber \end{eqnarray}\]

Therefore, although \(\hat{Z}_{n}^{(0)}(1; X_{1:n})\) converges in distribution, its expectation does not converge.
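The remark can be illustrated on a conjugate toy model. The following sketch rests on assumptions that are not in the text: $q = N(0, 1)$, $p(x \mid w) = N(w, 1)$ and $\phi = N(0, 1)$, so that $d = 1$, $\lambda = 1/2$, $m = 1$. For this model a direct Gaussian integral gives $Z_{n}^{(0)}(1; x_{1:n}) = (n + 1)^{-1/2} \exp \left( s^{2} / (2 (n + 1)) \right)$ with $s = \sum_{i} x_{i}$, and $S = \sum_{i} X_{i} \sim N(0, n)$. The expectation is computed by a trapezoidal rule in the log domain to avoid overflow.

```python
import math

def log_z0(s, n):
    """log Z_n^(0)(1) for the assumed Gaussian model, as a function of s = sum of x_i."""
    return s * s / (2.0 * (n + 1)) - 0.5 * math.log(n + 1)

def log_density_of_sum(s, n):
    """Log density of S = X_1 + ... + X_n for X_i ~ N(0, 1) i.i.d."""
    return -s * s / (2.0 * n) - 0.5 * math.log(2.0 * math.pi * n)

def expected_z0(n, lam=0.0, steps=20000):
    """E[n^lam * Z_n^(0)(1; X_{1:n})] by the trapezoidal rule over s."""
    half_width = 12.0 * math.sqrt(n * (n + 1))
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        s = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(log_z0(s, n) + log_density_of_sum(s, n))
    return (n ** lam) * h * total

for n in (1, 10, 100, 1000):
    # First value stays near 1; second value grows like sqrt(n).
    print(n, expected_z0(n), expected_z0(n, lam=0.5))
```

In this model $n^{1/2} Z_{n}^{(0)}(1; X_{1:n})$ converges in distribution to $\exp(\chi^{2}_{1} / 2)$, whose expectation is infinite, which is consistent with the divergence of the rescaled expectation.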

2.2.3 Cumulant Generating Functions

Definition 9

The cumulant generating function of the generalization loss is

\[\begin{eqnarray} \mathcal{G}(\alpha) & := & \int_{\mathcal{R}(X)} \log \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \ P_{X}(dx) . \nonumber \end{eqnarray}\]

The cumulant generating function of the training loss is

\[\begin{eqnarray} \mathcal{T}_{n}(\alpha) & := & \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw \nonumber \end{eqnarray}\]

The $k$-th cumulant of the generalization loss is defined by

\[\begin{eqnarray} \mathcal{G}^{(k)}(0) := \left. \frac{d^{k}}{d \alpha^{k}} \mathcal{G}(\alpha) \right|_{\alpha = 0} , \nonumber \end{eqnarray}\]

and the $k$-th cumulant of the training loss is defined by

\[\begin{eqnarray} \mathcal{T}_{n}^{(k)}(0) := \left. \frac{d^{k}}{d \alpha^{k}} \mathcal{T}_{n}(\alpha) \right|_{\alpha = 0} . \nonumber \end{eqnarray}\]

Remark 17

By definition,

\[\begin{eqnarray} G & = & - \mathcal{G}(1), \nonumber \\ T_{n} & = & - \mathcal{T}_{n}(1) . \nonumber \end{eqnarray}\]

Definition 10

\[\begin{eqnarray} \ell_{k}(A; \alpha) & := & \frac{ \int_{W} (\log p(A \mid w))^{k} p(A \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(A \mid w)^{\alpha} \phi(w) \ dw } \nonumber \end{eqnarray} .\]

Lemma 6

(1)

\[\begin{eqnarray} \mathcal{G}^{(1)}(\alpha) & = & \int_{\mathcal{R}(X)} \ell_{1}(x; \alpha) \ P_{X}(d x) \nonumber \\ \mathcal{G}^{(2)}(\alpha) & = & \int_{\mathcal{R}(X)} \ell_{2}(x; \alpha) - \ell_{1}(x; \alpha)^{2} \ P_{X}(d x) \nonumber \end{eqnarray}\]

(2)

\[\begin{eqnarray} \mathcal{T}_{n}^{(1)}(\alpha) & = & \frac{1}{n} \sum_{i=1}^{n} \ell_{1}(X_{i}; \alpha) \nonumber \\ \mathcal{T}_{n}^{(2)}(\alpha) & = & \frac{1}{n} \sum_{i=1}^{n} \left( \ell_{2}(X_{i}; \alpha) - \ell_{1}(X_{i}; \alpha)^{2} \right) . \nonumber \end{eqnarray}\]

proof

\[\begin{eqnarray} \mathcal{G}^{(1)}(\alpha) & = & \frac{d}{d \alpha} \int_{\mathcal{R}(X)} \log \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{ \frac{d}{d \alpha} \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{ \int_{W} (\log p(x \mid w)) p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \ell_{1}(x; \alpha) \ P_{X}(dx) \nonumber \end{eqnarray}\]

\[\begin{eqnarray} \frac{d }{d \alpha} \ell_{1}(x; \alpha) & = & \frac{d }{d \alpha} \frac{ \int_{W} (\log p(x \mid w)) p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \frac{ \left( \int_{W} (\log p(x \mid w)) p(x \mid w)^{\alpha} \phi(w) \ dw \right)^{2} }{ \left( \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \right)^{2} } + \frac{ \int_{W} (\log p(x \mid w))^{2} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \ell_{1}(x; \alpha)^{2} + \ell_{2}(x; \alpha) \nonumber \\ \mathcal{G}^{(2)}(\alpha) & = & \frac{d }{d \alpha} \int_{\mathcal{R}(X)} \ell_{1}(x; \alpha) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{d }{d \alpha} \ell_{1}(x; \alpha) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( \ell_{2}(x; \alpha) - \ell_{1}(x; \alpha)^{2} \right) \ P_{X}(dx) \nonumber \end{eqnarray}\]

\[\begin{eqnarray} \mathcal{T}_{n}^{(1)}(\alpha) & = & \frac{d }{d \alpha} \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \frac{ \frac{d }{d \alpha} \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \frac{ \int_{W} (\log p(X_{i} \mid w)) p(X_{i} \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \ell_{1}(X_{i}; \alpha) \nonumber \end{eqnarray}\]

\[\begin{eqnarray} \mathcal{T}_{n}^{(2)}(\alpha) & = & \frac{d}{d \alpha} \frac{1}{n} \sum_{i=1}^{n} \ell_{1}(X_{i}; \alpha) \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \left( \ell_{2}(X_{i}; \alpha) - \ell_{1}(X_{i}; \alpha)^{2} \right) . \nonumber \end{eqnarray}\]

Higher derivatives follow from the derivatives of $\ell_{k}$:

\[\begin{eqnarray} \frac{d}{d \alpha} \ell_{2}(x; \alpha) & = & \frac{d}{d \alpha} \frac{ \int_{W} (\log p(x \mid w))^{2} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \frac{ \int_{W} (\log p(x \mid w))^{2} p(x \mid w)^{\alpha} \phi(w) \ dw \int_{W} (\log p(x \mid w)) p(x \mid w)^{\alpha} \phi(w) \ dw }{ \left( \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \right)^{2} } + \frac{ \int_{W} (\log p(x \mid w))^{3} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \ell_{1}(x; \alpha) \ell_{2}(x; \alpha) + \ell_{3}(x; \alpha) \nonumber \end{eqnarray}\]

\[\begin{eqnarray} \frac{d}{d \alpha} \ell_{k}(x; \alpha) & = & \frac{d}{d \alpha} \frac{ \int_{W} (\log p(x \mid w))^{k} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \frac{ \int_{W} (\log p(x \mid w))^{k} p(x \mid w)^{\alpha} \phi(w) \ dw \int_{W} (\log p(x \mid w)) p(x \mid w)^{\alpha} \phi(w) \ dw }{ \left( \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \right)^{2} } + \frac{ \int_{W} (\log p(x \mid w))^{k + 1} p(x \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & - \ell_{k}(x; \alpha) \ell_{1}(x; \alpha) + \ell_{k+1}(x; \alpha) \nonumber \end{eqnarray}\]
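The formulas for the first two derivatives of $\mathcal{T}_{n}$ can be compared with finite differences. This is a rough sketch, assuming a model $p(x \mid w) = N(w, 1)$, a standard normal prior and a small hand-picked sample, none of which come from the text; the integrals over $W$ are approximated by a trapezoidal rule on $[-10, 10]$.

```python
import math

def log_p(x, w):
    """Log density of the assumed model p(x | w) = N(w, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - w) ** 2

def prior(w):
    """Assumed prior phi(w): standard normal density."""
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def integrate(g, lo=-10.0, hi=10.0, steps=2000):
    """Trapezoidal rule for the integral of g over W, truncated to [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for j in range(1, steps):
        total += g(lo + j * h)
    return h * total

def ell(k, x, alpha):
    """ell_k(x; alpha): mean of (log p(x|w))^k under the weight p(x|w)^alpha phi(w)."""
    num = integrate(lambda w: log_p(x, w) ** k * math.exp(alpha * log_p(x, w)) * prior(w))
    den = integrate(lambda w: math.exp(alpha * log_p(x, w)) * prior(w))
    return num / den

def T(xs, alpha):
    """Cumulant generating function of the training loss."""
    return sum(math.log(integrate(lambda w: math.exp(alpha * log_p(x, w)) * prior(w))) for x in xs) / len(xs)

def T1(xs, alpha):
    """First derivative according to Lemma 6 (2)."""
    return sum(ell(1, x, alpha) for x in xs) / len(xs)

def T2(xs, alpha):
    """Second derivative: mean of ell_2 - ell_1^2."""
    return sum(ell(2, x, alpha) - ell(1, x, alpha) ** 2 for x in xs) / len(xs)

xs = [-1.3, 0.2, 0.9, 2.1]  # hypothetical sample
alpha, eps = 1.0, 1e-3
fd1 = (T(xs, alpha + eps) - T(xs, alpha - eps)) / (2.0 * eps)
fd2 = (T(xs, alpha + eps) - 2.0 * T(xs, alpha) + T(xs, alpha - eps)) / eps ** 2
print(fd1, T1(xs, alpha))
print(fd2, T2(xs, alpha))
```

The second derivative is an average of variances of $\log p(X_{i} \mid w)$ under the tilted weight, so it is nonnegative and $\mathcal{T}_{n}$ is convex in $\alpha$.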

Definition 11

\[\begin{eqnarray} \mathcal{L}_{k}(A; \alpha) := \frac{ \int_{W} (f(A, w))^{k} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \nonumber \end{eqnarray}\]

Lemma 7

\[\begin{eqnarray} \mathcal{G}^{(1)}(\alpha) & = & - L(w_{0}) - \int_{\mathcal{R}(X)} \mathcal{L}_{1}(x; \alpha) \ P_{X}(dx) \nonumber \\ \mathcal{G}^{(2)}(\alpha) & = & \int_{\mathcal{R}(X)} \left( \mathcal{L}_{2}(x; \alpha) - \mathcal{L}_{1}(x; \alpha)^{2} \right) \ P_{X}(dx) \nonumber \end{eqnarray}\] \[\begin{eqnarray} \mathcal{T}_{n}^{(1)}(\alpha) & = & - L_{n}(w_{0}) - \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_{1}(X_{i}; \alpha) \nonumber \\ \mathcal{T}_{n}^{(2)}(\alpha) & = & \frac{1}{n} \sum_{i=1}^{n} \left( \mathcal{L}_{2}(X_{i}; \alpha) - \mathcal{L}_{1}(X_{i}; \alpha)^{2} \right) \nonumber \end{eqnarray}\]

proof

The cumulant generating function of the generalization loss can be written as

\[\begin{eqnarray} \mathcal{G}(\alpha) & = & \int_{\mathcal{R}(X)} \log \int_{W} p(x \mid w)^{\alpha} \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( \log \frac{ p(x \mid w)^{\alpha} }{ p(x \mid w_{0})^{\alpha} } + \log p(x \mid w_{0})^{\alpha} \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( - \alpha f(x, w) + \log p(x \mid w_{0})^{\alpha} \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \log \int_{W} p(x \mid w_{0})^{\alpha} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( \alpha \log p(x \mid w_{0}) + \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \right) \ P_{X}(dx) \nonumber \\ & = & - \alpha L(w_{0}) + \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) . \nonumber \end{eqnarray}\]

Since the first term is linear in $\alpha$, computing the derivatives of $\mathcal{G}$ reduces to differentiating the second term in the above equation.

\[\begin{eqnarray} & & \frac{d}{d \alpha} \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{d}{d \alpha} \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{ \frac{d}{d \alpha} \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{ \int_{W} \frac{d}{d \alpha} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \frac{ \int_{W} (-f(x, w)) \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \mathcal{L}_{1}(x;\alpha) \ P_{X}(dx) \nonumber \end{eqnarray}\]

The second derivative of the term is

\[\begin{eqnarray} & & \frac{d^{2}}{d \alpha^{2}} \int_{\mathcal{R}(X)} \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \frac{d}{d \alpha} \mathcal{L}_{1}(x;\alpha) \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \left( \frac{ \left( \int_{W} f(x, w) \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \right)^{2} }{ \left( \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \right)^{2} } - \frac{ \int_{W} (f(x, w))^{2} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \right) \ P_{X}(dx) \nonumber \\ & = & - \int_{\mathcal{R}(X)} \left( \mathcal{L}_{1}(x; \alpha)^{2} - \mathcal{L}_{2}(x; \alpha) \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( \mathcal{L}_{2}(x; \alpha) - \mathcal{L}_{1}(x; \alpha)^{2} \right) \ P_{X}(dx) . \nonumber \end{eqnarray}\]

In general, the $k$-th derivative can be computed by using the derivative of $\mathcal{L}_{k}(X; \alpha)$.

\[\begin{eqnarray} & & \frac{d}{d \alpha} \mathcal{L}_{k}(x; \alpha) \nonumber \\ & = & \frac{d}{d \alpha} \frac{ \int_{W} (f(x, w))^{k} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \nonumber \\ & = & \frac{ \int_{W} (f(x, w))^{k} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \int_{W} f(x, w) \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \left( \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw \right)^{2} } - \frac{ \int_{W} (f(x, w))^{k+1} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \nonumber \\ & = & \mathcal{L}_{k}(x; \alpha) \mathcal{L}_{1}(x; \alpha) - \mathcal{L}_{k+1}(x; \alpha) . \nonumber \end{eqnarray}\]

Hence the derivatives of $\mathcal{G}$ are

\[\begin{eqnarray} \mathcal{G}^{(1)} & = & - L(w_{0}) - \int_{\mathcal{R}(X)} \mathcal{L}_{1}(x; \alpha) \ P_{X}(dx) \nonumber \\ \mathcal{G}^{(2)} & = & - \int_{\mathcal{R}(X)} \frac{d}{d \alpha} \mathcal{L}_{1}(x; \alpha) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( \mathcal{L}_{2}(x; \alpha) - \mathcal{L}_{1}(x; \alpha)^{2} \right) \ P_{X}(dx) \nonumber \\ \mathcal{G}^{(3)} & = & \int_{\mathcal{R}(X)} \left( \frac{d}{d \alpha} \mathcal{L}_{2}(x; \alpha) - 2 \mathcal{L}_{1}(x; \alpha) \frac{d}{d \alpha} \mathcal{L}_{1}(x; \alpha) \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( \mathcal{L}_{2}(x; \alpha) \mathcal{L}_{1}(x; \alpha) - \mathcal{L}_{3}(x; \alpha) - 2 \mathcal{L}_{1}(x; \alpha) \left( \mathcal{L}_{1}(x; \alpha)^{2} - \mathcal{L}_{2}(x; \alpha) \right) \right) \ P_{X}(dx) \nonumber \\ & = & \int_{\mathcal{R}(X)} \left( - 2 \mathcal{L}_{1}(x; \alpha)^{3} + 3 \mathcal{L}_{1}(x; \alpha) \mathcal{L}_{2}(x; \alpha) - \mathcal{L}_{3}(x; \alpha) \right) \ P_{X}(dx) . \nonumber \end{eqnarray}\]

Now we will compute the derivatives of the training losses.

\[\begin{eqnarray} \mathcal{T}_{n}(\alpha) & = & \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w)^{\alpha} \phi(w) \ dw \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} \exp \left( \log \frac{ p(X_{i} \mid w)^{\alpha} }{ p(X_{i} \mid w_{0})^{\alpha} } + \log p(X_{i} \mid w_{0})^{\alpha} \right) \phi(w) \ dw \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} p(X_{i} \mid w_{0})^{\alpha} \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw \nonumber \\ & = & - \alpha L_{n}(w_{0}) + \frac{1}{n} \sum_{i=1}^{n} \log \int_{W} \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw \nonumber \end{eqnarray}\]

The derivative of the second term is

\[\begin{eqnarray} \frac{d}{d \alpha} \sum_{i=1}^{n} \log \int_{W} \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw & = & \sum_{i=1}^{n} \frac{ \int_{W} (- f(X_{i}, w)) \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw } \nonumber \\ & = & \sum_{i=1}^{n} - \mathcal{L}_{1}(X_{i}; \alpha) \nonumber \end{eqnarray}\]

Hence the derivatives of the training losses are

\[\begin{eqnarray} \mathcal{T}_{n}^{(1)} & = & - L_{n}(w_{0}) - \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_{1}(X_{i}; \alpha) \nonumber \\ \mathcal{T}_{n}^{(2)} & = & \frac{d}{d \alpha} \left( \frac{1}{n} \sum_{i=1}^{n} \log p(X_{i} \mid w_{0}) - \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_{1}(X_{i}; \alpha) \right) \nonumber \\ & = & - \frac{1}{n} \sum_{i=1}^{n} \left( \mathcal{L}_{1}(X_{i}; \alpha)^{2} - \mathcal{L}_{2}(X_{i}; \alpha) \right) \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \left( \mathcal{L}_{2}(X_{i}; \alpha) - \mathcal{L}_{1}(X_{i}; \alpha)^{2} \right) \nonumber \\ \mathcal{T}_{n}^{(3)} & = & \frac{d}{d \alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \mathcal{L}_{2}(X_{i}; \alpha) - \mathcal{L}_{1}(X_{i}; \alpha)^{2} \right) \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \left( \mathcal{L}_{2}(X_{i}; \alpha) \mathcal{L}_{1}(X_{i}; \alpha) - \mathcal{L}_{3}(X_{i}; \alpha) - 2 \mathcal{L}_{1}(X_{i}; \alpha) \left( \mathcal{L}_{1}(X_{i}; \alpha)^{2} - \mathcal{L}_{2}(X_{i}; \alpha) \right) \right) \nonumber \\ & = & \frac{1}{n} \sum_{i=1}^{n} \left( - 2 \mathcal{L}_{1}(X_{i}; \alpha)^{3} + 3 \mathcal{L}_{1}(X_{i}; \alpha) \mathcal{L}_{2}(X_{i}; \alpha) - \mathcal{L}_{3}(X_{i}; \alpha) \right) . \nonumber \end{eqnarray}\]
$\Box$

Remark

To exchange the order of integration and differentiation, we assume that there are integrable functions $h^{1}(x)$ and $h^{2}(x, w)$ such that

\[\begin{eqnarray} \abs{ \log \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } & \le & h^{1}(x) \nonumber \\ \exp \left( - \alpha f(x, w) \right) \phi(w) & \le & h^{2}(x, w) \nonumber \end{eqnarray}\]

Remark

For $k = 1$, the relation between $\ell_{1}(X; \alpha)$ and \(\mathcal{L}_{1}(X; \alpha)\) is

\[\begin{eqnarray} & & \ell_{1}(X; \alpha) \nonumber \\ & = & \frac{ \int_{W} (\log p(X \mid w)) p(X \mid w)^{\alpha} \phi(w) \ dw }{ \int_{W} p(X \mid w)^{\alpha} \phi(w) \ dw } \nonumber \\ & = & \frac{ \int_{W} (\log p(X \mid w)) \exp \left( \log p(X \mid w)^{\alpha} \right) \phi(w) \ dw }{ \int_{W} \exp \left( \log p(X \mid w)^{\alpha} \right) \phi(w) \ dw } \nonumber \\ & = & \frac{ \int_{W} \left( \log p(X \mid w_{0}) + \log \frac{ p(X \mid w) }{ p(X \mid w_{0}) } \right) \exp \left( \alpha \log \frac{ p(X \mid w) }{ p(X \mid w_{0}) } + \alpha \log p(X \mid w_{0}) \right) \phi(w) \ dw }{ \int_{W} \exp \left( \alpha \log \frac{ p(X \mid w) }{ p(X \mid w_{0}) } + \alpha \log p(X \mid w_{0}) \right) \phi(w) \ dw } \nonumber \\ & = & \frac{ \log p(X \mid w_{0}) \int_{W} \exp \left( - \alpha f(X, w) \right) \exp \left( \alpha \log p(X \mid w_{0}) \right) \phi(w) \ dw - \int_{W} f(X, w) \exp \left( - \alpha f(X, w) \right) \exp \left( \alpha \log p(X \mid w_{0}) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(X, w) \right) \exp \left( \alpha \log p(X \mid w_{0}) \right) \phi(w) \ dw } \nonumber \\ & = & \log p(X \mid w_{0}) - \frac{ \int_{W} f(X, w) \exp \left( - \alpha f(X, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( - \alpha f(X, w) \right) \phi(w) \ dw } \nonumber \\ & = & \log p(X \mid w_{0}) - \mathcal{L}_{1}(X; \alpha) , \nonumber \end{eqnarray}\]

where the factor $\exp \left( \alpha \log p(X \mid w_{0}) \right)$ does not depend on $w$ and cancels between the numerator and the denominator.

Lemma 8

Let $c_{2} := 2$, $c_{3} := 6$, $c_{4} := 26$. Then, for $k = 2, 3, 4$,

\[\begin{eqnarray} \abs{ \frac{d^{k}}{d \alpha^{k}} \mathcal{G}(\alpha) } & \le & c_{k} \int_{\mathcal{R}(X)} \frac{ \int_{W} \abs{ f(x, w) }^{k} \phi(w) \exp \left( - \alpha f(x, w) \right) \ dw }{ \int_{W} \exp \left( - \alpha f(x, w) \right) \phi(w) \ dw } \ P_{X}(dx) \nonumber \\ \abs{ \frac{d^{k}}{d \alpha^{k}} \mathcal{T}_{n}(\alpha) } & \le & c_{k} \frac{1}{n} \sum_{i=1}^{n} \frac{ \int_{W} \abs{ f(X_{i}, w) }^{k} \phi(w) \exp \left( - \alpha f(X_{i}, w) \right) \ dw }{ \int_{W} \exp \left( - \alpha f(X_{i}, w) \right) \phi(w) \ dw } \end{eqnarray}\]

proof

\[\begin{eqnarray} \abs{ \mathcal{L}_{k}(A; \alpha) } & = & \abs{ \frac{ \int_{W} (f(A, w))^{k} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } } \nonumber \\ & \le & \frac{ \abs{ \int_{W} (f(A, w))^{k} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \nonumber \\ & \le & \frac{ \int_{W} \abs{ (f(A, w))^{k} \exp \left( -\alpha f(A, w) \right) \phi(w) } \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \nonumber \\ & = & \frac{ \int_{W} \abs{ (f(A, w))^{k} } \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } =: \overline{\mathcal{L}}_{k}(A; \alpha) \nonumber \end{eqnarray}\] \[\begin{eqnarray} \abs{ \mathcal{L}_{1}(X; \alpha)^{2} } & = & \abs{ \frac{ \left( \int_{W} (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{2} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{2} } } \nonumber \\ & = & \frac{ \left( \abs{ \int_{W} (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw } \right)^{2} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{2} } \nonumber \\ & \le & \frac{ \left( \int_{W} \abs{ (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) } \ dw \right)^{2} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{2} } \nonumber \\ & \le & \frac{ \left( \left( \int_{W} \abs{ f(A, w) \left( \exp \left( -\alpha f(A, w) \right) \phi(w) \right)^{1/2} }^{2} \ dw \right)^{1/2} \left( \int_{W} \abs{ \left( \phi(w) \exp \left( -\alpha f(A, w) \right) \right)^{1/2} }^{2} \ dw \right)^{1/2} \right)^{2} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{2} } \nonumber \\ & = & \frac{ \int_{W} \abs{ (f(A, w)) }^{2} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \nonumber \\ & = & 
\overline{\mathcal{L}}_{2}(X; \alpha) \nonumber \end{eqnarray}\]

In general,

\[\begin{eqnarray} \abs{ \mathcal{L}_{1}(X; \alpha)^{k} } & = & \abs{ \frac{ \left( \int_{W} (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{k} } } \nonumber \\ & \le & \frac{ \left( \abs{ \int_{W} (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw } \right)^{k} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{k} } \nonumber \\ & \le & \frac{ \left( \int_{W} \abs{ (f(A, w)) \phi(w) \exp \left( -\alpha f(A, w) \right) } \ dw \right)^{k} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{k} } \nonumber \\ & \le & \frac{ \left( \left( \int_{W} \abs{ (f(A, w)) \left( \phi(w) \exp \left( -\alpha f(A, w) \right) \right)^{1/k} }^{k} \ dw \right)^{1/k} \right)^{k} \left( \left( \int_{W} \abs{ \left( \phi(w) \exp \left( -\alpha f(A, w) \right) \right)^{(k-1)/k} }^{k/(k-1)} \ dw \right)^{(k-1)/k} \right)^{k} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{k} } \nonumber \\ & = & \frac{ \int_{W} (f(A, w))^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{(k-1)} }{ \left( \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw \right)^{k} } \nonumber \\ & = & \frac{ \int_{W} (f(A, w))^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \nonumber \\ & = & \overline{\mathcal{L}}_{k}(A; \alpha) \nonumber \end{eqnarray}\]

More generally, for positive integers $k_{1}$ and $k_{2}$, letting $k := k_{1} + k_{2}$,

\[\begin{eqnarray} & & \abs{ \mathcal{L}_{k_{1}}(A; \alpha) \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & = & \abs{ \frac{ \int_{W} (f(A, w))^{k_{1}} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } } \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & \le & \frac{ \abs{ \int_{W} (f(A, w))^{k_{1}} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw } }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & \le & \frac{ \int_{W} \abs{ (f(A, w))^{k_{1}} \phi(w) \exp \left( -\alpha f(A, w) \right) } \ dw }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & \le & \frac{ \left( \int_{W} \abs{ (f(A, w))^{k_{1}} \left( \phi(w) \exp \left( -\alpha f(A, w) \right) \right)^{k_{1}/k} }^{k/k_{1}} \ dw \right)^{k_{1}/k} \left( \int_{W} \abs{ \left( \phi(w) \exp \left( -\alpha f(A, w) \right) \right)^{k_{2}/k} }^{k/k_{2}} \ dw \right)^{k_{2}/k} }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \end{eqnarray}\] \[\begin{eqnarray} & = & \frac{ \left( \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{1}/k} \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{2}/k} }{ \int_{W} \exp \left( -\alpha f(A, w) \right) \phi(w) \ dw } \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & = & \left( \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{1}/k} \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{2}/k - 1} \abs{ \mathcal{L}_{k_{2}}(A; \alpha) } . \nonumber \end{eqnarray}\]

Applying the same argument to the remaining factor $\abs{ \mathcal{L}_{k_{2}} }$, with the roles of $k_{1}$ and $k_{2}$ exchanged, we obtain

\[\begin{eqnarray} & & \abs{ \mathcal{L}_{k_{1}}(A; \alpha) \mathcal{L}_{k_{2}}(A; \alpha) } \nonumber \\ & \le & \left( \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{1}/k} \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{2}/k - 1} \left( \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{2}/k} \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{k_{1}/k - 1} \nonumber \\ & = & \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \left( \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw \right)^{-1} \nonumber \\ & = & \overline{\mathcal{L}}_{k}(A; \alpha) \nonumber \end{eqnarray}\]
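
The collapse of the four factors in the chain above is pure exponent bookkeeping. As a sketch, the shorthand $I_{k}$ and $I_{0}$ is introduced only for this remark and is not notation used elsewhere.

\[\begin{eqnarray} I_{k} & := & \int_{W} \abs{ f(A, w) }^{k} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw , \quad I_{0} := \int_{W} \phi(w) \exp \left( -\alpha f(A, w) \right) \ dw , \nonumber \\ I_{k}^{k_{1}/k} I_{0}^{k_{2}/k - 1} I_{k}^{k_{2}/k} I_{0}^{k_{1}/k - 1} & = & I_{k}^{(k_{1} + k_{2})/k} I_{0}^{(k_{1} + k_{2})/k - 2} \nonumber \\ & = & I_{k} I_{0}^{-1} . \nonumber \end{eqnarray}\]

For example, $k_{1} = 1$ and $k_{2} = 2$ give $k = 3$, so the product $\mathcal{L}_{1} \mathcal{L}_{2}$ is bounded in absolute value by $\overline{\mathcal{L}}_{3}$.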

Hence

\[\begin{eqnarray} \abs{ \mathcal{G}^{(2)}(\alpha) } & \le & \int_{\mathcal{R}(X)} \abs{ \mathcal{L}_{1}(x; \alpha)^{2} + \mathcal{L}_{2}(x; \alpha) } \ P_{X}(dx) \nonumber \\ & \le & \int_{\mathcal{R}(X)} \left( \abs{ \mathcal{L}_{1}(x; \alpha)^{2} } + \abs{ \mathcal{L}_{2}(x; \alpha) } \right) \ P_{X}(dx) \nonumber \\ & \le & 2 \int_{\mathcal{R}(X)} \overline{\mathcal{L}}_{2}(x; \alpha) \ P_{X}(dx) \end{eqnarray}\] \[\begin{eqnarray} \abs{ \mathcal{G}^{(3)}(\alpha) } & \le & \int_{\mathcal{R}(X)} \abs{ \mathcal{L}_{3}(x; \alpha) + 3 \mathcal{L}_{1}(x; \alpha) \mathcal{L}_{2}(x; \alpha) + 2\mathcal{L}_{1}(x; \alpha)^{3} } \ P_{X}(dx) \nonumber \\ & \le & \int_{\mathcal{R}(X)} \left( \overline{\mathcal{L}}_{3}(x; \alpha) + 3 \overline{\mathcal{L}}_{3}(x; \alpha) + 2 \overline{\mathcal{L}}_{3}(x; \alpha) \right) \ P_{X}(dx) \nonumber \\ & = & 6 \int_{\mathcal{R}(X)} \overline{\mathcal{L}}_{3}(x; \alpha) \ P_{X}(dx) . \nonumber \end{eqnarray}\]
$\Box$

Theorem 1

Assume

\[\begin{eqnarray} \sup_{\alpha \in [0, 1]} \abs{ \frac{d^{3}}{d \alpha^{3}} \mathcal{G}(\alpha) } & = & o_{p} \left( \frac{1}{n} \right) \nonumber \\ \sup_{\alpha \in [0, 1]} \abs{ \frac{d^{3}}{d \alpha^{3}} \mathcal{T}_{n}(\alpha) } & = & o_{p} \left( \frac{1}{n} \right) . \nonumber \end{eqnarray}\]

Then

\[\begin{eqnarray} G & = & -\mathcal{G}(1) \nonumber \\ & = & - \mathcal{G}^{(1)}(0) - \frac{1}{2} \mathcal{G}^{(2)}(0) + o_{p} \left( \frac{1}{n} \right) \nonumber \\ T_{n} & = & - \mathcal{T}_{n}^{(1)}(0) - \frac{1}{2} \mathcal{T}_{n}^{(2)}(0) + o_{p} \left( \frac{1}{n} \right) . \nonumber \end{eqnarray}\]

proof

By Taylor's theorem with the Lagrange form of the remainder, for each $\alpha \in (0, 1]$ there exists $\alpha^{*} \in [0, \alpha]$ such that

\[\begin{eqnarray} & & \mathcal{G}(\alpha) = \mathcal{G}(0) + \alpha \mathcal{G}^{(1)}(0) + \frac{1}{2} \alpha^{2} \mathcal{G}^{(2)}(0) + \frac{1}{6} \alpha^{3} \mathcal{G}^{(3)}(\alpha^{*}) . \nonumber \end{eqnarray}\]

Substituting $\alpha = 1$, we have

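\[\begin{eqnarray} \mathcal{G}(1) & = & \mathcal{G}(0) + \mathcal{G}^{(1)}(0) + \frac{1}{2} \mathcal{G}^{(2)}(0) + \frac{1}{6} \mathcal{G}^{(3)}(\alpha^{*}) \nonumber \\ & = & \mathcal{G}(0) + \mathcal{G}^{(1)}(0) + \frac{1}{2} \mathcal{G}^{(2)}(0) + o_{p} \left( \frac{1}{n} \right) , \nonumber \end{eqnarray}\]

where the last step uses the assumption on $\sup_{\alpha \in [0, 1]} \abs{ \mathcal{G}^{(3)}(\alpha) }$. The same argument applied to $\mathcal{T}_{n}$ gives the expansion of $T_{n}$.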
$\Box$