memo

Chapter3-03. Exponential family

3.3 Exponential family

Definition 3.13 Exponential families

$(\mathcal{X}, \mathcal{A})$,
- measurable space
$\Theta$,
- parameter space
$\mathcal{P} := \{P_{\theta}\}_{\theta \in \Theta}$,
- probability measure over sample sp. $(\mathcal{X}, \mathcal{A})$

$\mathcal{P}$ is said to be exponential families if there exist

$\mu: \mathcal{X} \rightarrow \mathrm{Range}(\mu) \subseteq \mathbb{R}$,
- $\sigma$-finite measure $\mu$ over $(\mathcal{Y}, \mathcal{B})$,
- $ \forall \theta \in \Theta$, $P_{\theta} \ll \mu$,
$k \in \mathbb{N}$,
$T_{i}: \mathcal{X} \rightarrow \mathcal{Y} \ (i = 1, \ldots, k)$,
- measurable function
$a_{i}: \Theta \rightarrow \mathcal{Y} \ (i = 1, \ldots, k)$,
- real valued function
$\psi: \Theta \rightarrow \mathbb{R}$,
- measurable function
$g: \mathcal{X} \rightarrow \mathcal{Y}$,
- measurable function

such that

\[\begin{eqnarray} \frac{ d P_{\theta} }{ d \mu }(x) & = & \exp \left( \sum_{j=1}^{k} a_{j}(\theta) T_{j}(x) - \psi(\theta) \right) g(x) \label{chap03_03_12_definition_of_exponential_family} \end{eqnarray}\]

■

Example 3.12

■

Example 3.13

■

Theorem 3.14

$(\mathcal{X}, \mathcal{A})$,
- measurable space
$\mathcal{P} := \{ P_{\theta}\}_{\theta \in \Theta}$,
- probability measure over $(\mathcal{X}, \mathcal{A})$,
$k \in \mathbb{N}$,
$(\mathcal{X}^{(i)}, \mathcal{A}^{(i)})$,
- measurable space for $i = 1, \ldots, k$
$\mathcal{P}^{(i)} := \{P_{\theta}^{(i)}\}_{\theta \in \Theta}$,
- probability measure over $(\mathcal{X}^{(i)}, \mathcal{A}^{(i)})$ for $i = 1, \ldots, k$

The following statements hold:

(a) If $\mathcal{P}$ is an exponential family, $\forall P_{\theta}, P_{\theta^{\prime}} \in \mathcal{P}$, $P_{\theta} \ll P_{\theta^{\prime}}$, $P_{\theta} \ll P_{\theta^{\prime}}$,
(b) If for all $i = 1, \ldots, n$, $\mathcal{P}^{(i)}$ is exponential family, then $\{\prod_{i=1}^{n} P_{\theta}^{(i)}\}$ is an exponential family over $( \prod_{i=1}^{n} \mathcal{X}^{(i)}, \prod_{i=1}^{n} \mathcal{A}^{(i)})$.
(c) $T$ is sufficinet with respect to $\mathcal{P}$.

proof

(a)

Let $A \in \mathcal{A}$ be fixed.

\[P_{\theta}(A) = \int_{A} \exp \left( \sum_{i=1}^{m} a_{i}(\theta) T_{i}(x) - \psi(\theta) \right) \ \mu(dx)\] \[\begin{eqnarray} \frac{ d P_{\theta} }{ d P_{\theta^{\prime}} } & = & \frac{ d P_{\theta} }{ d\mu } \frac{ 1 }{ \frac{ d P_{\theta^{\prime}} }{ d\mu } } \\ & = & \exp \left( \sum_{j=1}^{m} (a_{j}(\theta) - a_{j}(\theta^{\prime})) T_{j}(x) \right) \nonumber \end{eqnarray}\] \[P_{\theta}(A) = \int_{A} \exp \left( \sum_{j=1}^{m} (a_{j}(\theta) - a_{j}(\theta^{\prime})) T_{j}(x) \right) \ P_{\theta^{\prime}}(dx)\]

The integrand is positive so that $P_{\theta^{\prime}}(A) = 0$.

(b) Let $\mathcal{C}$ be a cylinder set. That is,

\[\mathcal{C} := \{ A_{1} \times \cdots \times A_{n} \mid A_{i} \in \mathcal{A}^{(i)} \} .\]

Then

\[\begin{eqnarray} \forall A_{1} \times \cdots A_{n} \in \mathcal{C}, \ \left( \prod_{i=1}^{n} P_{\theta}^{(i)} \right) (A_{1} \times \cdots A_{n}) & = & \prod_{i=1}^{n} P_{\theta}^{(i)}(A_{i}) \ (\because \text{ by definition of product measure}) \\ & = & \prod_{i=1}^{n} \left( \int_{A_{i}} \frac{ d P_{\theta}^{(i)} }{ d \mu }(x_{i}) \ \mu(d x_{i}) \right) \\ & = & \int_{A_{1}\times \cdots A_{n}} \frac{ d P_{\theta}^{(1)} }{ d \mu }(x_{1}) \times \cdots \times \frac{ d P_{\theta}^{(n)} }{ d \mu }(x_{n}) \ \mu(d x) . \nonumber \end{eqnarray}\]

Therefore,

\[\frac{ d \left( \prod_{i=1}^{n} P_{\theta}^{(i)} \right) }{ d \mu } = \prod_{i=1}^{n} \left( \frac{ d P_{\theta}^{(i)} }{ d \mu } \right) .\]

Obviously, the RHS of the equation is an exponential family.

(c)

Immediate consequence of factorization theorem (Theorem 3.7).

$\Box$

Theorem 3.15

$(\mathcal{X}, \mathcal{A})$,
- measurable space
$\mathcal{P} := \{ P_{\theta}\}_{\theta \in \Theta}$,
- probability measure over $(\mathcal{X}, \mathcal{A})$,
- defined in $\eqref{chap03_03_12_definition_of_exponential_family}$.
$T := (T_{1}, \ldots, T_{m})$,

Then $\{P_{\theta}^{T}\}_{\theta \in \Theta}$ is an exponential familiy. That is, there exist $\sigma$-finite measure $\lambda$ over $(\mathbb{R}^{m}, \mathcal{B}(\mathbb{R}^{m}))$ such that

\[\begin{eqnarray} P_{\theta} & \ll & \lambda \\ \frac{ d P_{\theta}^{T} }{ d \lambda } (t) & = & \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{i} - \psi(\theta) \right) \ \lambda \text{-a.e.} \label{chap03_theorem_03_15_density} \end{eqnarray}\]

proof

Let $\theta \in \Theta$ be fixed and

\[B \in \mathcal{B}(\mathbb{R}^{m}), \ \lambda(B) := \int_{B} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) t_{i} + \psi(\theta_{0}) \right) \ P_{\theta}^{T}(dt) .\]

$\lambda$ is $\sigma$-finite measure. Indeed,

\[\begin{eqnarray} \lambda(B) & = & \int_{B} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) t_{i} + \psi(\theta_{0}) \right) \ P_{\theta}^{T}(dt) \nonumber \\ & = & \int_{T^{-1}(B)} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) T_{i}(x) + \psi(\theta_{0}) \right) \ P_{\theta}(dx) \nonumber \\ & = & \int_{T^{-1}(B)} \frac{ 1 }{ \frac{ d P_{\theta} }{ d \mu }(x) } \ P_{\theta}(dx) \nonumber \\ & = & \int_{T^{-1}(B)} \ \mu(dx) \nonumber \\ & = & \mu(T^{-1}(B)) . \nonumber \end{eqnarray}\]

$\mu$ is $\sigma$-finite so that there exists $\{A_{i}\}_{i \in \mathbb{N}} \subset \mathcal{A}$ such that $\mu(A_{i}) < \infty$ and $\cup_{i \in \mathbb{N}} A_{i} = \mathcal{X}$. Let $B_{0} := \mathbb{R}^{m} \setminus T(\mathcal{X})$ and $B_{i} := T(A_{i}) \ (i \in \mathbb{N})$. Then

\[\begin{eqnarray} B_{0} \cup \left( \bigcup_{i \in \mathbb{N}} B_{i} \right) & = & B_{0} \cup T \left( \bigcup_{i \in \mathbb{N}} A_{i} \right) \nonumber \\ & = & B_{0} \cup T(\mathcal{X}) \nonumber \\ & = & \mathbb{R}^{m} \nonumber \end{eqnarray}\]

Moreover, by the definition of $B_{i}$, we observe that

\[\begin{eqnarray} \forall i \in \mathbb{N}, \ \lambda(B_{i}) = \mu(A_{i}) < \infty . \nonumber \end{eqnarray}\]

Similary, we have

\[\begin{eqnarray} \lambda(B_{0}) & = & \lambda(\mathbb{R}^{m} \setminus T(\mathcal{X})) \nonumber \\ & = & 0 . \nonumber \end{eqnarray}\]

Therefore, $\lambda$ is $\sigma$-finite. For $\theta_{0} \in \Theta$,

\[\begin{eqnarray} B \in \mathcal{B}(\mathbb{R}^{m}), \ P_{\theta_{0}}^{T}(B) & = & P_{\theta_{0}}^{T}(T \in B) \nonumber \\ & = & \int_{\mathcal{X}} 1_{B}(T(x)) \exp \left( \sum_{j=1}^{m} ( a_{i}(\theta_{0}) - a_{i}(\theta) )T_{i}(x) - ( \psi(\theta_{0}) - \psi(\theta) ) \right) \ P_{\theta}(d x) \quad (\because P_{\theta_{0}} \ll P_{\theta}) \nonumber \\ & = & \int_{\mathbb{R}^{m}} 1_{B}(t) \exp \left( \sum_{j=1}^{m} ( a_{i}(\theta_{0}) - a_{i}(\theta) )t_{i} - ( \psi(\theta_{0}) - \psi(\theta) ) \right) \ P_{\theta}^{T}(d t) \nonumber \\ & = & \int_{\mathbb{R}^{m}} 1_{B}(t) \exp \left( \sum_{j=1}^{m} a_{i}(\theta_{0}) t_{i} - \psi(\theta_{0}) \right) \ \lambda(d t) \nonumber \end{eqnarray}\]

Then $\eqref{chap03_theorem_03_15_density}$ holds.

$\Box$

Natural parameter space.

Theorem 3.16

Natural parameter space is convex space.

proof

$\Box$

Proposition 3.17

$m \in \mathbb{N}$,
$\mu$,
- measure over $(\mathcal{X}, \mathcal{A})$
$\phi: \mathcal{X} \rightarrow \mathbb{R}$,
- measurable function
$T_{k}: \mathcal{X} \rightarrow \mathbb{R} \ (k=1, \ldots, m)$,
- measurable function

Suppose that there exists open subset $V \subseteq \mathbb{R}^{m}$ such that

\[a \in V, \ f(a) := \int_{\mathcal{X}} \phi(x) \exp \left( \sum_{k=1}^{m} a_{k}T_{k}(x) \right) \ \mu(dx) .\]

Then $f$ defined above has a analytic continuation to $D$ where

\[D := \{ z := (z_{1}, \ldots, z_{m}) \in \mathbb{C}^{m} \mid \mathrm{Re}(z) \in V \}\]

and is $m$ variables holomorphic function. Moreover,

\[\mathbf{n} \in \mathbb{Z}_{\ge 0}^{m}, \ z \in D, \ \partial_{z}^{\mathbf{n}} f(z) = \int_{\mathcal{X}} \phi(x) T(x)^{\mathbf{n}} \exp \left( \sum_{k=1}^{m} z_{k}T_{k}(x) \right) \ \mu(dx)\]

where $T := (T_{1}, \ldots, T_{m})$.

proof

From Hartogs’ theorem, we only need to show $f$ is holomophic for each element. Let $z^{*} \in D$, $\mathbf{n} \in \mathbb{Z}_{\ge 0}^{m}$ and $\epsilon > 0$. We put $a_{k} := \mathrm{Re}(z_{k})$, $a_{k}^{*} := \mathrm{Re}(z_{k}^{*})$. There exists a constant $C_{\mathbf{n}}$ such that

\[t := (t_{1}, \ldots, t_{m}) \in \mathbb{R}^{m}, \ \sup \{ |t^{\mathbf{n}}e^{z \cdot t}| \mid z \in \prod_{k=1}^{m} \{ z_{k} \mid a_{k} - a_{k}^{*} < \epsilon / 2 \} \} \le C_{\mathbf{n}} \prod_{k=1}^{m} \left( e^{(a_{k}^{*} - \epsilon) t_{k}} + e^{(a_{k}^{*} + \epsilon) t_{k}} \right)\]

$\Box$

Theorem 3.18 Completeness of exponential family

$\mathcal{P}$,
$A := \{(a_{1}(\theta), \ldots, a_{m}(\theta) \in \mathbb{R}^{m} \mid \theta \in \Theta \}$,
$A^{\mathrm{int}} \neq \emptyset$,
- interior of $A$ is not empty

Then statistics $T := (T_{1}, \ldots, T_{m})$ is complete.

proof

It is enough to show that for all $h: \mathbb{R}^{m} \rightarrow \mathbb{R}$ measurable,

\[\forall \theta \in \Theta, \ \int_{\mathbb{R}^{m}} h(t) \ P_{\theta}^{T}(d t) = 0 \ \Rightarrow \ \forall \theta \in \Theta, \ h = 0 \ P_{\theta}^{T} \text{-a.s.}\]

Let $h^{+}, h^{-}$ are positive and negative part of $h$ (i.e. $h = h^{+} - h^{-}$). By theorem 3.15, we have

\[\forall \theta \in \Theta, \ \int_{\mathbb{R}^{m}} h^{+}(t) \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{k} \right) \ \lambda(dt) = \int_{\mathbb{R}^{m}} h^{-}(t) \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{k} \right) \ \lambda(dt) .\]

By our assumption, there exists $a \in A^{\mathrm{int}}$, $\epsilon > 0$ such that for all $a^{*} \in (\epsilon, \epsilon)^{m}$,

\[\begin{eqnarray} & & \int_{\mathbb{R}^{m}} h^{+}(t) \exp \left( \langle a + a^{*}, t \rangle \right) \ \lambda(dt) = \int_{\mathbb{R}^{m}} h^{-}(t) \exp \left( \langle a + a^{*}, t \rangle \right) \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{+}(dt) = \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{+}(dt) \label{chap03_theorem_03_18_before_extension} \end{eqnarray}\]

where

\[\lambda^{\pm}(dt) := h^{\pm}(t) e^{\langle a^{*}, t \rangle} \lambda(dt) .\]

For clearity, let

\[\begin{eqnarray} f_{\pm}(a) & := & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{\pm}(dt) \nonumber \\ D & := & \{ a \in \mathbb{R}^{m} \mid a \in (\epsilon, \epsilon)^{m} \} \nonumber \end{eqnarray}\]

The domain of the both functions is $D$. Now we extend the domain to $D^{\prime}$ where

\[D^{\prime} := \{ a + \sqrt{-1} s \in \mathbb{C}^{m} \mid a \in D, \ s \in \mathbb{R}^{m} \} .\]

Since $D \subseteq D^{\prime}$ has accumlation points (e.g. $(0, \ldots, 0)$), by applying the identity theorem element by element, the extensions are identical, that is,

\[\begin{eqnarray} \forall a \in D^{\prime}, \ f_{+}(a) = f_{-}(a) . \end{eqnarray}\]

In particular, we take $a := \sqrt{-1}s$,

\[\begin{eqnarray} & & \bar{\lambda}_{+} \equiv \bar{\lambda}_{-} \nonumber \\ & \Rightarrow & \forall A \in \mathcal{B}(\mathbb{R}^{m}), \ \int_{A} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{A} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \end{eqnarray}\]

This implies $h^{+} = h^{-} \lambda \text{-a.e.}$. Indeed, we will show $\lambda(\{t \mid h^{+}(t) > h^{-}(t)\} \cup \{t \mid h^{+}(t) < h^{-}(t)\}) = 0$. For simplicity, let $S_{>} := \{t \mid h^{+}(t) > h^{-}(t)\}$ and $S_{<} := \{t \mid h^{+}(t) < h^{-}(t)\}$. By our assumption, we have

\[\begin{eqnarray} \forall A \in \mathcal{B}(\mathbb{R}^{m}), \ & & \int_{A} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{A} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{A} (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = 0 \nonumber \end{eqnarray}\]

Then substituing $A$ to $S_{>}$, we obtain

\[\int_{\mathbb{R}^{m}} 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = 0 .\]

Hence we obtain $1_{S_{>}}(\cdot) (h^{+}(\cdot) - h^{-}(\cdot)) e^{\langle a^{*}, \cdot \rangle} = 0 \ \lambda \text{-a.e.}$.

\[\begin{eqnarray} & & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} > 0 \} ) = 0 \nonumber \\ & \Rightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} > 0 \} \cap S_{>} ) = 0 \quad (\because \text{monotonicity of measure}) \nonumber \\ & \Leftrightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) > 0 \} \cap S_{>} ) = 0 \quad (\because \text{the sets are equivalent}) \nonumber \\ & \Leftrightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) > 0 \} ) = 0 \quad (\because \text{the sets are equivalent}) \end{eqnarray}\]

Therefore, $\lambda(S_{>}) = 0$. Similary, we can obtain $\lambda(S_{<}) = 0$. Since we know that countable union of null sets is still null set, $\lambda(S_{>} \cup S_{<}) = 0$. Finally, $h = 0 \ \lambda \text{-a.e.}$

$\Box$

Proposition 3.19

$m_{1}, m_{2} \in \mathbb{N}$,
$m := m_{1} + m_{2}$,
$A := \{a_{1:m}\} \subseteq \mathbb{R}^{m}$,
$a_{1:m} := (a_{1:m_{1}}, a_{m_{1} + 1: m}) \in A$,
- $a_{1:m_{1}} := (a_{1}, \ldots, a_{m})$,
- $a_{m_{1} + 1:m} := (a_{m_{1} + 1}, \ldots, a_{m})$,
$T_{1:m} := (T_{1:m_{1}}, T_{m_{1} + 1: m})$,
- $T_{1:m_{1}} := (T_{1}, \ldots, T_{m_{1}})$,
- $T_{m_{1}+1:m} := (T_{m_{1}+1}, \ldots, T_{m})$,
$t_{1:m} := (t_{1:m_{1}}, t_{m_{1} + 1: m}) \in \mathbb{R}^{m}$,
- $t_{1:m_{1}} := (t_{1}, \ldots, t_{m_{1}})$,
- $t_{m_{1} + 1:m} := (t_{m_{1} + 1}, \ldots, t_{m})$,

\[\begin{eqnarray} \frac{ d P_{a_{1:m}} }{ d \mu }(x) & = & \exp \left( \sum_{j=1}^{m} a_{j}T_{j} - \psi(a_{1:m}) \right) g(x) \nonumber \\ & = & \exp \left( \langle a_{1:m_{1}}, T_{1:m_{1}} \rangle + \langle a_{m_{1}+1:m}, T_{m_{1}+1:m} \rangle - \psi(a_{1:m}) \right) g(x) \nonumber \end{eqnarray}\]

Then for all $t_{m_{1} + 1:m} \in \mathbb{R}^{m_{2}}$, there exists $\sigma$-finite measure $\nu_{t_{m_{1}+1}:m}$ over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}})$ and probability measure $\nu_{0}$ over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}}))$ such that

(1)

\[\begin{eqnarray} \forall a_{1:m} \in A, \ N_{a_{1:m}} & := & \left\{ t_{m_{1}+1:m} \in \mathbb{R}^{m_{2}} \mid \int_{\mathbb{R}^{m_{1}}} \exp \left( \langle a_{1:m_{1}}, t_{1:m_{1}} \rangle \right) \ \nu_{t_{m_{1}+1:m}}(d t_{1:m_{1}}) = 0 \text{ or } \infty \right\} \nonumber \\ P_{a_{1:m}}^{T_{m_{1}+1:m}}(N_{a_{1:m}}) & = & 0 \nonumber \end{eqnarray}\]

(2)

\[\begin{eqnarray} P_{a_{1:m}}(d t_{1:m_{1}} \mid t_{m_{1}+1:m}) := \begin{cases} \frac{ \displaystyle \exp \left( \langle a_{1:m_{1}}, t_{1:m_{1}} \rangle \right) \nu_{t_{m_{1}+1:m}}(dt_{1:m_{1}}) }{ \displaystyle \int_{\mathbb{R}^{m_{1}}} \exp \left( \langle a_{1:m_{1}}, \tau_{1:m_{1}} \rangle \right) \nu_{t_{m_{1}+1:m}}(d \tau_{1:m_{1}}) } & (t_{m_{1}+1:m} \in N_{a_{1:m_{1}}}^{c}) \\ \nu_{0} & (t_{m_{1}+1:m} \in N_{a_{1:m_{1}}}) \end{cases} \end{eqnarray}\]

is probability distribution over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}}))$ and regular probability distribution of $P_{a_{1:m}}$ of $T_{1:m_{1}}$ given $T_{m_{1}+1:m} = t_{m_{1}+1:m}$.

proof

$\Box$