3.3 Exponential family
Definition 3.13 Exponential families
- $(\mathcal{X}, \mathcal{A})$,
- measurable space
- $\Theta$,
- parameter space
- \(\mathcal{P} := \{P_{\theta}\}_{\theta \in \Theta}\),
- probability measure over sample sp. \((\mathcal{X}, \mathcal{A})\)
\(\mathcal{P}\) is said to be exponential families if there exist
- $\mu: \mathcal{X} \rightarrow \mathrm{Range}(\mu) \subseteq \mathbb{R}$,
- $\sigma$-finite measure $\mu$ over \((\mathcal{Y}, \mathcal{B})\),
- $ \forall \theta \in \Theta$, \(P_{\theta} \ll \mu\),
- $k \in \mathbb{N}$,
- \(T_{i}: \mathcal{X} \rightarrow \mathcal{Y} \ (i = 1, \ldots, k)\),
- measurable function
- \(a_{i}: \Theta \rightarrow \mathcal{Y} \ (i = 1, \ldots, k)\),
- real valued function
- $\psi: \Theta \rightarrow \mathbb{R}$,
- measurable function
- $g: \mathcal{X} \rightarrow \mathcal{Y}$,
- measurable function
such that
\[\begin{eqnarray} \frac{ d P_{\theta} }{ d \mu }(x) & = & \exp \left( \sum_{j=1}^{k} a_{j}(\theta) T_{j}(x) - \psi(\theta) \right) g(x) \label{chap03_03_12_definition_of_exponential_family} \end{eqnarray}\]Example 3.12
Example 3.13
Theorem 3.14
- $(\mathcal{X}, \mathcal{A})$,
- measurable space
- \(\mathcal{P} := \{ P_{\theta}\}_{\theta \in \Theta}\),
- probability measure over $(\mathcal{X}, \mathcal{A})$,
- $k \in \mathbb{N}$,
- $(\mathcal{X}^{(i)}, \mathcal{A}^{(i)})$,
- measurable space for $i = 1, \ldots, k$
- \(\mathcal{P}^{(i)} := \{P_{\theta}^{(i)}\}_{\theta \in \Theta}\),
- probability measure over $(\mathcal{X}^{(i)}, \mathcal{A}^{(i)})$ for $i = 1, \ldots, k$
The following statements hold:
- (a) If $\mathcal{P}$ is an exponential family, $\forall P_{\theta}, P_{\theta^{\prime}} \in \mathcal{P}$, $P_{\theta} \ll P_{\theta^{\prime}}$, $P_{\theta} \ll P_{\theta^{\prime}}$,
- (b) If for all $i = 1, \ldots, n$, $\mathcal{P}^{(i)}$ is exponential family, then \(\{\prod_{i=1}^{n} P_{\theta}^{(i)}\}\) is an exponential family over $( \prod_{i=1}^{n} \mathcal{X}^{(i)}, \prod_{i=1}^{n} \mathcal{A}^{(i)})$.
- (c) $T$ is sufficinet with respect to $\mathcal{P}$.
proof
(a)
Let $A \in \mathcal{A}$ be fixed.
\[P_{\theta}(A) = \int_{A} \exp \left( \sum_{i=1}^{m} a_{i}(\theta) T_{i}(x) - \psi(\theta) \right) \ \mu(dx)\] \[\begin{eqnarray} \frac{ d P_{\theta} }{ d P_{\theta^{\prime}} } & = & \frac{ d P_{\theta} }{ d\mu } \frac{ 1 }{ \frac{ d P_{\theta^{\prime}} }{ d\mu } } \\ & = & \exp \left( \sum_{j=1}^{m} (a_{j}(\theta) - a_{j}(\theta^{\prime})) T_{j}(x) \right) \nonumber \end{eqnarray}\] \[P_{\theta}(A) = \int_{A} \exp \left( \sum_{j=1}^{m} (a_{j}(\theta) - a_{j}(\theta^{\prime})) T_{j}(x) \right) \ P_{\theta^{\prime}}(dx)\]The integrand is positive so that $P_{\theta^{\prime}}(A) = 0$.
(b) Let $\mathcal{C}$ be a cylinder set. That is,
\[\mathcal{C} := \{ A_{1} \times \cdots \times A_{n} \mid A_{i} \in \mathcal{A}^{(i)} \} .\]Then
\[\begin{eqnarray} \forall A_{1} \times \cdots A_{n} \in \mathcal{C}, \ \left( \prod_{i=1}^{n} P_{\theta}^{(i)} \right) (A_{1} \times \cdots A_{n}) & = & \prod_{i=1}^{n} P_{\theta}^{(i)}(A_{i}) \ (\because \text{ by definition of product measure}) \\ & = & \prod_{i=1}^{n} \left( \int_{A_{i}} \frac{ d P_{\theta}^{(i)} }{ d \mu }(x_{i}) \ \mu(d x_{i}) \right) \\ & = & \int_{A_{1}\times \cdots A_{n}} \frac{ d P_{\theta}^{(1)} }{ d \mu }(x_{1}) \times \cdots \times \frac{ d P_{\theta}^{(n)} }{ d \mu }(x_{n}) \ \mu(d x) . \nonumber \end{eqnarray}\]Therefore,
\[\frac{ d \left( \prod_{i=1}^{n} P_{\theta}^{(i)} \right) }{ d \mu } = \prod_{i=1}^{n} \left( \frac{ d P_{\theta}^{(i)} }{ d \mu } \right) .\]Obviously, the RHS of the equation is an exponential family.
(c)
Immediate consequence of factorization theorem (Theorem 3.7).
Theorem 3.15
- $(\mathcal{X}, \mathcal{A})$,
- measurable space
- \(\mathcal{P} := \{ P_{\theta}\}_{\theta \in \Theta}\),
- probability measure over $(\mathcal{X}, \mathcal{A})$,
- defined in \(\eqref{chap03_03_12_definition_of_exponential_family}\).
- $T := (T_{1}, \ldots, T_{m})$,
Then \(\{P_{\theta}^{T}\}_{\theta \in \Theta}\) is an exponential familiy. That is, there exist $\sigma$-finite measure $\lambda$ over $(\mathbb{R}^{m}, \mathcal{B}(\mathbb{R}^{m}))$ such that
\[\begin{eqnarray} P_{\theta} & \ll & \lambda \\ \frac{ d P_{\theta}^{T} }{ d \lambda } (t) & = & \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{i} - \psi(\theta) \right) \ \lambda \text{-a.e.} \label{chap03_theorem_03_15_density} \end{eqnarray}\]proof
Let $\theta \in \Theta$ be fixed and
\[B \in \mathcal{B}(\mathbb{R}^{m}), \ \lambda(B) := \int_{B} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) t_{i} + \psi(\theta_{0}) \right) \ P_{\theta}^{T}(dt) .\]$\lambda$ is $\sigma$-finite measure. Indeed,
\[\begin{eqnarray} \lambda(B) & = & \int_{B} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) t_{i} + \psi(\theta_{0}) \right) \ P_{\theta}^{T}(dt) \nonumber \\ & = & \int_{T^{-1}(B)} \exp \left( - \sum_{i=1}^{m} a_{i}(\theta_{0}) T_{i}(x) + \psi(\theta_{0}) \right) \ P_{\theta}(dx) \nonumber \\ & = & \int_{T^{-1}(B)} \frac{ 1 }{ \frac{ d P_{\theta} }{ d \mu }(x) } \ P_{\theta}(dx) \nonumber \\ & = & \int_{T^{-1}(B)} \ \mu(dx) \nonumber \\ & = & \mu(T^{-1}(B)) . \nonumber \end{eqnarray}\]$\mu$ is $\sigma$-finite so that there exists \(\{A_{i}\}_{i \in \mathbb{N}} \subset \mathcal{A}\) such that $\mu(A_{i}) < \infty$ and $\cup_{i \in \mathbb{N}} A_{i} = \mathcal{X}$. Let $B_{0} := \mathbb{R}^{m} \setminus T(\mathcal{X})$ and $B_{i} := T(A_{i}) \ (i \in \mathbb{N})$. Then
\[\begin{eqnarray} B_{0} \cup \left( \bigcup_{i \in \mathbb{N}} B_{i} \right) & = & B_{0} \cup T \left( \bigcup_{i \in \mathbb{N}} A_{i} \right) \nonumber \\ & = & B_{0} \cup T(\mathcal{X}) \nonumber \\ & = & \mathbb{R}^{m} \nonumber \end{eqnarray}\]Moreover, by the definition of $B_{i}$, we observe that
\[\begin{eqnarray} \forall i \in \mathbb{N}, \ \lambda(B_{i}) = \mu(A_{i}) < \infty . \nonumber \end{eqnarray}\]Similary, we have
\[\begin{eqnarray} \lambda(B_{0}) & = & \lambda(\mathbb{R}^{m} \setminus T(\mathcal{X})) \nonumber \\ & = & 0 . \nonumber \end{eqnarray}\]Therefore, $\lambda$ is $\sigma$-finite. For $\theta_{0} \in \Theta$,
\[\begin{eqnarray} B \in \mathcal{B}(\mathbb{R}^{m}), \ P_{\theta_{0}}^{T}(B) & = & P_{\theta_{0}}^{T}(T \in B) \nonumber \\ & = & \int_{\mathcal{X}} 1_{B}(T(x)) \exp \left( \sum_{j=1}^{m} ( a_{i}(\theta_{0}) - a_{i}(\theta) )T_{i}(x) - ( \psi(\theta_{0}) - \psi(\theta) ) \right) \ P_{\theta}(d x) \quad (\because P_{\theta_{0}} \ll P_{\theta}) \nonumber \\ & = & \int_{\mathbb{R}^{m}} 1_{B}(t) \exp \left( \sum_{j=1}^{m} ( a_{i}(\theta_{0}) - a_{i}(\theta) )t_{i} - ( \psi(\theta_{0}) - \psi(\theta) ) \right) \ P_{\theta}^{T}(d t) \nonumber \\ & = & \int_{\mathbb{R}^{m}} 1_{B}(t) \exp \left( \sum_{j=1}^{m} a_{i}(\theta_{0}) t_{i} - \psi(\theta_{0}) \right) \ \lambda(d t) \nonumber \end{eqnarray}\]Then \(\eqref{chap03_theorem_03_15_density}\) holds.
Natural parameter space.
Theorem 3.16
Natural parameter space is convex space.
proof
Proposition 3.17
- $m \in \mathbb{N}$,
- $\mu$,
- measure over $(\mathcal{X}, \mathcal{A})$
- $\phi: \mathcal{X} \rightarrow \mathbb{R}$,
- measurable function
- $T_{k}: \mathcal{X} \rightarrow \mathbb{R} \ (k=1, \ldots, m)$,
- measurable function
Suppose that there exists open subset $V \subseteq \mathbb{R}^{m}$ such that
\[a \in V, \ f(a) := \int_{\mathcal{X}} \phi(x) \exp \left( \sum_{k=1}^{m} a_{k}T_{k}(x) \right) \ \mu(dx) .\]Then $f$ defined above has a analytic continuation to $D$ where
\[D := \{ z := (z_{1}, \ldots, z_{m}) \in \mathbb{C}^{m} \mid \mathrm{Re}(z) \in V \}\]and is $m$ variables holomorphic function. Moreover,
\[\mathbf{n} \in \mathbb{Z}_{\ge 0}^{m}, \ z \in D, \ \partial_{z}^{\mathbf{n}} f(z) = \int_{\mathcal{X}} \phi(x) T(x)^{\mathbf{n}} \exp \left( \sum_{k=1}^{m} z_{k}T_{k}(x) \right) \ \mu(dx)\]where $T := (T_{1}, \ldots, T_{m})$.
proof
From Hartogs’ theorem, we only need to show $f$ is holomophic for each element. Let \(z^{*} \in D\), \(\mathbf{n} \in \mathbb{Z}_{\ge 0}^{m}\) and $\epsilon > 0$. We put $a_{k} := \mathrm{Re}(z_{k})$, \(a_{k}^{*} := \mathrm{Re}(z_{k}^{*})\). There exists a constant $C_{\mathbf{n}}$ such that
\[t := (t_{1}, \ldots, t_{m}) \in \mathbb{R}^{m}, \ \sup \{ |t^{\mathbf{n}}e^{z \cdot t}| \mid z \in \prod_{k=1}^{m} \{ z_{k} \mid a_{k} - a_{k}^{*} < \epsilon / 2 \} \} \le C_{\mathbf{n}} \prod_{k=1}^{m} \left( e^{(a_{k}^{*} - \epsilon) t_{k}} + e^{(a_{k}^{*} + \epsilon) t_{k}} \right)\]Theorem 3.18 Completeness of exponential family
- $\mathcal{P}$,
- \(A := \{(a_{1}(\theta), \ldots, a_{m}(\theta) \in \mathbb{R}^{m} \mid \theta \in \Theta \}\),
- $A^{\mathrm{int}} \neq \emptyset$,
- interior of $A$ is not empty
Then statistics $T := (T_{1}, \ldots, T_{m})$ is complete.
proof
It is enough to show that for all $h: \mathbb{R}^{m} \rightarrow \mathbb{R}$ measurable,
\[\forall \theta \in \Theta, \ \int_{\mathbb{R}^{m}} h(t) \ P_{\theta}^{T}(d t) = 0 \ \Rightarrow \ \forall \theta \in \Theta, \ h = 0 \ P_{\theta}^{T} \text{-a.s.}\]Let $h^{+}, h^{-}$ are positive and negative part of $h$ (i.e. $h = h^{+} - h^{-}$). By theorem 3.15, we have
\[\forall \theta \in \Theta, \ \int_{\mathbb{R}^{m}} h^{+}(t) \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{k} \right) \ \lambda(dt) = \int_{\mathbb{R}^{m}} h^{-}(t) \exp \left( \sum_{i=1}^{m} a_{i}(\theta) t_{k} \right) \ \lambda(dt) .\]By our assumption, there exists $a \in A^{\mathrm{int}}$, $\epsilon > 0$ such that for all $a^{*} \in (\epsilon, \epsilon)^{m}$,
\[\begin{eqnarray} & & \int_{\mathbb{R}^{m}} h^{+}(t) \exp \left( \langle a + a^{*}, t \rangle \right) \ \lambda(dt) = \int_{\mathbb{R}^{m}} h^{-}(t) \exp \left( \langle a + a^{*}, t \rangle \right) \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{+}(dt) = \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{+}(dt) \label{chap03_theorem_03_18_before_extension} \end{eqnarray}\]where
\[\lambda^{\pm}(dt) := h^{\pm}(t) e^{\langle a^{*}, t \rangle} \lambda(dt) .\]For clearity, let
\[\begin{eqnarray} f_{\pm}(a) & := & \int_{\mathbb{R}^{m}} e^{\langle a, t \rangle} \ \lambda_{\pm}(dt) \nonumber \\ D & := & \{ a \in \mathbb{R}^{m} \mid a \in (\epsilon, \epsilon)^{m} \} \nonumber \end{eqnarray}\]The domain of the both functions is $D$. Now we extend the domain to $D^{\prime}$ where
\[D^{\prime} := \{ a + \sqrt{-1} s \in \mathbb{C}^{m} \mid a \in D, \ s \in \mathbb{R}^{m} \} .\]Since $D \subseteq D^{\prime}$ has accumlation points (e.g. $(0, \ldots, 0)$), by applying the identity theorem element by element, the extensions are identical, that is,
\[\begin{eqnarray} \forall a \in D^{\prime}, \ f_{+}(a) = f_{-}(a) . \end{eqnarray}\]In particular, we take $a := \sqrt{-1}s$,
\[\begin{eqnarray} & & \bar{\lambda}_{+} \equiv \bar{\lambda}_{-} \nonumber \\ & \Rightarrow & \forall A \in \mathcal{B}(\mathbb{R}^{m}), \ \int_{A} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{A} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \end{eqnarray}\]This implies $h^{+} = h^{-} \lambda \text{-a.e.}$. Indeed, we will show \(\lambda(\{t \mid h^{+}(t) > h^{-}(t)\} \cup \{t \mid h^{+}(t) < h^{-}(t)\}) = 0\). For simplicity, let \(S_{>} := \{t \mid h^{+}(t) > h^{-}(t)\}\) and \(S_{<} := \{t \mid h^{+}(t) < h^{-}(t)\}\). By our assumption, we have
\[\begin{eqnarray} \forall A \in \mathcal{B}(\mathbb{R}^{m}), \ & & \int_{A} h^{+}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = \int_{A} h^{-}(t) e^{\langle a^{*}, t \rangle} \ \lambda(dt) \nonumber \\ & \Leftrightarrow & \int_{A} (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = 0 \nonumber \end{eqnarray}\]Then substituing $A$ to $S_{>}$, we obtain
\[\int_{\mathbb{R}^{m}} 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} \ \lambda(dt) = 0 .\]Hence we obtain \(1_{S_{>}}(\cdot) (h^{+}(\cdot) - h^{-}(\cdot)) e^{\langle a^{*}, \cdot \rangle} = 0 \ \lambda \text{-a.e.}\).
\[\begin{eqnarray} & & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} > 0 \} ) = 0 \nonumber \\ & \Rightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) (h^{+}(t) - h^{-}(t)) e^{\langle a^{*}, t \rangle} > 0 \} \cap S_{>} ) = 0 \quad (\because \text{monotonicity of measure}) \nonumber \\ & \Leftrightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) > 0 \} \cap S_{>} ) = 0 \quad (\because \text{the sets are equivalent}) \nonumber \\ & \Leftrightarrow & \lambda( \{ t \in \mathbb{R}^{m} \mid 1_{S_{>}}(t) > 0 \} ) = 0 \quad (\because \text{the sets are equivalent}) \end{eqnarray}\]Therefore, $\lambda(S_{>}) = 0$. Similary, we can obtain $\lambda(S_{<}) = 0$. Since we know that countable union of null sets is still null set, $\lambda(S_{>} \cup S_{<}) = 0$. Finally, $h = 0 \ \lambda \text{-a.e.}$
Proposition 3.19
- $m_{1}, m_{2} \in \mathbb{N}$,
- $m := m_{1} + m_{2}$,
- \(A := \{a_{1:m}\} \subseteq \mathbb{R}^{m}\),
- $a_{1:m} := (a_{1:m_{1}}, a_{m_{1} + 1: m}) \in A$,
- $a_{1:m_{1}} := (a_{1}, \ldots, a_{m})$,
- $a_{m_{1} + 1:m} := (a_{m_{1} + 1}, \ldots, a_{m})$,
- $T_{1:m} := (T_{1:m_{1}}, T_{m_{1} + 1: m})$,
- $T_{1:m_{1}} := (T_{1}, \ldots, T_{m_{1}})$,
- $T_{m_{1}+1:m} := (T_{m_{1}+1}, \ldots, T_{m})$,
- $t_{1:m} := (t_{1:m_{1}}, t_{m_{1} + 1: m}) \in \mathbb{R}^{m}$,
- $t_{1:m_{1}} := (t_{1}, \ldots, t_{m_{1}})$,
- $t_{m_{1} + 1:m} := (t_{m_{1} + 1}, \ldots, t_{m})$,
Then for all $t_{m_{1} + 1:m} \in \mathbb{R}^{m_{2}}$, there exists $\sigma$-finite measure \(\nu_{t_{m_{1}+1}:m}\) over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}})$ and probability measure $\nu_{0}$ over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}}))$ such that
(1)
\[\begin{eqnarray} \forall a_{1:m} \in A, \ N_{a_{1:m}} & := & \left\{ t_{m_{1}+1:m} \in \mathbb{R}^{m_{2}} \mid \int_{\mathbb{R}^{m_{1}}} \exp \left( \langle a_{1:m_{1}}, t_{1:m_{1}} \rangle \right) \ \nu_{t_{m_{1}+1:m}}(d t_{1:m_{1}}) = 0 \text{ or } \infty \right\} \nonumber \\ P_{a_{1:m}}^{T_{m_{1}+1:m}}(N_{a_{1:m}}) & = & 0 \nonumber \end{eqnarray}\](2)
\[\begin{eqnarray} P_{a_{1:m}}(d t_{1:m_{1}} \mid t_{m_{1}+1:m}) := \begin{cases} \frac{ \displaystyle \exp \left( \langle a_{1:m_{1}}, t_{1:m_{1}} \rangle \right) \nu_{t_{m_{1}+1:m}}(dt_{1:m_{1}}) }{ \displaystyle \int_{\mathbb{R}^{m_{1}}} \exp \left( \langle a_{1:m_{1}}, \tau_{1:m_{1}} \rangle \right) \nu_{t_{m_{1}+1:m}}(d \tau_{1:m_{1}}) } & (t_{m_{1}+1:m} \in N_{a_{1:m_{1}}}^{c}) \\ \nu_{0} & (t_{m_{1}+1:m} \in N_{a_{1:m_{1}}}) \end{cases} \end{eqnarray}\]is probability distribution over $(\mathbb{R}^{m_{1}}, \mathcal{B}(\mathbb{R}^{m_{1}}))$ and regular probability distribution of $P_{a_{1:m}}$ of $T_{1:m_{1}}$ given $T_{m_{1}+1:m} = t_{m_{1}+1:m}$.