$\require{AMScd}$
2.1 Definitions
p.d.f. of Bernoulli distribution
\[\theta \in [0, 1], \ x \in \{0, 1\}, \ \mathrm{Ber}(\theta)(x) := \theta^{x}(1 - \theta)^{1 - x} .\]p.d.f. of Normal distribution
\[\mu \in \mathbb{R}, \ \sigma \in (0, \infty), \ x \in \mathbb{R}, \ \mathrm{N}(\mu, \sigma^{2})(x) := \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left( - \frac{1}{2 \sigma^{2}} (x - \mu)^{2} \right) .\]p.d.f. of Exponential distribution
\[\theta > 0, \ x \in [0, \infty) \ \mathrm{Exp}(\theta)(x) := \theta e^{-\theta x}\]2.1.1 Notional overviw
\[\begin{CD} (S, \mathcal{A}, \mu) @>{X}>> (\mathcal{X}, \mathcal{B}, P_{\theta}) @>{T}>> (\mathcal{T}, \mathcal{C}, P_{\theta, T}) \\ @V{\Theta}VV \\ (\Omega, \tau, \mu_{\Theta}) \end{CD}\]- $(S, \mathcal{A}, \mu)$,
- probability sp.
- $\mathrm{Pr}(A) := \mu(A)$,
- $\mathrm{Pr}(A \mid \cdot) := \mu(A \mid \cdot)$,
- $X: S \rightarrow \mathcal{X}$,
- random quantity
- data
- $(\mathcal{X}, \mathcal{B})$,
- measurable sp.
- sample sp.
- usually subset of euclidian space
- \(\{x\} \in \mathcal{B} \ (x \in \mathcal{X})\),
- elments of $\mathcal{X}$ is denoted by $x, y, z, x_{1}, x_{2}, \ldots$
- $\mathcal{A}_{X} := X^{-1}(\mathcal{B})$,
- $(\Omega, \tau)$,
- measurable sp.
- $\Omega$ is called parameter sp.
- $\Omega$ is usually subset of finite-dimensional euclidean space
- $\Theta: S \rightarrow \Omega$,
- \(\mathcal{P}_{0} := \{P_{\theta}\}_{\Omega}\),
- parametric family of distributions for $X$,
- $P_{\theta}: \mathcal{B} \rightarrow [0, 1] \ (\theta \in \Theta)$
- conditonal distribution of $X$ given $\Theta = \theta$,
$P_{\theta}(B)$ is defined as
\[\begin{eqnarray} B \in \mathcal{B}, \ P_{\theta}(B) & := & P(X^{-1}(B) \mid \Theta = \theta) \nonumber \\ & = & \mu(X^{-1}(B) \mid \Theta = \theta) \nonumber . \end{eqnarray}\]Then $P_{\theta}$ satisfies
\[\begin{eqnarray} \forall B \in \mathcal{B}, \ \forall D \in \tau, \ \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d\theta) & = & \int_{D} P(X^{-1}(B) \mid \Theta = \theta) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{\Theta^{-1}(D)} 1_{X^{-1}(B)}(s) \ \mu(d s) \label{chap02_01_p_theta} . \end{eqnarray}\]- $P_{\theta}^{\prime}: \mathcal{A}_{X} \rightarrow [0, 1] \ (\theta \in \Theta)$
- probability measure on $(S, \mathcal{A}_{X})$ given $\Theta = \theta$,
Let \(\mu_{\mid \mathcal{A}_{X}}\) be measure which restricts $\mu$ by \(\mathcal{A}_{X}\). Then $P_{\theta}^{\prime}$ is defined as
\[\begin{eqnarray} A \in \mathcal{A}_{X}, \ P_{\theta}^{\prime}(A) & := & \mu_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \nonumber \\ & = & \mathrm{Pr}_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \nonumber \end{eqnarray} .\]$P_{\theta}^{\prime}$ stasifies that
\[\begin{eqnarray} \forall A \in \mathcal{A}_{X}, \ \forall D \in \tau, \ \int_{D} P_{\theta}^{\prime}(A) \ \mu_{\Theta}(d \theta) & = & \int_{D} \mu_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \ \mu_{\Theta}(d \theta) \nonumber \\ & = & \int_{\Theta^{-1}(D)} 1_{A}(s) \ \mu(d s) \label{chap02_01_p_theta_prime} . \end{eqnarray}\]For all $A \in \mathcal{A}_{X}$, $\exists B \in \mathcal{B}$ such that $A = X^{-1}(B)$. Thus, from \(\eqref{chap02_01_p_theta}\) and \(\eqref{chap02_01_p_theta_prime}\),
\[\begin{eqnarray} \forall A \in \mathcal{B}, \ \exists B \in \mathcal{B} \text{ s.t. } \forall D \in \tau, \ \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d\theta) = \int_{D} P_{\theta}^{\prime}(A) \ \mu_{\Theta}(d\theta) . \end{eqnarray}\]Hence
\[\begin{eqnarray} A \in \mathcal{A}_{X}, \ \exists B \in \mathcal{B} \text{ s.t. } P_{\theta}^{\prime}(A) & = & P_{\theta}^{\prime}(X \in B) \nonumber \\ & = & \mathrm{Pr}_{\mid \mathcal{A}_{X}}(X \in B \mid \Theta = \theta) \nonumber \\ & = & \mu_{\mid \mathcal{A}_{X}}(X \in B \mid \Theta = \theta) \nonumber \\ & = & \mu_{\mid \mathcal{A}_{X}}(X^{-1}(B) \mid \Theta = \theta) \nonumber \\ & = & P_{\theta}(B) \quad \mu_{\Theta} \text{-a.s.}\theta . \end{eqnarray}\]Example 2.1
- \(\{X_{n}\}_{n \in \mathbb{N}}\),
- conditonally IID random variable with $\mathrm{N}(\theta, 1)$ given $\Theta = \theta$,
- $X = (X_{1}, \ldots, X_{n})$,
If $n=1$, $B \in \mathcal{B}$ then
\[\begin{eqnarray} P_{\theta}(B) & = & \int_{B} \frac{ 1 }{ \sqrt{2 \pi} } \exp \left( - \frac{ (x - \theta)^{2} }{ 2 } \right) \ dx \nonumber \\ & = & P_{\theta}^{\prime}(X_{i} \in B) \nonumber \\ & = & \mathrm{Pr}(X_{i} \in B \mid \Theta = \theta) \nonumber \end{eqnarray}\]Similarly, $B \in \mathcal{B}^{n}$,
\[\begin{eqnarray} P_{\theta}(B) & = & \int_{B} (2 \pi)^{-n/2} \exp \left( - \frac{ 1 }{ 2 } \sum_{i=1}^{n} (x_{i} - \theta)^{2} \right) \ dx_{1} \cdots dx_{n} \nonumber \\ & = & P_{\theta}^{\prime}(X \in B) \nonumber \\ & = & \mathrm{Pr}(X \in B \mid \Theta = \theta) . \end{eqnarray}\]- $\mu_{\Theta}$
- distribution of $\Theta$,
- $D \in \tau$
- $B \in \mathcal{B}$,
- $A := X^{-1}(B)$,
- $A_{D} := \Theta^{-1}(D)$,
We have
\[\begin{eqnarray} \mathrm{Pr}(X \in B, \Theta \in D) & = & \mu(A \cap A_{D}) \nonumber \\ & = & \int_{A_{D}} \mathrm{Pr}(A \mid \Theta)(s) \ \mu(ds) \nonumber \\ & = & \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d \theta) \nonumber \end{eqnarray}\]Example 2.2
Continuation of Example 2.1.
- $\Theta \sim \mathrm{N}(0, 1)$,
- $D \in \tau$,
- $B \in \mathcal{B}^{n}$,
Then joint distribution of $\Theta$ and $X$ is
\[\begin{eqnarray} \mathrm{Pr}(\Theta \in D, X \in B) & = & \mu(\Theta \in D, X \in B) \nonumber \\ & = & \int_{D} \mu(X \in B \mid \Theta = \theta) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{D} \left( \int_{B} \mathrm{N}(\theta, 1)(x_{1}) \cdots \mathrm{N}(\theta, 1)(x_{n}) \ dx_{1} \cdots dx_{n} \right) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{D} \left( \int_{B} \mathrm{N}(\theta, 1)(x_{1}) \cdots \mathrm{N}(\theta, 1)(x_{n}) \ dx_{1} \cdots dx_{n} \right) \mathrm{N}(0, 1)(\theta) \ d \theta \nonumber \\ & = & \int_{D} \int_{B} (2\pi)^{-\frac{n+1}{2}} \exp \left( - \frac{1}{2} \left( \sum_{j=1}^{n} (x_{j} - \theta)^{2} + \theta^{2} \right) \right) \ dx_{1} \cdots dx_{n} \ d \theta \nonumber \end{eqnarray} .\]2.1.2 Sufficiency
Definition 2.3
- $(\mathcal{T}, \mathcal{C})$,
- measurable sp.
- \(\forall x \in \mathcal{T}, \{x\} \in \mathcal{C}\),
- $T: \mathcal{X} \rightarrow \mathcal{T}$,
$T$ is said to be statistc if $T$ is measurable.
The only requirement is that $\mathcal{C}$ must contain singletons. We denote composite function $T \circ X$ by
\[T(X): S \rightarrow \mathcal{T} .\]$P_{\theta, T}:\mathcal{C} \rightarrow [0, 1] $ is defined as
\[\begin{eqnarray} C \in \mathcal{C}, \ P_{\theta, T}(C) & := & P_{\theta}(T^{-1}(C)) \nonumber \\ & = & P_{\theta}^{\prime}(T(X) \in C) \nonumber \\ & = & P_{\theta}^{\prime}((T \circ X)^{-1}(C)) \nonumber \\ & = & \mu((T \circ X)^{-1}(C) \mid \Theta = \theta) \quad \mu_{\Theta} \text{-a.s.}\theta \nonumber \end{eqnarray} .\]Remark
- $\mu_{\Theta \mid X}: \tau \times \mathcal{X} \rightarrow [0, 1]$,
- conditional probability given $X = x$,
Then we have
\[\forall D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) = \mu_{\Theta \mid X}(D \mid x) \quad \mu_{X} \text{-a.s.}x .\]- $\mu_{\Theta \mid T}: \tau \times \mathcal{T} \rightarrow [0, 1]$,
- conditional probability given $T = t$,
Then we have
\[\forall D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = t) = \mu_{\Theta \mid X}(D \mid t) \quad \mu_{T} \text{-a.s.}t\]where $\mu_{T}(\cdot) := \mu_{X}(T^{-1}(\cdot))$.
Definition 2.4
- $\mathcal{P}_{0}$,
- parametric family of distributions on $(\mathcal{X}, \mathcal{B})$,
- $(\Omega, \tau)$,
- $\Theta: S \rightarrow \Omega$,
- $T: \mathcal{X} \rightarrow \mathcal{T}$,
- statistic
- $\mu_{X}:\mathcal{B} \rightarrow [0, 1]$
- distribution of $X$
- $\mu_{\Theta}:\tau \rightarrow [0, 1]$
- distribution of $\Theta$
$T$ is a sufficient statistic for $\Theta$ ( in the Bayesian sense) if
\[\begin{eqnarray} D \in \tau, \ \mu_{\Theta \mid X}(D \mid x) & = & \mu_{\Theta \mid T}(D \mid T(x)) \quad \mu_{X} \text{-a.s.} x . \nonumber \end{eqnarray}\]That is,
\[\begin{eqnarray} D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) & = & \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = T(x)) \quad \mu_{X} \text{-a.s.} x \label{chap02_01_def_02_04_equivalent_condition} \end{eqnarray}\]Example 2.5
- \(\{X_{n}\}_{n \in \mathbb{N}}\),
- exchangeable Bernoulli random variables
- \(X_{i} \in \{0, 1\}\),
- 1 occurs probability $\theta$
- \(\mathcal{P}_{0} := \{P_{\theta}\}_{\Omega}\),
- IID distributions of bernoulli distribution
- $X = (X_{1}, \ldots, X_{n})$
- $P_{\theta}$,
- distribution that says the coordinates of $X$ are IID $\mathrm{Ber}(\theta)$.
Let $\nu$ be countable measure. Radon-Nikodym derivative of posterior distribution $\mu_{\Theta \mid X}$ is given by
\[\begin{eqnarray} \frac{ d \mu_{\Theta \mid X} }{ d \mu_{\Theta} } (\theta \mid x) & = & \frac{ \frac{ d \mu_{\Theta \mid X} }{ d \nu } }{ \frac{ d \mu_{\Theta} }{ d \nu } } (\theta \mid x) \nonumber \\ & = & \frac{ \theta^{\sum_{i=1}^{n}x_{i}} (1 - \theta)^{n - \sum_{i=1}^{n}x_{i}} }{ \int_{} \psi^{\sum_{i=1}^{n} x_{i}} (1 - \psi)^{n - \sum_{i=1}^{n} x_{i}} \ d \mu_{\Theta}(\psi) } \end{eqnarray}\]Next, treat $T(X) := \sum_{i=1}^{n} X_{i}$ as the data. The dnesity of $T$ given $\Theta = \theta$ is
\[f_{T \mid \Theta}(t \mid \theta) = \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n-t} \ t = 0, \ldots, n .\]It follows from Bayes’ theorem 1.31 that the posterior given $T = t = \sum_{i=1}^{n} x_{i}$ has derivative
\[\frac{ d \mu_{\Theta \mid T} }{ d \mu_{\Theta} } = \frac{ \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t} (1 - \theta)^{n - t}i }{ \int_{} \left( \begin{array}{c} n \\ t \end{array} \right) \psi^{t} (1 - \psi)^{n - t} \ \mu_{\Theta}(d \psi) } .\]Hence $T$ is sufficient.
$T$ is sufficinet if distribution of $\Theta$ given $X = x$ is a function of $T(x)$ by following lemma.
Lemma 2.6
- $T: \mathcal{X} \rightarrow \mathcal{T}$
- statistic
- $\mathcal{B}_{T} := \sigma(T)$,
Then
- (i) $T$ is sufficient in the Bayesian sense
- (ii) there exists a version of the posterior distribution $\mu_{\Theta \mid X}$ given $X$ such that $\forall D \in \tau$, $\mu_{\Theta \mid X}(D \mid \cdot)$ is measurable with respect to $\mathcal{B}_{T}$.
proof
By applying Theorem B.73, taking
- $\mathcal{B} = X^{-1}(\mathcal{B})$,
- $\mathcal{C} = X^{-1}(\mathcal{B}_{T})$,
- \(Z = 1_{D}(\Theta) \ (D \in \tau)\),
condition (i) can be interepreted as there exists a version of
\[\mathrm{E} \left[ 1_{D}(\Theta) \mid X^{-1}(\mathcal{B}) \right] = \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))\]such that $X^{-1}(\mathcal{B}_{T})$ measurable.
Condition (ii) can be interpreted as
\[\begin{eqnarray} & & \mathrm{E} \left[ Z \mid \mathcal{B} \right] = \mathrm{E} \left[ Z \mid \mathcal{C} \right] \nonumber \\ & \Leftrightarrow & \mathrm{E} \left[ 1_{D}(\Theta) \mid X^{-1}(\mathcal{B}) \right] = \mathrm{E} \left[ 1_{D}(\Theta)) \mid X^{-1}(\mathcal{B}_{T}) \right] \nonumber \\ & \Leftrightarrow & \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B})) = \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}_{T})) . \label{chap02_01_lemma_02_06_condition} \end{eqnarray}\]It is easy to show that condition (i) in B.73 is equivalent to condition (ii) in lemma 2.6. Indeed, the following equation holds
\[\mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))(s) = \mu(\Theta^{-1}(D) \mid X = X(s)) .\]Thus, the conditions are equivalent.
We only need to show that condition (ii) in lemma B.73 and condition (i) in lemma 2.6 are equivalent.
\[\begin{eqnarray} D \in \tau, \ C \in \sigma(T), \ \int_{T^{-1}(C)} \mu(\Theta^{-1}(D) \mid T \circ X = T(x)) \ \mu_{X}(d x) & = & \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{X \circ T}(d t) \nonumber \\ \int_{T^{-1}(C)} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}_{T}))(ds) \ \mu_{X}(d s) & = & \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{X \circ T}(d t) \nonumber \\ \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{X \circ T}(d t) & = & \int_{X^{-1}(T^{-1}(C))} 1_{\Theta^{-1}}(ds) \ \mu(d s) & = & \nonumber \\ & = & \int_{T^{-1}(C)} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) \ \mu(d x) \nonumber \\ \int_{X^{-1}(T^{-1}(C))} 1_{\Theta^{-1}}(ds) \ \mu(d s) & = & \int_{X^{-1}(T^{-1}(C))} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))(ds) \ \mu(d s) \nonumber \end{eqnarray}\]Then we have.
\[\begin{eqnarray} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) & = & \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = T(x)) \end{eqnarray}\]Example 2.7
- \(\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}\),
- IID exponential distributions
- $P_{\theta} := \mathrm{Exp}(\theta)$,
- \(\{X_{i}\}_{i \in \mathbb{N}}\),
- $X = (X_{1}, \ldots, n)$,
Let $T(x) := \sum_{i=1}^{n} x_{i}$. By lemma 2.6, $T$ is sufficinet for $\Theta$.
Definition 2.8
- $(\Omega, \tau)$,
- paramter sp
- \(\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Theta}\),
- parametric family of distributions on $(\mathcal{X}, \mathcal{B})$,
- $\Theta: \mathcal{P}_{0} \rightarrow \Omega$,
- $T: \mathcal{X} \rightarrow \mathcal{T}$,
- statistic
Suppose that there eixst versions of $P_{\theta}(\cdot \mid T)$ and a function $r: \mathcal{B} \times \mathcal{T} \rightarrow [0, 1]$ such that
\[r(\cdot, t): \mathcal{B} \rightarrow [0, 1]\]is a probability for $t \in \mathcal{T}$. $r(A, \cdot)$ is measurable for $A \in \mathcal{B}$.
\[\forall \theta \in \Theta, \ B \in \mathcal{B}, \ P_{\theta}(B \mid T = t) = r(B, t) \ P_{\theta, T} \text{-a.e.}\]Then
$T$ is a sufficient static for $\Theta$ in the classical sense.
Example 2.9
Continuation of Example 2.5
- $X_{i}$,
- IID $\mathrm{Ber}(\theta)$ given $\Theta = \theta$,
- i.e. conditionally IID given $\Theta = \theta$ and conditonal distribution of $X_{i}$ given $\Theta = \theta$ is $\mathrm{Ber}(\theta)$
- $X := (X_{1}, \ldots, X_{n})$,
- $T(x) := \sum_{j=1}^{n}x_{i}$,
Thus, distribution of $r(\cdot, t)$ is discrete uniform distribution on \(\{0, \ldots, \left( \begin{array}{c} n \\ t \end{array} \right)\}\). That is,
\[r(B, t) := \mathrm{card}(B) \frac{ 1 }{ \left( \begin{array}{c} n \\ t \end{array} \right) } .\]Example 2.10
Continuation of Exmaple 2.7
- \(\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}\),
- IID exponential distributions given $\Theta = \theta$,
- $P_{\theta} := \mathrm{Exp}(\theta)$,
- \(\{X_{i}\}_{i \in \mathbb{N}}\),
- $X = (X_{1}, \ldots, n)$,
By Corollary B.55,
\[\begin{eqnarray} \frac{ f_{X \mid \Theta}(x_{1}, \ldots, x_{n} \mid \theta) }{ f_{T \mid \Theta}(t \mid \theta) } & := & \frac{ \theta^{n} \exp \left( - \theta t \right) }{ \frac{ 1 }{ (n - 1)! } \theta^{n}t^{n - 1} \exp \left( - t \theta \right) } \nonumber \\ & = & \frac{ (n - 1)! }{ t^{n - 1} } . \nonumber \end{eqnarray}\]