memo

Definitions

$\require{AMScd}$

2.1 Definitions

p.d.f. of Bernoulli distribution

\[\theta \in [0, 1], \ x \in \{0, 1\}, \ \mathrm{Ber}(\theta)(x) := \theta^{x}(1 - \theta)^{1 - x} .\]

p.d.f. of Normal distribution

\[\mu \in \mathbb{R}, \ \sigma \in (0, \infty), \ x \in \mathbb{R}, \ \mathrm{N}(\mu, \sigma^{2})(x) := \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left( - \frac{1}{2 \sigma^{2}} (x - \mu)^{2} \right) .\]

p.d.f. of Exponential distribution

\[\theta > 0, \ x \in [0, \infty), \ \mathrm{Exp}(\theta)(x) := \theta e^{-\theta x} .\]

2.1.1 Notational overview

\[\begin{CD} (S, \mathcal{A}, \mu) @>{X}>> (\mathcal{X}, \mathcal{B}, P_{\theta}) @>{T}>> (\mathcal{T}, \mathcal{C}, P_{\theta, T}) \\ @V{\Theta}VV \\ (\Omega, \tau, \mu_{\Theta}) \end{CD}\]

$(S, \mathcal{A}, \mu)$,
- probability sp.
$\mathrm{Pr}(A) := \mu(A)$,
$\mathrm{Pr}(A \mid \cdot) := \mu(A \mid \cdot)$,
$X: S \rightarrow \mathcal{X}$,
- random quantity
- data
$(\mathcal{X}, \mathcal{B})$,
- measurable sp.
- sample sp.
- usually a subset of Euclidean space
- $\{x\} \in \mathcal{B} \ (x \in \mathcal{X})$,
- elements of $\mathcal{X}$ are denoted by $x, y, z, x_{1}, x_{2}, \ldots$
$\mathcal{A}_{X} := X^{-1}(\mathcal{B})$,
$(\Omega, \tau)$,
- measurable sp.
- $\Omega$ is called parameter sp.
- $\Omega$ is usually a subset of finite-dimensional Euclidean space
$\Theta: S \rightarrow \Omega$,
$\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}$,
- parametric family of distributions for $X$,
$P_{\theta}: \mathcal{B} \rightarrow [0, 1] \ (\theta \in \Omega)$
- conditional distribution of $X$ given $\Theta = \theta$,

$P_{\theta}(B)$ is defined as

\[\begin{eqnarray} B \in \mathcal{B}, \ P_{\theta}(B) & := & \mathrm{Pr}(X^{-1}(B) \mid \Theta = \theta) \nonumber \\ & = & \mu(X^{-1}(B) \mid \Theta = \theta) \nonumber . \end{eqnarray}\]

Then $P_{\theta}$ satisfies

\[\begin{eqnarray} \forall B \in \mathcal{B}, \ \forall D \in \tau, \ \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d\theta) & = & \int_{D} \mathrm{Pr}(X^{-1}(B) \mid \Theta = \theta) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{\Theta^{-1}(D)} 1_{X^{-1}(B)}(s) \ \mu(d s) \label{chap02_01_p_theta} . \end{eqnarray}\]

$P_{\theta}^{\prime}: \mathcal{A}_{X} \rightarrow [0, 1] \ (\theta \in \Omega)$
- probability measure on $(S, \mathcal{A}_{X})$ given $\Theta = \theta$,

Let $\mu_{\mid \mathcal{A}_{X}}$ be the restriction of $\mu$ to $\mathcal{A}_{X}$. Then $P_{\theta}^{\prime}$ is defined as

\[\begin{eqnarray} A \in \mathcal{A}_{X}, \ P_{\theta}^{\prime}(A) & := & \mu_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \nonumber \\ & = & \mathrm{Pr}_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \nonumber \end{eqnarray} .\]

$P_{\theta}^{\prime}$ satisfies

\[\begin{eqnarray} \forall A \in \mathcal{A}_{X}, \ \forall D \in \tau, \ \int_{D} P_{\theta}^{\prime}(A) \ \mu_{\Theta}(d \theta) & = & \int_{D} \mu_{\mid \mathcal{A}_{X}}(A \mid \Theta = \theta) \ \mu_{\Theta}(d \theta) \nonumber \\ & = & \int_{\Theta^{-1}(D)} 1_{A}(s) \ \mu(d s) \label{chap02_01_p_theta_prime} . \end{eqnarray}\]

For all $A \in \mathcal{A}_{X}$, $\exists B \in \mathcal{B}$ such that $A = X^{-1}(B)$. Thus, from $\eqref{chap02_01_p_theta}$ and $\eqref{chap02_01_p_theta_prime}$,

\[\begin{eqnarray} \forall A \in \mathcal{A}_{X}, \ \exists B \in \mathcal{B} \text{ s.t. } \forall D \in \tau, \ \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d\theta) = \int_{D} P_{\theta}^{\prime}(A) \ \mu_{\Theta}(d\theta) . \end{eqnarray}\]

Hence

\[\begin{eqnarray} A \in \mathcal{A}_{X}, \ \exists B \in \mathcal{B} \text{ s.t. } P_{\theta}^{\prime}(A) & = & P_{\theta}^{\prime}(X \in B) \nonumber \\ & = & \mathrm{Pr}_{\mid \mathcal{A}_{X}}(X \in B \mid \Theta = \theta) \nonumber \\ & = & \mu_{\mid \mathcal{A}_{X}}(X \in B \mid \Theta = \theta) \nonumber \\ & = & \mu_{\mid \mathcal{A}_{X}}(X^{-1}(B) \mid \Theta = \theta) \nonumber \\ & = & P_{\theta}(B) \quad \mu_{\Theta} \text{-a.s.}\theta . \end{eqnarray}\]

Example 2.1

$\{X_{n}\}_{n \in \mathbb{N}}$,
- conditionally IID random variables with distribution $\mathrm{N}(\theta, 1)$ given $\Theta = \theta$,
$X = (X_{1}, \ldots, X_{n})$,

If $n=1$, $B \in \mathcal{B}$ then

\[\begin{eqnarray} P_{\theta}(B) & = & \int_{B} \frac{ 1 }{ \sqrt{2 \pi} } \exp \left( - \frac{ (x - \theta)^{2} }{ 2 } \right) \ dx \nonumber \\ & = & P_{\theta}^{\prime}(X_{1} \in B) \nonumber \\ & = & \mathrm{Pr}(X_{1} \in B \mid \Theta = \theta) . \nonumber \end{eqnarray}\]

Similarly, for $B \in \mathcal{B}^{n}$,

\[\begin{eqnarray} P_{\theta}(B) & = & \int_{B} (2 \pi)^{-n/2} \exp \left( - \frac{ 1 }{ 2 } \sum_{i=1}^{n} (x_{i} - \theta)^{2} \right) \ dx_{1} \cdots dx_{n} \nonumber \\ & = & P_{\theta}^{\prime}(X \in B) \nonumber \\ & = & \mathrm{Pr}(X \in B \mid \Theta = \theta) . \end{eqnarray}\]

$\mu_{\Theta}$
- distribution of $\Theta$,
$D \in \tau$
$B \in \mathcal{B}$,
$A := X^{-1}(B)$,
$A_{D} := \Theta^{-1}(D)$,

We have

\[\begin{eqnarray} \mathrm{Pr}(X \in B, \Theta \in D) & = & \mu(A \cap A_{D}) \nonumber \\ & = & \int_{A_{D}} \mathrm{Pr}(A \mid \Theta)(s) \ \mu(ds) \nonumber \\ & = & \int_{D} P_{\theta}(B) \ \mu_{\Theta}(d \theta) \nonumber \end{eqnarray}\]

■

Example 2.2

Continuation of Example 2.1.

$\Theta \sim \mathrm{N}(0, 1)$,
$D \in \tau$,
$B \in \mathcal{B}^{n}$,

Then the joint distribution of $\Theta$ and $X$ is

\[\begin{eqnarray} \mathrm{Pr}(\Theta \in D, X \in B) & = & \mu(\Theta \in D, X \in B) \nonumber \\ & = & \int_{D} \mu(X \in B \mid \Theta = \theta) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{D} \left( \int_{B} \mathrm{N}(\theta, 1)(x_{1}) \cdots \mathrm{N}(\theta, 1)(x_{n}) \ dx_{1} \cdots dx_{n} \right) \ \mu_{\Theta}(d\theta) \nonumber \\ & = & \int_{D} \left( \int_{B} \mathrm{N}(\theta, 1)(x_{1}) \cdots \mathrm{N}(\theta, 1)(x_{n}) \ dx_{1} \cdots dx_{n} \right) \mathrm{N}(0, 1)(\theta) \ d \theta \nonumber \\ & = & \int_{D} \int_{B} (2\pi)^{-\frac{n+1}{2}} \exp \left( - \frac{1}{2} \left( \sum_{j=1}^{n} (x_{j} - \theta)^{2} + \theta^{2} \right) \right) \ dx_{1} \cdots dx_{n} \ d \theta \nonumber \end{eqnarray} .\]

■
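
As a sketch of how the last integrand can be read, one may complete the square in $\theta$. The symbol $s := \sum_{j=1}^{n} x_{j}$ is introduced here only for illustration and does not appear in the example itself.

\[\begin{eqnarray} \sum_{j=1}^{n} (x_{j} - \theta)^{2} + \theta^{2} & = & (n + 1) \theta^{2} - 2 \theta s + \sum_{j=1}^{n} x_{j}^{2} \nonumber \\ & = & (n + 1) \left( \theta - \frac{ s }{ n + 1 } \right)^{2} + \sum_{j=1}^{n} x_{j}^{2} - \frac{ s^{2} }{ n + 1 } . \nonumber \end{eqnarray}\]

For fixed $x$, the joint density is therefore proportional, as a function of $\theta$, to $\mathrm{N}\left( \frac{s}{n + 1}, \frac{1}{n + 1} \right)(\theta)$. This suggests that the conditional distribution of $\Theta$ given $X = x$ depends on $x$ only through $s$.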

2.1.2 Sufficiency

Definition 2.3

$(\mathcal{T}, \mathcal{C})$,
- measurable sp.
- $\forall x \in \mathcal{T}, \{x\} \in \mathcal{C}$,
$T: \mathcal{X} \rightarrow \mathcal{T}$,

$T$ is said to be a statistic if $T$ is measurable.

■

The only requirement is that $\mathcal{C}$ must contain singletons. We denote the composite function $T \circ X$ by

\[T(X): S \rightarrow \mathcal{T} .\]

$P_{\theta, T}:\mathcal{C} \rightarrow [0, 1] $ is defined as

\[\begin{eqnarray} C \in \mathcal{C}, \ P_{\theta, T}(C) & := & P_{\theta}(T^{-1}(C)) \nonumber \\ & = & P_{\theta}^{\prime}(T(X) \in C) \nonumber \\ & = & P_{\theta}^{\prime}((T \circ X)^{-1}(C)) \nonumber \\ & = & \mu((T \circ X)^{-1}(C) \mid \Theta = \theta) \quad \mu_{\Theta} \text{-a.s.}\theta \nonumber \end{eqnarray} .\]

Remark

$\mu_{\Theta \mid X}: \tau \times \mathcal{X} \rightarrow [0, 1]$,
- conditional probability given $X = x$,

\[x \in \mathcal{X}, \ D \in \tau, \ \mu_{\Theta \mid X}(D \mid x) := \mu(\Theta^{-1}(D) \mid X = x) .\]

Then we have

\[\forall D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) = \mu_{\Theta \mid X}(D \mid x) \quad \mu_{X} \text{-a.s.}x .\]

$\mu_{\Theta \mid T}: \tau \times \mathcal{T} \rightarrow [0, 1]$,
- conditional probability given $T = t$,

\[t \in \mathcal{T}, \ D \in \tau, \ \mu_{\Theta \mid T}(D \mid t) := \mu(\Theta^{-1}(D) \mid T \circ X = t) .\]

Then we have

\[\forall D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = t) = \mu_{\Theta \mid T}(D \mid t) \quad \mu_{T} \text{-a.s.}t\]

where $\mu_{T}(\cdot) := \mu_{X}(T^{-1}(\cdot))$.

■

Definition 2.4

$\mathcal{P}_{0}$,
- parametric family of distributions on $(\mathcal{X}, \mathcal{B})$,
$(\Omega, \tau)$,
$\Theta: S \rightarrow \Omega$,
$T: \mathcal{X} \rightarrow \mathcal{T}$,
- statistic
$\mu_{X}:\mathcal{B} \rightarrow [0, 1]$
- distribution of $X$
$\mu_{\Theta}:\tau \rightarrow [0, 1]$
- distribution of $\Theta$

$T$ is a sufficient statistic for $\Theta$ (in the Bayesian sense) if

\[\begin{eqnarray} D \in \tau, \ \mu_{\Theta \mid X}(D \mid x) & = & \mu_{\Theta \mid T}(D \mid T(x)) \quad \mu_{X} \text{-a.s.} x . \nonumber \end{eqnarray}\]

That is,

\[\begin{eqnarray} D \in \tau, \ \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) & = & \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = T(x)) \quad \mu_{X} \text{-a.s.} x \label{chap02_01_def_02_04_equivalent_condition} \end{eqnarray}\]

■

Example 2.5

$\{X_{n}\}_{n \in \mathbb{N}}$,
- exchangeable Bernoulli random variables
- $X_{i} \in \{0, 1\}$,
- 1 occurs with probability $\theta$
$\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}$,
- IID Bernoulli distributions
$X = (X_{1}, \ldots, X_{n})$
$P_{\theta}$,
- distribution that says the coordinates of $X$ are IID $\mathrm{Ber}(\theta)$.

\[\begin{eqnarray} \mu_{X \mid \Theta}(B) & = & \int_{B} \theta^{ \sum_{i=1}^{n} x_{i} } (1 - \theta)^{ n - \sum_{i=1}^{n} x_{i} } \ \mu(dx) \nonumber \end{eqnarray}\]

Let $\nu$ be the counting measure. The Radon-Nikodym derivative of the posterior distribution $\mu_{\Theta \mid X}$ with respect to the prior $\mu_{\Theta}$ is given by

\[\begin{eqnarray} \frac{ d \mu_{\Theta \mid X} }{ d \mu_{\Theta} } (\theta \mid x) & = & \frac{ \frac{ d \mu_{\Theta \mid X} }{ d \nu } }{ \frac{ d \mu_{\Theta} }{ d \nu } } (\theta \mid x) \nonumber \\ & = & \frac{ \theta^{\sum_{i=1}^{n}x_{i}} (1 - \theta)^{n - \sum_{i=1}^{n}x_{i}} }{ \int_{} \psi^{\sum_{i=1}^{n} x_{i}} (1 - \psi)^{n - \sum_{i=1}^{n} x_{i}} \ d \mu_{\Theta}(\psi) } \end{eqnarray}\]

Next, treat $T(X) := \sum_{i=1}^{n} X_{i}$ as the data. The density of $T$ given $\Theta = \theta$ is

\[f_{T \mid \Theta}(t \mid \theta) = \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n-t}, \quad t = 0, \ldots, n .\]

It follows from Bayes’ theorem 1.31 that the posterior given $T = t = \sum_{i=1}^{n} x_{i}$ has derivative

\[\frac{ d \mu_{\Theta \mid T} }{ d \mu_{\Theta} }(\theta \mid t) = \frac{ \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t} (1 - \theta)^{n - t} }{ \int_{} \left( \begin{array}{c} n \\ t \end{array} \right) \psi^{t} (1 - \psi)^{n - t} \ \mu_{\Theta}(d \psi) } .\]

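The binomial coefficients cancel, so this derivative coincides with the one obtained given $X = x$. Hence $T$ is sufficient.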

■
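
As a concrete illustration, assume that the prior $\mu_{\Theta}$ is the uniform distribution on $[0, 1]$. This choice is an assumption made here and is not part of the example. With $t := \sum_{i=1}^{n} x_{i}$, the normalizing integral is a Beta integral, and the derivative becomes explicit:

\[\begin{eqnarray} \int_{0}^{1} \psi^{t} (1 - \psi)^{n - t} \ d\psi & = & \frac{ t! (n - t)! }{ (n + 1)! } , \nonumber \\ \frac{ d \mu_{\Theta \mid X} }{ d \mu_{\Theta} } (\theta \mid x) & = & \frac{ (n + 1)! }{ t! (n - t)! } \theta^{t} (1 - \theta)^{n - t} . \nonumber \end{eqnarray}\]

For instance, with $n = 3$ and $x = (1, 0, 1)$ we get $t = 2$ and the derivative $12 \theta^{2} (1 - \theta)$. Every other $x$ with $t = 2$ gives the same function of $\theta$, which is what sufficiency of $T$ expresses.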

By the following lemma, $T$ is sufficient if the distribution of $\Theta$ given $X = x$ is a function of $T(x)$.

Lemma 2.6

$T: \mathcal{X} \rightarrow \mathcal{T}$
- statistic
$\mathcal{B}_{T} := \sigma(T)$,

Then the following are equivalent.

(i) $T$ is sufficient in the Bayesian sense
(ii) there exists a version of the posterior distribution $\mu_{\Theta \mid X}$ given $X$ such that $\forall D \in \tau$, $\mu_{\Theta \mid X}(D \mid \cdot)$ is measurable with respect to $\mathcal{B}_{T}$.

proof

By applying Theorem B.73, taking

$\mathcal{B} = X^{-1}(\mathcal{B})$,
$\mathcal{C} = X^{-1}(\mathcal{B}_{T})$,
$Z = 1_{D}(\Theta) \ (D \in \tau)$,

condition (i) of Theorem B.73 can be interpreted as: there exists a version of

\[\mathrm{E} \left[ 1_{D}(\Theta) \mid X^{-1}(\mathcal{B}) \right] = \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))\]

that is $X^{-1}(\mathcal{B}_{T})$-measurable.

Condition (ii) of Theorem B.73 can be interpreted as

\[\begin{eqnarray} & & \mathrm{E} \left[ Z \mid \mathcal{B} \right] = \mathrm{E} \left[ Z \mid \mathcal{C} \right] \nonumber \\ & \Leftrightarrow & \mathrm{E} \left[ 1_{D}(\Theta) \mid X^{-1}(\mathcal{B}) \right] = \mathrm{E} \left[ 1_{D}(\Theta) \mid X^{-1}(\mathcal{B}_{T}) \right] \nonumber \\ & \Leftrightarrow & \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B})) = \mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}_{T})) . \label{chap02_01_lemma_02_06_condition} \end{eqnarray}\]

It is easy to show that condition (i) in Theorem B.73 is equivalent to condition (ii) in Lemma 2.6. Indeed, the following equation holds

\[\mu(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))(s) = \mu(\Theta^{-1}(D) \mid X = X(s)) .\]

Thus, the conditions are equivalent.

It remains to show that condition (ii) in Theorem B.73 and condition (i) in Lemma 2.6 are equivalent.

\[\begin{eqnarray} D \in \tau, \ C \in \mathcal{C}, \ \int_{T^{-1}(C)} \mu(\Theta^{-1}(D) \mid T \circ X = T(x)) \ \mu_{X}(d x) & = & \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{T}(d t) \nonumber \\ \int_{X^{-1}(T^{-1}(C))} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}_{T}))(s) \ \mu(d s) & = & \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{T}(d t) \nonumber \\ \int_{C} \mu(\Theta^{-1}(D) \mid T \circ X = t) \ \mu_{T}(d t) & = & \int_{X^{-1}(T^{-1}(C))} 1_{\Theta^{-1}(D)}(s) \ \mu(d s) \nonumber \\ & = & \int_{T^{-1}(C)} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) \ \mu_{X}(d x) \nonumber \\ \int_{X^{-1}(T^{-1}(C))} 1_{\Theta^{-1}(D)}(s) \ \mu(d s) & = & \int_{X^{-1}(T^{-1}(C))} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X^{-1}(\mathcal{B}))(s) \ \mu(d s) \nonumber \end{eqnarray}\]

Then we have

\[\begin{eqnarray} \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid X = x) & = & \mu_{\mid \sigma(\Theta)}(\Theta^{-1}(D) \mid T \circ X = T(x)) \quad \mu_{X} \text{-a.s.} x . \end{eqnarray}\]

$\Box$

Example 2.7

$\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}$,
- IID exponential distributions
- $P_{\theta} := \mathrm{Exp}(\theta)$,
$\{X_{i}\}_{i \in \mathbb{N}}$,
$X = (X_{1}, \ldots, X_{n})$,

\[\frac{ d \mu_{\Theta \mid X} }{ d \mu_{\Theta} }(\theta \mid x) = \frac{ \theta^{n} \exp \left( - \theta \sum_{j=1}^{n} x_{j} \right) }{ \int_{} \psi^{n} \exp \left( - \psi \sum_{i=1}^{n} x_{i} \right) \ \mu_{\Theta}(d\psi) } .\]

Let $T(x) := \sum_{i=1}^{n} x_{i}$. The derivative above depends on $x$ only through $T(x)$, so by Lemma 2.6, $T$ is sufficient for $\Theta$.

■
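
To make the derivative concrete, assume a Gamma prior with density $\frac{b^{a}}{\Gamma(a)} \theta^{a - 1} e^{-b \theta}$ on $(0, \infty)$. The hyperparameters $a, b > 0$ are hypothetical and introduced only for this sketch. With $t := \sum_{i=1}^{n} x_{i}$, the normalizing integral and the resulting posterior density with respect to Lebesgue measure are

\[\begin{eqnarray} \int_{0}^{\infty} \psi^{n} e^{-\psi t} \frac{ b^{a} }{ \Gamma(a) } \psi^{a - 1} e^{-b \psi} \ d\psi & = & \frac{ b^{a} }{ \Gamma(a) } \frac{ \Gamma(a + n) }{ (b + t)^{a + n} } , \nonumber \\ \frac{ d \mu_{\Theta \mid X} }{ d \mu_{\Theta} }(\theta \mid x) \frac{ b^{a} }{ \Gamma(a) } \theta^{a - 1} e^{-b \theta} & = & \frac{ (b + t)^{a + n} }{ \Gamma(a + n) } \theta^{a + n - 1} e^{-(b + t) \theta} . \nonumber \end{eqnarray}\]

Under this assumed prior the posterior is a Gamma distribution with shape $a + n$ and rate $b + t$, so the data enter only through $t = T(x)$.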

Definition 2.8

$(\Omega, \tau)$,
- parameter sp.
$\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}$,
- parametric family of distributions on $(\mathcal{X}, \mathcal{B})$,
$\Theta: \mathcal{P}_{0} \rightarrow \Omega$,
$T: \mathcal{X} \rightarrow \mathcal{T}$,
- statistic

Suppose that there exist versions of $P_{\theta}(\cdot \mid T)$ and a function $r: \mathcal{B} \times \mathcal{T} \rightarrow [0, 1]$ such that

\[r(\cdot, t): \mathcal{B} \rightarrow [0, 1]\]

is a probability measure for each $t \in \mathcal{T}$, $r(A, \cdot)$ is measurable for each $A \in \mathcal{B}$, and

\[\forall \theta \in \Omega, \ \forall B \in \mathcal{B}, \ P_{\theta}(B \mid T = t) = r(B, t) \quad P_{\theta, T} \text{-a.e.} \ t .\]

Then $T$ is a sufficient statistic for $\Theta$ in the classical sense.

■

Example 2.9

Continuation of Example 2.5

$X_{i}$,
- IID $\mathrm{Ber}(\theta)$ given $\Theta = \theta$,
- i.e. conditionally IID given $\Theta = \theta$, and the conditional distribution of $X_{i}$ given $\Theta = \theta$ is $\mathrm{Ber}(\theta)$
$X := (X_{1}, \ldots, X_{n})$,
$T(x) := \sum_{i=1}^{n}x_{i}$,

\[\begin{eqnarray} P_{\theta}^{\prime}(X = x \mid T(X) = t) & = & \frac{ \theta^{t}(1 - \theta)^{n - t} }{ \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n - t} } \nonumber \\ & = & \frac{ 1 }{ \left( \begin{array}{c} n \\ t \end{array} \right) } . \nonumber \end{eqnarray}\]

Thus $r(\cdot, t)$ is the discrete uniform distribution on the set $\{x \in \{0, 1\}^{n} : T(x) = t\}$, which has $\left( \begin{array}{c} n \\ t \end{array} \right)$ elements. That is,

\[r(B, t) := \frac{ \mathrm{card}(B \cap \{x : T(x) = t\}) }{ \left( \begin{array}{c} n \\ t \end{array} \right) } .\]

■
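
A small instance may help. Take $n = 3$ and $t = 2$, values chosen here purely for illustration. The points $x \in \{0, 1\}^{3}$ with $T(x) = 2$ are $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$, and for $\theta \in (0, 1)$ each has conditional probability

\[P_{\theta}^{\prime}(X = x \mid T(X) = 2) = \frac{ \theta^{2} (1 - \theta) }{ 3 \theta^{2} (1 - \theta) } = \frac{ 1 }{ 3 } .\]

Only the points of $B$ that satisfy $T(x) = 2$ contribute to $r(B, 2)$. For $B := \{x : x_{1} = 1\}$ these are $(1, 1, 0)$ and $(1, 0, 1)$, so $r(B, 2) = 2/3$, with no dependence on $\theta$.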

Example 2.10

Continuation of Example 2.7

$\mathcal{P}_{0} := \{P_{\theta}\}_{\theta \in \Omega}$,
- IID exponential distributions given $\Theta = \theta$,
- $P_{\theta} := \mathrm{Exp}(\theta)$,
$\{X_{i}\}_{i \in \mathbb{N}}$,
$X = (X_{1}, \ldots, X_{n})$,

By Corollary B.55, with $T(x) := \sum_{i=1}^{n} x_{i}$ and $t = T(x)$,

\[\begin{eqnarray} \frac{ f_{X \mid \Theta}(x_{1}, \ldots, x_{n} \mid \theta) }{ f_{T \mid \Theta}(t \mid \theta) } & = & \frac{ \theta^{n} \exp \left( - \theta t \right) }{ \frac{ 1 }{ (n - 1)! } \theta^{n}t^{n - 1} \exp \left( - t \theta \right) } \nonumber \\ & = & \frac{ (n - 1)! }{ t^{n - 1} } . \nonumber \end{eqnarray}\]

The ratio does not depend on $\theta$.

■
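
As a sanity check on the ratio, consider the case $n = 2$, chosen here for illustration. The joint density of $(X_{1}, T)$ given $\Theta = \theta$ is $\theta^{2} e^{-\theta t}$ on $0 < x_{1} < t$, and the density of $T$ is $\theta^{2} t e^{-\theta t}$, so

\[f_{X_{1} \mid T, \Theta}(x_{1} \mid t, \theta) = \frac{ \theta^{2} e^{-\theta t} }{ \theta^{2} t e^{-\theta t} } = \frac{ 1 }{ t } , \quad 0 < x_{1} < t .\]

This agrees with $(n - 1)! / t^{n - 1}$ at $n = 2$: given $T = t$, $X_{1}$ is uniform on $(0, t)$ whatever the value of $\theta$.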