View on GitHub

memo

Chapter2-06. Characterization of Sufficiency

2.6 Characterization of Sufficiency

Definition. sufficient

$T$がsufficientであるとは、

\[\forall A \in \mathcal{A}, \ \exists q(A \mid t):\mathcal{T} \rightarrow \mathbb{R}: \mathcal{B} \text{-measurable } \ \text{ s.t. } \ \forall B \in \mathcal{B}, \ \theta \in \Theta, \ \int_{B} q(A \mid t) \ d P_{\theta}^{T}(dt) = P_{\theta}(A \cap T^{-1}(B))\]

がなりたつことを言う。

統計量が十分とは、分布族を$\theta$に依存しない可測関数で表現できるということである。 つまり、統計量$T$が$\theta$を特定するのに十分な情報を持っているということ。 上の定義は以下と同値である。

\[\forall A \in \mathcal{A}, \ \exists q(A \mid \cdot): \mathcal{T} \rightarrow \mathbb{R}: \mathcal{B} \text{-measurable }, \ \text{ s.t. } \ q(A \mid t) = P_{\theta}(A \mid T = t) \ t \text{-a.e.}\]

Example 2.4.1からorder statisticsはsufficient statisticである。

Example. Bernoulli trial

We define p.d.f. of this trials as follows.

\[\theta \in \Theta, \ x \in \mathcal{X}, \ p_{\theta}(x) := \theta^{\sum_{i=1}^{n} x_{i}} (1 - \theta)^{n - \sum_{i=1}^{n} x_{i}},\]

Corresponding probability measure is

\[A \in \mathcal{A} \ P_{\theta}(A) := \sum_{x \in A} p_{\theta}(x).\]

We show that statistic

\[T(x) := \sum_{i=1}^{n} x_{i}\]

is sufficient. To show that, we need to confirm the equation of definition of sufficiency. It is easy to see that

\[\begin{eqnarray} P_{\theta}(\{x\} \cap T^{-1}(\{t\}) = \begin{cases} \theta^{t}(1 - \theta)^{n - t} & x \in T^{-1}(\{t\}) \\ 0 & x \notin T^{-1}(\{t\}) \end{cases}. \label{example_sufficiency_lhs} \end{eqnarray}\]

Hence

\[\begin{eqnarray} \forall A \in \mathcal{A}, \ P_{\theta}(A \cap T^{-1}(\{t\}) & = & P_{\theta}(A \cap T^{-1}(\{t\})) \nonumber \\ & = & \sum_{x \in A} P_{\theta}(\{x\} \cap T^{-1}(\{t\})) \nonumber \\ & = & |A \cap T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]

In particular, if we take $A$ as $\mathcal{X}$, we have

\[\begin{eqnarray} P_{\theta}(T^{-1}(\{t\}) & = & |T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & |\{x \in \mathcal{X} \mid \sum_{i=1}^{n} x_{i} = t \} | \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]

For evey $A$ we define $\mathcal{B}$ measurable function \(q_{A}\) by

\[\begin{eqnarray} r(x, t) := \begin{cases} \left( \begin{array}{c} n \\ t \end{array} \right)^{-1} & (T(x) = t) \\ 0 & \text{otherwise} \end{cases} \nonumber \end{eqnarray}\] \[q_{A}(t) := \sum_{x \in A} r(x, t) = |A \cap T^{-1}(\{t\})| \left( \begin{array}{c} n \\ t \end{array} \right)^{-1}\]

Now we show that \(q_{A}\) satisfies the equality of the definition of sufficiency. LHS of the equality is

\[\begin{eqnarray} \int_{B} q_{A}(t) P_{\theta}^{T}(dt) & = & \sum_{t \in B} q_{A}(t) P_{\theta}^{T}(\{t\}) \nonumber \\ & = & \sum_{t \in B} q_{A}(t) \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & \sum_{t \in B} |A \cap T^{-1}(\{t\})| \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]

RHS of the equality is

\[\begin{eqnarray} P_{\theta}(A \cap T^{-1}(B)) & = & \sum_{t \in B} P_{\theta}(A \cap T^{-1}(\{t\})) \nonumber \\ & = & \sum_{t \in B} |A \cap T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \end{eqnarray}\]

Theorem 2.6.1

このとき、$\theta$に依存しない可測関数\(g:\mathcal{T} \rightarrow [0, 1]\)が存在して、

\[g(t) = P_{\theta}(A \mid T = t)\]

となる。

proof.

Theorem 2.5.1と同様の議論をすれば良い。

$\Box$

この定理はもう少し一般的な形で成り立つ。 詳細は数理統計学の命題3.3.をみる。

さて、Bernoulli trialの例で見たように、定義に基づく十分性の証明は煩雑である。 十分性の判定が容易となる同値な条件を与える。 まず必要な用語を定義する。

Definition. domination, absolutely continuous

\(P_{\theta}\) is absolutely continuous with respect to $\mu$ or dominated by $\mu$ if

\[\forall A \in \mathcal{A}, \ \mu(A) = 0 \Rightarrow P_{\theta}(A) = 0 .\]

Then we denote this as \(P_{\theta} \ll \mu\). We extend this definition to set of measures. \(\mathcal{P}\) is dominated by $\mu$ if $\mu$ satisfies

\[\begin{eqnarray} & & \forall \theta \in \Theta, \ \forall A \in \mathcal{A}, \ \mu(A) \Rightarrow P_{\theta}(A) = 0 \nonumber \\ & \Leftrightarrow & \forall \theta \in \Theta, \ P_{\theta} \ll \mu . \nonumber \end{eqnarray}\]

Lemma

Then

\[\exists \{P_{n}\}_{n \in \mathbb{N}} \subseteq \mathcal{P} \text{ s.t. } \mathcal{P} \ll P_{*} := \sum_{i = 1}^{\infty} 2^{-n}P_{n},\]

proof.

With out loss of generality, we assume that $\mu$ is finite measure. Indeed, if we assume that the statement holds for finite measure and $\mu$ is $\sigma$-finite measure, then there exists \(\{B_{n}\}_{n \in \mathbb{N}}\) such that

\[\mathcal{X} = \sum_{i \in \mathbb{N}} B_{n}, \ \mu(B_{n}) < \infty .\]

Here we define finite measure \(\mu^{*}\) by

\[\mu^{*} := \sum_{n \in \mathbb{N}} \frac{ \mu(A \cap B_{n}) }{ 2^{n}\mu(B_{n}) } 1_{\{\mu(A_{n}) \neq 0\}} .\]

Then \(\mathcal{P} \ll \mu^{*}\). Indeed,

\[\mu^{*}(A) = 0 \Leftrightarrow \forall n \in \mathbb{N}, \ \mu(A \cap B_{n}) = 0\]

and

\[\begin{eqnarray} \forall P \in \mathcal{P}, \ P(A) & = & P(A \cap \sum_{n \in \mathbb{N}} B_{n}) \nonumber \\ & = & \sum_{n \in \mathbb{N}} P(A \cap B_{n}) . \nonumber \end{eqnarray}\]

Hence by \(\mathcal{P} \ll \mu\), we obtain

\[\begin{eqnarray} \mu^{*}(A) = 0 & \Rightarrow & \forall n \in \mathbb{N}, \ \mu(A \cap B_{n}) = 0 \nonumber \end{eqnarray}\]

\(\mu^{*}\) is finite measure so that we obtain \(P_{*}\) by applying the statement for finite measure to \(\mu^{*}\).

We assume \(\mu\) is finite measure. Now we define

\[\begin{eqnarray} & & P \in \mathcal{P}, \ f_{P} := \frac{ d P }{ d \mu } \nonumber \\ & & S_{P} := \{ x \in \mathcal{X} \mid f_{P}(x) > 0 \} . \nonumber \end{eqnarray}\]

Radon-Nikodym derivative is \(\mathcal{A}\) measurable so that \(S_{P} \in \mathcal{A}\). And \(S_{P} \neq \emptyset\) since \(S_{P} = \emptyset\) implies

\[\begin{eqnarray} P(\mathcal{X}) & = & \int_{\mathcal{X}} f_{P}(x) \ dx \nonumber \\ & = & 0 . \nonumber \end{eqnarray}\]

Claim(a):

\[\begin{equation} P(A) = 0 \Leftrightarrow \mu(A \cap S_{P}) = 0 \label{lemma_03_26_step_b} \end{equation}\]

To prove the claim, for all $P \in \mathcal{P}$, we have

\[\begin{eqnarray} P(A) & = & \int_{A} f_{P}(x) \ dx \nonumber \\ & = & \int_{A} f_{P}(x) \ \mu(dx) \\ & = & \int_{A \cap S_{P}} f_{P}(x) \ \mu(dx) \nonumber \end{eqnarray}\]

This implies that

\[\begin{eqnarray} P(A) = 0 & \Leftrightarrow & 1_{A \cap S_{P}}(x) = 0 \ \mu \text{-a.e.} \nonumber \\ & \Leftrightarrow & \mu(A \cap S_{P}) = 0 . \nonumber \end{eqnarray}\]

Hence the claim is proved.

\[\mathcal{C} := \{ S \in \mathcal{A} \mid \exists \{P_{n}\}_{n \in \mathbb{N}} \in \mathcal{P}, \ S = \cup_{n \in \mathbb{N}} S_{P_{n}} \}\]

$\mathcal{C}$ is not empy set. For all $P \in \mathcal{P}$, \(S_{P} \in \mathcal{A}\) and we take a sequence by \(P_{n} := P\),

\[S_{P} = \bigcup_{n \in \mathbb{N}} S_{P_{n}} .\]

Moreover it is easy to check that $\mathcal{C}$ is closed with respect to countable sum since finite cartesian product of countable sets is stil countlable sets. We will show that there is the maximum value in \(\{\mu(S) \mid S \in \mathcal{C}\}\). Since $\mu$ is finite measure by our assumption,

\[D := \sup \{ \mu(S) \mid S \in \mathcal{C} \} < \infty .\]

We can take a sequence \(\{S^{i}\}_{i \in \mathbb{N}} \subseteq \mathcal{C}\) such that

\[\mu(S^{i}) \nearrow D \ (i \rightarrow \infty) .\]

Let \(S^{*} := \cup_{i \in \mathbb{N}} S^{i}\). Obviously, \(\mu(S^{*}) \le D\) by definiton of $D$. Additionally, by monotonicity of measure

\[\forall i \in \mathbb{N}, \ \mu(S^{i}) \le \mu(S_{*})\]

Hence by taking the limit both side,

\[D \le \mu(S_{*}) .\]

Therefore \(D = \mu(S^{*})\) and \(S^{*} \in \mathcal{C}\).

Now we construct \(P_{*}\) which satisfies the equation in the statement. \(S^{*} \in \mathcal{C}\) so that we can take a sequence \(\{P_{n}^{*}\}_{n \in \mathbb{N}}\) such that \(S = \cup_{n \in \mathbb{N}}S_{P_{n}^{*}}\). We define \(P_{*} := \sum_{n \in \mathbb{N}} 2^{-n}P_{n}^{*}\).

Claim: \(\mathcal{P} \ll P_{*}\).

Let $A \in \mathcal{A}$ be fixed. Suppose \(P_{*}(A) = 0\). We have

\[\begin{eqnarray} \forall P \in \mathcal{P}, \ P(A) & = & P(A \cap S^{*}) + P(A \cap (S^{*})^{c}) \nonumber \\ & \le & P(A \cap S^{*}) + P((S^{*})^{c}) . \label{lemma_03_26_inequality_for_claim} \end{eqnarray}\]

We only need to show

For (1),

\[\begin{eqnarray} P_{*}(A) = 0 & \Rightarrow & P_{n}(A) = 0 \ (\forall n \in \mathbb{N}) \quad (\because \text{definition of } P_{*}) \nonumber \\ & \Rightarrow & \mu(A \cap S_{P_{n}}) = 0 \ (\forall n \in \mathbb{N}) \quad (\because \eqref{lemma_03_26_step_b}) \nonumber \\ & \Rightarrow & \mu(A \cap S_{*}) = 0 \nonumber \\ & \Rightarrow & P(A \cap S_{*}) = 0 \ (\forall P \in \mathcal{P}) \quad (\because \mathcal{P} \ll \mu) \nonumber \end{eqnarray}\]

For (2), we show that \(\mu((S^{*})^{c} \cap S_{P}) = 0 \ (\forall P \in \mathcal{P})\). If we prove this, by \(\eqref{lemma_03_26_step_b}\) we obtain \(P((S^{*})^{c}) = 0\). Suppose that there exists $P \in \mathcal{P}$ such that \(\mu((S^{*})^{c} \cap S_{P}) > 0\), then

\[\begin{eqnarray} \mu(S^{*} \cup S_{P}) & = & \mu((S^{*} \cup S_{P}) \cap S^{*}) + \mu((S^{*} \cup S_{P}) \cap (S^{*})^{c}) \nonumber \\ & = & \mu(S^{*}) + \mu((S^{*})^{c} \cap S_{P}) \nonumber \\ & > & \mu(S^{*}) \end{eqnarray}\]

This is contradictition since \(\mu(S^{*})\) is the maximum value in \(\mathcal{C}\).

$\Box$

We prove equivalent condition of sufficiency. This is known as factorization theorem.

Thereom 2.6.2.

\[\mathcal{P} \ll \mu\]

Then the following statemets are equivalent:

\[\begin{equation} \forall \theta \in \Theta, \ \frac{ d P_{\theta} }{ d \mu } (x) = g_{\theta}(T(x)) h_{\theta}(x) \ \mu \text{-a.e.} \label{factorization_theorem_factorization} \end{equation}\]

proof.

(a) $\Rightarrow$ (b) By definition of sufficiency,

\[\begin{eqnarray} \forall A \in \mathcal{A}, \ \exists q_{A}(t) \ \text{ s.t. } \ \forall B \in \mathcal{B}, \ \int_{B} q_{A}(t) \ dP_{\theta}^{T}(dt) & = & P_{\theta}(A \cap T^{-1}(B)) \nonumber \\ & = & \int_{\mathcal{X}} 1_{A \cap T^{-1}(B)}(x) P_{\theta}(dx) \nonumber \\ & = & \int_{\mathcal{X}} 1_{A}(x)1_{B}(T(x)) P_{\theta}(dx) \nonumber \end{eqnarray}\]

Since $\mathcal{P} \ll \mu$, by lemma, there exists a sequence \(\{\theta_{n}\}_{n \in \mathbb{N}}\) such that

\[\mathcal{P} \ll P_{*} := \sum_{n \in \mathbb{N}} 2^{-n}P_{\theta_{n}} .\]

We have

\[\begin{eqnarray} \forall n \in \mathbb{N}, \ \int_{B} q_{A}(t) \ \left( \sum_{i=1}^{n} 2^{-i} P_{\theta_{i}}^{T}(dt) \right) & = & \int_{\mathcal{X}} 1_{A}(x)1_{B}(T(x)) \left( \sum_{i=1}^{n} 2^{-i} P_{\theta_{i}}(dx) \right) . \end{eqnarray}\]

By taking the limit of the above equation as $n \rightarrow \infty$, we obtain

\[\begin{eqnarray} \int_{B} q_{A}(t) \ dP_{*}^{T}(dt) & = & \int_{\mathcal{X}} 1_{A}(x)1_{B}(T(x)) P_{*}(dx) . \label{theorem_02_06_02_equation_02} \end{eqnarray}\]

Hence for all \(P_{*}^{T}\) integrable function $g$,

\[\begin{eqnarray} \int_{\mathcal{T}} q_{A}(t) g(t) \ dP_{\theta}^{T}(dt) & = & \int_{\mathcal{X}} 1_{A}(x)g(T(x)) P_{\theta}(dx) . \label{theorem_02_06_02_equation} \end{eqnarray}\]

This can be proved by standard procedure for Lebesgue integrable function. We only give short sketch of this proof. For all \(P_{*}^{T}\) integrable function $g$ can be approximated by linear sum of indicator function, say \(\sum_{i=1}^{N} a_{i} 1_{B_{i}}(t)\) where \(a_{i} \in \mathcal{T}\) and \(B_{i} \in \mathcal{B}\). By linearity of integral,

\[\begin{eqnarray} \int_{\mathcal{T}} q_{A}(t) \sum_{i=1}^{N} a_{i} 1_{B_{i}}(t) \ P_{\theta}^{T}(dt) & = & \int_{\mathcal{X}} 1_{A}(x) \sum_{i=1}^{N} a_{i} 1_{B_{i}}(T(x)) P_{\theta}(dx) . \nonumber \end{eqnarray}\]

Then taking the limit of both side.

Since \(P_{\theta} \ll \mathcal{P}_{*}\), \(P_{\theta}^{T} \ll P_{\theta}\). Its Radon-Nikodym derivative is

\[g_{\theta}(t) := \frac{ d P_{\theta}^{T} }{ d P_{*}^{T} } (t)\]

We substitute the above into \(\eqref{theorem_02_06_02_equation}\), we obtain

\[\begin{eqnarray} P_{\theta}(A) & = & \int_{\mathcal{T}} q_{A}(t) \ P_{\theta}^{T}(dt) \nonumber \\ & = & \int_{\mathcal{X}} q_{A}(t) g_{\theta}(t) \ P_{*}^{T}(dt) \ (\because \text{ property of Radon-Nikodym}) \nonumber \\ & = & \int_{\mathcal{X}} 1_{A}(t) g_{\theta}(T(x)) \ P_{*}(dx) \quad (\because \eqref{theorem_02_06_02_equation_02}) \nonumber \\ & = & \int_{\mathcal{X}} 1_{A}(t) g_{\theta}(T(x)) \frac{ d P_{*} }{ d \mu } (x) \ \mu(dx) \nonumber \end{eqnarray}\]

Since $A$ is arbitrary, the above equation implies

\[\frac{ d P_{\theta} }{ d \mu } (x) = g_{\theta}(T(x)) \frac{ d P_{*} }{ d \mu } (x) \ \mu \text{-a.e.} .\]

We take \(g(x) := (dP_{*} / d\mu)(x)\). The proof is complete.

(b) $\Rightarrow$ (a) Let \(P_{*}\) be defined above. We show that

\[\forall P_{\theta} \in \mathcal{P}, \ \exists \tilde{H}_{\theta}: \mathcal{T} \rightarrow \mathbb{R}_{\ge 0}, \ \text{ s.t. } \ \frac{ d P_{\theta} }{ d P_{*} } (x) = \tilde{H}(T(x)) \quad P_{*} \text{-a.s.}\]

where \(\tilde{H}_{\theta}\) is $\mathcal{B}$ measurable. Let

\[g_{*}(t) := \sum_{n=1}^{\infty} 2^{-n}g_{\theta_{n}}(t) .\]

By \(\eqref{factorization_theorem_factorization}\),

\[\forall n \in \mathbb{N}, \ \forall A \in \mathcal{A}, \ \sum_{i=1}^{n} 2^{-i} P_{\theta_{i}}(A) = \sum_{i=1}^{n} 2^{-i} \int_{A} g_{\theta_{i}}(T(x)) h(x) \ \mu(dx) \quad \mu \text{-a.e.} .\]

Hence by taking the limit we obtain

\[\begin{equation} \forall A \in \mathcal{A}, \ P_{*}(A) = \int_{A} g_{*}(T(x)) h(x) \ \mu(dx) \quad \mu \text{-a.e.} . \label{factorization_theorem_03_08} \end{equation}\]

Let \(B_{*} := \{t \in \mathcal{T} \mid g_{*}(t) > 0\}\) and

\[\begin{equation} \tilde{g}_{\theta}(t) := \begin{cases} \frac{g_{\theta}(t)}{g_{*}}(t) & t \in B_{*} \\ 0 & t \notin B_{*} \end{cases} . \label{factorization_theorem_03_09} \end{equation}\]

Then

\[\tilde{g}_{\theta}(T(x)) = \frac{ d P_{\theta} }{ d P_{*} } (x) .\]

Indeed, for all $A \in \mathcal{A}$,

\[\begin{eqnarray} \int_{A} \tilde{g}_{\theta}(T(x)) \ P_{*}(dx) & = & \int_{A} \tilde{g}_{\theta}(T(x)) h(x) g_{*}(T(x)) \ \mu(dx) \quad (\because\ \eqref{factorization_theorem_03_08}) \nonumber \\ & = & \int_{A} g_{\theta}(T(x)) h(x) 1_{B_{*}}(T(x)) \ \mu(dx) \quad (\because\ \eqref{factorization_theorem_03_09}) \nonumber \\ & = & \int_{A} 1_{B_{*}}(T(x)) \ P_{\theta}(dx) \quad (\because\ \eqref{factorization_theorem_factorization}) \nonumber \\ & = & P_{\theta}(A \cap T^{-1}(B_{*})) \nonumber \\ & = & P_{\theta}(A) \nonumber \end{eqnarray}\]

The last equality can be verified

\[\begin{eqnarray} P_{*}(T^{-1}(B_{*}^{c})) & = & \int_{\mathcal{X}} 1_{B_{*}^{c}}(T(x)) h(x) g_{*}(T(x)) \ \mu(dx) \nonumber \\ & = & 0 \nonumber \end{eqnarray}\]

and \(\mathcal{P} \ll P_{*}\) implies

\[\forall \theta \in \Theta, \ P_{\theta}(T^{-1}(B_{*}^{c})) = 0 .\]

We define $\mathcal{B}$ measurable function by conditional probability with respect to \(P_{*}\),

\[\begin{eqnarray} A \in \mathcal{A}, \ q_{A}(t) := \mathrm{E}^{P_{*}} \left[ \left. 1_{A} \right| T = t \right] \nonumber \end{eqnarray}\]

By definition of conditional probability, for all \(P_{*}^{T}\) integrable function $f: \mathcal{T} \rightarrow \mathbb{R}$,

\[\int_{\mathcal{T}} q_{A}(t) f(t) \ P_{*}^{T}(dt) = \int_{\mathcal{X}} 1_{A}(x) f(T(x)) \ P_{*}(dx) .\]

In particular, we take \(f = 1_{B}\tilde{g}_{\theta}\) and

\[\begin{eqnarray} \forall B \in \mathcal{B}, \ \theta \in \Theta, \ \int_{B} q_{A}(t) \ P_{\theta}^{T}(dt) & = & \int_{T^{-1}(B)} 1_{A}(x) \ P_{\theta}(dx) \nonumber \\ & = & P_{\theta}(A \cap T^{-1}(B)) . \end{eqnarray}\]

Therefore $T$ is sufficient for $\mathcal{P}$.

$\Box$

Example of sufficient statistics.

\[\mathcal{P} := \left\{ \int_{\cdot} \cdots \int_{\cdot} \left( \frac{ 1 }{ \sqrt{2\pi} } \right)^{n} \prod_{i=1}^{n} \exp \left( - \frac{ 1 }{ 2 } (x_{i} - \theta)^{2} \right) \ d x_{1} \cdots \ d x_{n} \mid \theta \in \mathbb{R} \right\}\]

Then $T$ is sufficient for $\mathcal{P}$ since

\[\frac{ d P_{\theta} }{ d x } (x) = \left( \frac{ 1 }{ \sqrt{2\pi} } \right)^{2} \exp \left( -\frac{1}{2} \sum_{j=1}^{n} x_{j}^{2} \right) \exp \left( \theta T(x) - \frac{ n \theta^{2} }{ 2 } \right) .\]

Application of sufficiency

Proposition

If $T$ is sufficinet for $\mathcal{P}$, then there exists $g: \mathcal{T} \rightarrow \mathbb{R}$ such that

\[\forall \theta \in \Theta, \ g(t) = \mathrm{E} \left[ f \mid T = t \right] \quad P_{\theta}^{T} \text{-a.e.}\]

proof.

Without loss of generality, $f$ is nonnegative. We can take a sequence of nonnegative simple functions \(\{f_{n}\}_{n}\) such that \(f_{n} \nearrow f\). Let \(f_{n} = \sum_{j=1}^{k_{n}} c_{n, j} 1_{A_{n,j}}\) and

\[g_{n}(t) := \sum_{j=1}^{k_{n}} c_{n, j}q_{A_{n, j}}(t) .\]

Then

\[\begin{equation} g_{n}(t) = \mathrm{E} \left[ \left. f_{n} \right| T = t \right] \quad P_{\theta}^{T} \text{-a.s.} . \label{proposition_03_03_approximation} \end{equation}\]

We define

\[g(t) := \begin{cases} \lim \sup_{n \rightarrow \infty} g_{n}(t) & \lim \sup_{n \rightarrow \infty} g_{n}(t) < \infty \\ 0 & \lim \sup_{n \rightarrow \infty} g_{n}(t) = \infty \end{cases} .\]

$g$ is $\mathcal{B}$ measurable. By taking the limit of the both side of \(\eqref{proposition_03_03_approximation}\). we obtain the equation in the statement.

$\Box$

Corollary

If $T$ is sufficinet for $\mathcal{P}$, then there exists $f^{\prime}: \mathcal{X} \rightarrow \mathbb{R}$ such that

\[\forall \theta \in \Theta, \ f^{\prime}(x) = \mathrm{E}_{\theta} \left[ f \mid T \right] \quad P_{\theta} \text{-a.e.}\]

proof.

By the above proposition, there exists $g$ such that

\[g(t) = \mathrm{E} \left[ \left. f \right| T = t \right] .\]

We take \(f^{\prime} := g(T(x))\). Then

\[\begin{eqnarray} T^{-1}(B) \in \sigma(T), \ \int_{T^{-1}(B)} f^{\prime}(x) \ P(dx) & = & \int_{T^{-1}(B)} g(T(x)) \ P(dx) \nonumber \\ & = & \int_{T^{-1}(B)} \mathrm{E} \left[ \left. f \right| T = T(x) \right] \ P(dx) \nonumber \\ & = & \int_{B} f(t) \ P^{T}(dt) \nonumber \\ & = & \int_{A} \mathrm{E}_{\theta} \left[ \left. f \right| T = T(x) \right] \ P(dx) \end{eqnarray}\]
$\Box$

We consider statistical decision problem \((\mathcal{X}, \mathcal{A}, \mathcal{P}, \Theta, D, \mathcal{B}, L, \Delta)\). Additionally, we assume

Rao Blackwell’s theorem

Let

\[\theta \in \Theta, \ \mathrm{E} \left[ \|\delta| \right] < \infty\]

and

\[\delta_{0} := \mathrm{E} \left[ \left. \delta \right| T \right] .\]

Then

\[\forall \theta \in \Theta, \ R(\theta, \delta_{0}) \le R(\theta, \delta) .\]

proof.

\[\begin{eqnarray} L(\theta, \delta_{0}) & = & L \left( \theta, \mathrm{E}_{\theta} \left[ \left. \delta \right| T \right] \right) \nonumber \\ & \le & \mathrm{E}_{\theta} \left[ \left. L(\theta, \delta) \right| T \right] \nonumber \end{eqnarray}\] \[\begin{eqnarray} & & \mathrm{E} \left[ L \left( \theta, \mathrm{E}_{\theta} \left[ \left. \delta \right| T \right] \right) \right] \le \mathrm{E} \left[ \mathrm{E}_{\theta} \left[ \left. L(\theta, \delta) \right| T \right] \right] \nonumber \\ & \Leftrightarrow & \mathrm{E} \left[ L(\theta, \delta_{0}) \right] \le \mathrm{E} \left[ L(\theta, \delta) \right] \nonumber \\ & \Leftrightarrow & R(\theta, \delta_{0}) \le R(\theta, \delta) \end{eqnarray}\]
$\Box$

Rao-Blackwell’s theorem indicates that we need to only consider sufficient statistics.