3.2 Sufficinecy and completeness
3.2 Sufficient statistics
- $(\mathcal{X}, \mathcal{A})$,
- measurable sp.
- $(\mathcal{T}, \mathcal{B})$,
- measurable sp.
- \(\Theta \subseteq \mathbb{R}\),
- \(\mathcal{P} := \{P_{\theta}\}_{\theta \in \Theta}\),
- family of measure over $(\mathcal{X}, \mathcal{A})$,
- $T: \mathcal{X} \rightarrow \mathcal{T}$
- statistics
$T$ is said to be sufficient if
\[\forall A \in \mathcal{A}, \ \exists q_{A}:\mathcal{T} \rightarrow \mathbb{R}: \mathcal{B} \text{-measurable } \ \text{ s.t. } \ \forall B \in \mathcal{B}, \ \theta \in \Theta, \ \int_{B} q_{A}(t) \ d P_{\theta}^{T}(dt) = P_{\theta}(A \cap T^{-1}(B))\]Remark
The definition of sufficiency is interpreted as the family of distributions can be expressed by the measurable funciton which does not depend on $\theta$. In other words, a statistics $T$ has sufficient information to determine a $\theta$. The definition of sufficient statistics is equivalent to
\[\forall A \in \mathcal{A}, \ \exists q_{A}: \mathcal{T} \rightarrow \mathbb{R}: \mathcal{B} \text{-measurable } \ \text{ s.t. } \ q_{A}(t) = P_{\theta}(A \mid T = t) \ t \text{-a.e.}\]Example 3.1 Bernoulli trial
- \(\mathcal{X} := \{0, 1\}^{n}\),
- \(\mathcal{A} := 2^{\mathcal{X}}\),
- \(\Theta := [0, 1]\),
- \(\mathcal{T} := \{0, 1, \ldots, n\}\),
- \(\mathcal{B} := 2^{\mathcal{T}}\),
We define p.d.f. of this trials as follows.
\[\theta \in \Theta, \ x \in \mathcal{X}, \ p_{\theta}(x) := \theta^{\sum_{i=1}^{n} x_{i}} (1 - \theta)^{n - \sum_{i=1}^{n} x_{i}},\]Corresponding probability measure is
\[A \in \mathcal{A} \ P_{\theta}(A) := \sum_{x \in A} p_{\theta}(x).\]We show that statistic
\[T(x) := \sum_{i=1}^{n} x_{i}\]is sufficient. To show that, we need to confirm the equation of definition of sufficiency. It is easy to see that
\[\begin{eqnarray} P_{\theta}(\{x\} \cap T^{-1}(\{t\}) = \begin{cases} \theta^{t}(1 - \theta)^{n - t} & x \in T^{-1}(\{t\}) \\ 0 & x \notin T^{-1}(\{t\}) \end{cases}. \label{example_sufficiency_lhs} \end{eqnarray}\]Hence
\[\begin{eqnarray} \forall A \in \mathcal{A}, \ P_{\theta}(A \cap T^{-1}(\{t\}) & = & P_{\theta}(A \cap T^{-1}(\{t\})) \nonumber \\ & = & \sum_{x \in A} P_{\theta}(\{x\} \cap T^{-1}(\{t\})) \nonumber \\ & = & |A \cap T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]In particular, if we take $A$ as $\mathcal{X}$, we have
\[\begin{eqnarray} P_{\theta}(T^{-1}(\{t\}) & = & |T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & |\{x \in \mathcal{X} \mid \sum_{i=1}^{n} x_{i} = t \} | \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]For evey $A$ we define $\mathcal{B}$ measurable function \(q_{A}\) by
\[\begin{eqnarray} r(x, t) := \begin{cases} \left( \begin{array}{c} n \\ t \end{array} \right)^{-1} & (T(x) = t) \\ 0 & \text{otherwise} \end{cases} \nonumber \end{eqnarray}\] \[q_{A}(t) := \sum_{x \in A} r(x, t) = |A \cap T^{-1}(\{t\})| \left( \begin{array}{c} n \\ t \end{array} \right)^{-1}\]Now we show that \(q_{A}\) satisfies the equality of the definition of sufficiency. LHS of the equality is
\[\begin{eqnarray} \int_{B} q_{A}(t) P_{\theta}^{T}(dt) & = & \sum_{t \in B} q_{A}(t) P_{\theta}^{T}(\{t\}) \nonumber \\ & = & \sum_{t \in B} q_{A}(t) \left( \begin{array}{c} n \\ t \end{array} \right) \theta^{t}(1 - \theta)^{n - t}. \nonumber \\ & = & \sum_{t \in B} |A \cap T^{-1}(\{t\})| \theta^{t}(1 - \theta)^{n - t}. \nonumber \end{eqnarray}\]RHS of the equality is
\[\begin{eqnarray} P_{\theta}(A \cap T^{-1}(B)) & = & \sum_{t \in B} P_{\theta}(A \cap T^{-1}(\{t\})) \nonumber \\ & = & \sum_{t \in B} |A \cap T^{-1}(\{t\}) | \theta^{t}(1 - \theta)^{n - t}. \end{eqnarray}\]Proposition 3.3
- \(\{P_{\theta}\}_{\theta \in \Theta}\),
- $f: \mathcal{X} \rightarrow \mathbb{R}$,
- measurable
- for all $\theta \in \Theta$, $P_{\theta}$-integrable
- $T: \mathcal{X} \rightarrow \mathcal{T}$,
- sufficient statistics
Then there exists measurable function $g:\mathcal{T} \rightarrow \mathbb{R}$l such that
\[\forall \theta \in \Theta, \ g(t) = \mathrm{E}_{\theta} \left[ f \mid T = t \right] \quad P_{\theta}^{T}\text{-a.s.}\]proof
Definition 3.11 Complete
- $(\mathcal{X}, \mathcal{A})$,
- measurable space
- $\Theta$
- parameter sp.
- ${P_{\theta}}_{\theta \in \Theta}$
- a family of probability distribution over $(\mathcal{X}, \mathcal{A})$.
$\mathcal{P}$ is said to be (boundedly) complete if for all (bounded) measurable function $f: \mathcal{X} \rightarrow \mathbb{R}$,
\[\begin{eqnarray} \mathrm{E}_{\theta} \left[ f \right] = 0 \ (\forall \theta \in \Theta) \ \Rightarrow \ \forall \theta \in \Theta f = 0 \ P_{\theta} \text{-a.s.} , \end{eqnarray}\]Remark
If $f \ge 0$, the above statemet always holds.
Definition 3.12 complete map
- $(\mathcal{T}, \mathcal{B})$,
- measurable sp.
- $f: \mathcal{X} \rightarrow \mathcal{T}$,
- measurable map
$T$ is said to be (boundedly) complete if \(\{P_{\theta}^{T}\}_{\theta \in \Theta}\) is (boundedly) complete.
Proposition 3.3
- \(\mathcal{P} := \{P_{\theta}\}_{\theta \in \Theta}\),
- $f: \mathcal{X} \rightarrow \mathbb{R}$,
- $\forall \theta \in \Theta$, $P_{\theta}$-integrable
- $T: \mathcal{X} \rightarrow \mathbb{R}$,
- sufficient with respect to $\mathcal{P}$,
Then there exists measurable function $g: \mathcal{T} \rightarrow \mathbb{R}$ such that
\[\forall \theta \in \Theta, \ g(t) = \mathrm{E}_{\theta} \left[ f \mid T = t \right] \quad P_{\theta}^{T} \text{-a.s.}\]proof
Proposition 3.4
- \(\mathcal{P} := \{P_{\theta}\}_{\theta \in \Theta}\),
- $f: \mathcal{X} \rightarrow \mathbb{R}$,
- $\forall \theta \in \Theta$, $P_{\theta}$-integrable
- $T: \mathcal{X} \rightarrow \mathbb{R}$,
- sufficient with respect to $\mathcal{P}$,
Then there exists $f^{\prime}: \mathcal{X} \rightarrow \mathbb{R}$ such that
\[\forall \theta \in \Theta, \ f^{\prime}(x) = \mathrm{E}_{\theta} \left[ f \mid T \right] \quad P_{\theta} \text{-a.s.}\]