

Subdifferential

Definition 1

Let \(I \subseteq \mathbb{R}^{n}\) and \(f : I \to \mathbb{R}\). The subdifferential of $f$ at \(x_{0} \in I\) is defined as

\[\partial_{x_{0}}f := \{ c \in \mathbb{R}^{n} \mid \forall x \in I, \ f(x) - f(x_{0}) \ge c^{\mathrm{T}} (x - x_{0}) \} .\]

An element of the subdifferential is called a subderivative of $f$ at $x_{0}$.
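As a concrete illustration (a numerical sketch added here, not from the original memo): for \(f(x) = |x|\) on \(\mathbb{R}\), the subdifferential at \(x_{0} = 0\) is the interval \([-1, 1]\). The defining inequality can be spot-checked on a grid:

```python
# Numerical spot-check of the subgradient inequality for f(x) = |x| at x0 = 0,
# where the subdifferential is the interval [-1, 1].  Grid-based check only:
# it samples finitely many points, so it can refute but only suggest membership.
def is_subgradient(c, x0, f, points):
    """Check f(x) - f(x0) >= c * (x - x0) on a grid of sample points."""
    return all(f(x) - f(x0) >= c * (x - x0) - 1e-12 for x in points)

f = abs
grid = [i / 10 for i in range(-50, 51)]

print(is_subgradient(0.5, 0.0, f, grid))   # 0.5 in [-1, 1]     -> True
print(is_subgradient(1.5, 0.0, f, grid))   # 1.5 not in [-1, 1] -> False
```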

Property

Proposition 2

$\partial_{x_{0}} f$ is a closed convex set.

proof.

Let \(y, z \in \partial_{x_{0}}f\). For all $x \in I$ and $\lambda \in (0, 1)$,

\[\begin{eqnarray} (\lambda y + (1 - \lambda) z)^{\mathrm{T}} (x - x_{0}) & = & \lambda y^{\mathrm{T}} (x - x_{0}) + (1 -\lambda) z^{\mathrm{T}} (x - x_{0}) \nonumber \\ & \le & \lambda (f(x) - f(x_{0})) + (1 - \lambda) (f(x) - f(x_{0})) \nonumber \\ & = & (f(x) - f(x_{0})) . \end{eqnarray}\]

Hence \(\lambda y + (1 - \lambda) z \in \partial_{x_{0}}f\), so \(\partial_{x_{0}}f\) is convex. It is also closed, since \(\partial_{x_{0}}f = \bigcap_{x \in I} \{ c \in \mathbb{R}^{n} \mid c^{\mathrm{T}}(x - x_{0}) \le f(x) - f(x_{0}) \}\) is an intersection of closed half-spaces. This completes the proof.

$\Box$
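Proposition 2 can be illustrated numerically (an added sketch, not part of the proof), using \(f(x) = |x|\) at \(x_{0} = 0\), where \(\partial_{0}f = [-1, 1]\):

```python
# Proposition 2 illustration: a convex combination of two subgradients of
# f(x) = |x| at x0 = 0 (here -1 and 1) is again a subgradient.
# Grid-based check only: it samples finitely many points of the domain.
f = abs
x0 = 0.0
grid = [i / 10 for i in range(-50, 51)]

def check(c):
    """True if c satisfies the subgradient inequality at x0 on the grid."""
    return all(f(x) - f(x0) >= c * (x - x0) for x in grid)

lam = 0.3
c = lam * (-1.0) + (1 - lam) * 1.0   # convex combination, approximately 0.4
print(check(-1.0), check(1.0), check(c))  # True True True
```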

Theorem 3 Necessary and sufficient condition of optimality for a convex function

Let \(x^{*} \in I\) and consider the statements (1) \(0 \in \partial_{x^{*}}f\), and (2) \(x^{*}\) is a minimizer of $f$ on $I$.

Then (1) \(\Leftrightarrow\) (2).

proof.

(1) \(\Rightarrow\) (2)

By definition of subdifferential,

\[\forall x \in I, \ f(x) \ge f(x^{*}) + 0^{\mathrm{T}}(x - x^{*}) = f(x^{*}) .\]

(1) \(\Leftarrow\) (2)

Conversely, since \(x^{*}\) is a minimizer of $f$ on $I$,

\[\forall x \in I, \ f(x) \ge f(x^{*}) = f(x^{*}) + 0^{\mathrm{T}}(x - x^{*}) .\]

Hence \(0 \in \partial_{x^{*}}f\).

$\Box$
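Theorem 3 can be sanity-checked numerically (an added sketch, not part of the proof): for \(f(x) = |x|\), the zero vector is a subgradient at \(x^{*} = 0\), and \(x^{*} = 0\) is indeed a global minimizer:

```python
# Theorem 3 sanity check for f(x) = |x|: 0 is a subgradient at x* = 0,
# and x* = 0 is a minimizer on the sampled grid (finite check only).
f = abs
x_star = 0.0
grid = [i / 10 for i in range(-50, 51)]

zero_is_subgradient = all(f(x) - f(x_star) >= 0.0 * (x - x_star) for x in grid)
x_star_is_minimizer = all(f(x) >= f(x_star) for x in grid)
print(zero_is_subgradient, x_star_is_minimizer)  # True True
```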

Definition 4 Tangent cone

\[T_{I}(x^{*}) := \{ h \in \mathbb{R}^{n} \mid \exists \delta > 0 \ \text{ s.t. } \ \forall t \in (0, \delta), \ x^{*} + th \in I \}\]

is called the tangent cone of $I$ at \(x^{*}\).
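The definition can be tested mechanically (an added sketch; the set \(I = \mathbb{R}_{\ge 0}^{2}\) and the point \(x^{*} = (0, 1)\) are assumptions of this example). At this boundary point, \(T_{I}(x^{*}) = \{ h \mid h_{1} \ge 0 \}\):

```python
# Tangent-cone membership check by definition, for I = { x in R^2 : x >= 0 }
# at x* = (0, 1).  A direction h is tangent iff x* + t h stays in I for all
# sufficiently small t > 0; for this I that means h[0] >= 0.
# Heuristic check: t is sampled on a finite grid in (0, delta).
def in_I(x):
    return all(c >= -1e-12 for c in x)

def in_tangent_cone(h, x_star, delta=1e-3, samples=100):
    """Test x* + t h in I for t on a grid in (0, delta)."""
    for k in range(1, samples + 1):
        t = delta * k / samples
        x = [xs + t * hc for xs, hc in zip(x_star, h)]
        if not in_I(x):
            return False
    return True

x_star = [0.0, 1.0]
print(in_tangent_cone([1.0, -2.0], x_star))   # h1 >= 0 -> True
print(in_tangent_cone([-0.5, 1.0], x_star))   # h1 < 0  -> False
```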

Proposition 5

Suppose $f$ is convex and differentiable, $I$ is convex, and \(x^{*} \in I\). Then the following statements are equivalent: (1) \(x^{*}\) is a minimizer of $f$ on $I$; (2)

\[\forall h \in T_{I}(x^{*}), \quad h^{\mathrm{T}} \nabla f(x^{*}) \ge 0 .\]

proof.

(1) $\Rightarrow$ (2)

Suppose that there exists \(h \in T_{I}(x^{*})\) such that \(h^{\mathrm{T}}\nabla f(x^{*}) < 0\). By Taylor's theorem,

\[\begin{eqnarray} & & f(x^{*} + th) - f(x^{*}) = th^{\mathrm{T}}\nabla f(x^{*}) + O(t^{2}|h|^{2}) \nonumber \\ & \Rightarrow & \exists \delta > 0, \ \forall t \in (0, \delta), \ f(x^{*} + th) - f(x^{*}) \le \frac{t}{2} h^{\mathrm{T}}\nabla f(x^{*}) < 0 . \nonumber \end{eqnarray}\]

Thus,

\[\forall t \in (0, \delta), \quad f(x^{*} + th) < f(x^{*}) .\]

Since \(h \in T_{I}(x^{*})\),

\[\forall t \in (0, \delta), \quad x^{*} + th \in I .\]

Hence every neighborhood of \(x^{*}\) contains a point of $I$ at which $f$ is strictly lower than \(f(x^{*})\). This contradicts (1).

(1) $\Leftarrow$ (2)

If $x \in I$, then \(h := x - x^{*} \in T_{I}(x^{*})\) since $I$ is convex. Thus, by the gradient inequality for the convex function $f$ and assumption (2),

\[f(x) \ge f(x^{*}) + (x - x^{*})^{\mathrm{T}} \nabla f(x^{*}) \ge f(x^{*}) .\]
$\Box$
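In one dimension the proposition is easy to visualize (an added sketch; the example \(f(x) = (x - 2)^{2}\) on \(I = [0, 1]\) is an assumption, not from the memo): the minimizer is \(x^{*} = 1\), \(f'(1) = -2\), and \(T_{I}(1) = \{ h \le 0 \}\), so \(h f'(x^{*}) \ge 0\) for every tangent direction:

```python
# Proposition 5 check on a 1-D example: f(x) = (x - 2)^2 on I = [0, 1].
# The minimizer is x* = 1, f'(1) = -2, and T_I(1) = { h <= 0 }, so
# h * f'(x*) = -2h >= 0 for every tangent direction h <= 0.
x_star = 1.0
grad = 2 * (x_star - 2)                  # f'(1) = -2
tangent_dirs = [0.0, -0.5, -1.0, -3.0]   # sample of T_I(1) = { h <= 0 }
print(all(h * grad >= 0 for h in tangent_dirs))  # True
```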

Definition 6 Normal cone

\[N_{I}(x^{*}) := \{ x \in \mathbb{R}^{n} \mid \forall h \in T_{I}(x^{*}), \ h^{\mathrm{T}}x \ge 0 \}\]

is called the normal cone of $I$ at \(x^{*}\).

Remark 7

By Proposition 5, \(x^{*}\) is a minimizer of $f$ on $I$ if and only if \(\nabla f(x^{*}) \in N_{I}(x^{*})\).

Example 8 Polyhedral set

We derive an equivalent condition for \(x^{*}\) to be a minimizer of $f$ when $I$ is a polyhedral set. Let $I$ be an arbitrary polyhedral set, that is,

\[\begin{eqnarray} I & := & \{ x \in \mathbb{R}^{n} \mid Ax \le b \} \nonumber \\ & = & \{ x \in \mathbb{R}^{n} \mid a_{i}^{\mathrm{T}}x \le b_{i}, \quad i = 1, \ldots, m \} . \end{eqnarray}\]

The corresponding tangent cone is the polyhedral cone

\[\begin{eqnarray} T_{I}(x^{*}) & = & \{ h \in \mathbb{R}^{n} \mid \exists \delta > 0 \ \text{ s.t. } \ \forall t \in (0, \delta), \ x^{*} + t h \in I \} \nonumber \\ & = & \{ h \in \mathbb{R}^{n} \mid \exists \delta > 0 \ \text{ s.t. } \ \forall t \in (0, \delta), \ a_{i}^{\mathrm{T}}x^{*} + ta_{i}^{\mathrm{T}} h \le b_{i}, \ i = 1, \ldots, m \} \nonumber \\ & = & \{ h \in \mathbb{R}^{n} \mid \forall i = 1, \ldots, m, \ a_{i}^{\mathrm{T}}x^{*} = b_{i} \Rightarrow a_{i}^{\mathrm{T}} h \le 0 \} \label{tangent_cone_polyhedral_cone_case_equivalent_transformation} \\ & = & \{ h \in \mathbb{R}^{n} \mid a_{i}^{\mathrm{T}}h \le 0, \quad \forall i \in J(x^{*}) \}, \nonumber \\ J(x^{*}) & := & \{ i \in \{1, \ldots, m\} \mid a_{i}^{\mathrm{T}}x^{*} = b_{i} \} . \nonumber \end{eqnarray}\]

Every step is immediate except the equation \(\eqref{tangent_cone_polyhedral_cone_case_equivalent_transformation}\), so we verify only that equality.

($\subseteq$)

Let $h$ belong to the set on the second line above, with corresponding \(\delta > 0\), and suppose that \(a_{i}^{\mathrm{T}}x^{*} = b_{i}\). Then

\[\forall t \in (0, \delta), \quad ta_{i}^{\mathrm{T}}h = (a_{i}^{\mathrm{T}}x^{*} + ta_{i}^{\mathrm{T}}h) - b_{i} \le 0 ,\]

and hence \(a_{i}^{\mathrm{T}}h \le 0\).

($\supseteq$)

Since \(x^{*} \in I\), so that \(a_{i}^{\mathrm{T}}x^{*} \le b_{i}\). If \(a_{i}^{\mathrm{T}}x^{*} = b_{i}\), then assumption says \(a_{i}^{\mathrm{T}} h \le 0\).

On the other hand, suppose \(a_{i}^{\mathrm{T}}x^{*} < b_{i}\). If \(a_{j}^{\mathrm{T}}h \le 0\) for every $j$, then \(a_{i}^{\mathrm{T}}x^{*} + ta_{i}^{\mathrm{T}}h \le b_{i}\) \((\forall i)\) holds for every \(t > 0\). Thus we may assume that there exists \(j \in \{1, \ldots, m\}\) such that \(a_{j}^{\mathrm{T}}h > 0\); any such $j$ satisfies \(j \notin J(x^{*})\) by the assumption on $h$. Set

\[\delta := \frac{ \min \{ b_{j} - a_{j}^{\mathrm{T}}x^{*} \mid j \notin J(x^{*}) \} }{ \max \{ a_{j}^{\mathrm{T}}h \mid j = 1, \ldots, m, \ a_{j}^{\mathrm{T}}h > 0 \} } > 0 .\]

Hence, for every \(t \in (0, \delta)\) and every $i$ with \(a_{i}^{\mathrm{T}}h > 0\) (so \(i \notin J(x^{*})\)),

\[\begin{eqnarray} a_{i}^{\mathrm{T}}x^{*} + ta_{i}^{\mathrm{T}}h & < & a_{i}^{\mathrm{T}}x^{*} + \min \{ b_{j} - a_{j}^{\mathrm{T}}x^{*} \mid j \notin J(x^{*}) \} \nonumber \\ & \le & a_{i}^{\mathrm{T}}x^{*} + (b_{i} - a_{i}^{\mathrm{T}}x^{*}) = b_{i} , \nonumber \end{eqnarray}\]

while for $i$ with \(a_{i}^{\mathrm{T}}h \le 0\) the inequality \(a_{i}^{\mathrm{T}}x^{*} + ta_{i}^{\mathrm{T}}h \le b_{i}\) is immediate.

Therefore the equation \(\eqref{tangent_cone_polyhedral_cone_case_equivalent_transformation}\) holds.
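The active-set description just derived can be spot-checked on a small instance (an added sketch; the unit box \([0, 1]^{2}\) and the corner \(x^{*} = (1, 1)\) are assumptions of this example):

```python
# For the polyhedral set I = { x : A x <= b }, the derivation above gives
# T_I(x*) = { h : a_i^T h <= 0 for all active i }.  Spot-check on the unit
# box [0,1]^2 at the corner x* = (1, 1): the active rows are x1 <= 1 and
# x2 <= 1, so tangent directions are exactly { h : h1 <= 0, h2 <= 0 }.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]   # x1<=1, x2<=1, -x1<=0, -x2<=0
b = [1, 1, 0, 0]
x_star = [1.0, 1.0]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

# Active index set J(x*) = { i : a_i^T x* = b_i }
J = [i for i, (a, bi) in enumerate(zip(A, b)) if abs(dot(a, x_star) - bi) < 1e-12]
print(J)  # [0, 1]

def in_T(h):
    """Membership in T_I(x*) via the active-constraint description."""
    return all(dot(A[i], h) <= 1e-12 for i in J)

print(in_T([-1.0, -0.5]))  # True
print(in_T([0.5, -1.0]))   # False
```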

The corresponding normal cone is

\[\begin{eqnarray} N_{I}(x^{*}) & = & \{ x \in \mathbb{R}^{n} \mid \forall h \in T_{I}(x^{*}), \quad h^{\mathrm{T}}x \ge 0 \} \nonumber \\ & = & \{ x \in \mathbb{R}^{n} \mid \forall h \in \mathbb{R}^{n}, \ (\forall i \in J(x^{*}), \ a_{i}^{\mathrm{T}}h \le 0) \ \Rightarrow \ h^{\mathrm{T}}x \ge 0 \} \nonumber \\ & = & \{ x \in \mathbb{R}^{n} \mid \exists \lambda \in \mathbb{R}_{\ge 0}^{m} \ \text{ s.t. } \ x = \sum_{i \in J(x^{*})} \lambda_{i}(-a_{i}) \} \quad (\because \text{homogeneous Farkas's lemma}) \end{eqnarray}\]

See Farkas's lemma.

By Remark 7, \(x^{*}\) is a minimizer of $f$ on $I$ if and only if

\[\exists \lambda \in \mathbb{R}_{\ge 0}^{m} \ \text{ s.t. } \ \nabla f(x^{*}) = - \sum_{i \in J(x^{*})} \lambda_{i} a_{i} .\]
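As a closing spot-check (an added example; the objective \(f(x) = (x_{1}-2)^{2} + (x_{2}-2)^{2}\) and the box \([0, 1]^{2}\) are assumptions of this sketch, not from the memo): at \(x^{*} = (1, 1)\) the active rows are \((1, 0)\) and \((0, 1)\), and \(\nabla f(x^{*}) = (-2, -2) = -(2 \cdot (1,0) + 2 \cdot (0,1))\) with \(\lambda = (2, 2) \ge 0\), so the condition certifies \(x^{*}\) as the minimizer:

```python
# Check the final optimality condition on a concrete polyhedral problem:
# minimize f(x) = (x1-2)^2 + (x2-2)^2 over the box [0,1]^2.
# At x* = (1,1) the active rows are (1,0) and (0,1), and
# grad f(x*) = (-2,-2) = -(2*(1,0) + 2*(0,1)) with lambda = (2,2) >= 0.
x_star = (1.0, 1.0)
grad = (2 * (x_star[0] - 2), 2 * (x_star[1] - 2))   # (-2.0, -2.0)
active = [(1.0, 0.0), (0.0, 1.0)]                   # a_i for i in J(x*)
lam = (2.0, 2.0)                                    # multipliers, all >= 0

rhs = tuple(-sum(l * a[k] for l, a in zip(lam, active)) for k in range(2))
print(grad == rhs and all(l >= 0 for l in lam))  # True
```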
