2.1 Basic definitions and setup
- $D$ : we use this symbol for the upper bound of the diameter of the feasible set $\mathcal{K}$
- $G$ : we use this symbol for the Lipschitz constant of the objective function $f$, i.e. $|f(x) - f(y)| \le G\|x - y\|$ for all $x, y \in \mathcal{K}$
- $\mathcal{K} \subseteq \mathbb{R}^n$ : the feasible set, which we assume to be
  - convex
  - compact
  - of diameter bounded by $D$, i.e. $\|x - y\| \le D$ for all $x, y \in \mathcal{K}$
Definition convex set and convex function
A set $\mathcal{K} \subseteq \mathbb{R}^n$ is convex if $\lambda x + (1 - \lambda) y \in \mathcal{K}$ for all $x, y \in \mathcal{K}$ and $\lambda \in [0, 1]$.
A function $f : \mathcal{K} \to \mathbb{R}$ on a convex set $\mathcal{K}$ is convex if
$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$,
where $x, y \in \mathcal{K}$ and $\lambda \in [0, 1]$ are arbitrary.
Definition positive definite
A symmetric $n \times n$-matrix $A$ is positive semidefinite if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$,
and positive definite if $x^\top A x > 0$ for all $x \ne 0$.
We write $A \succeq B$ for symmetric matrices $A, B$ if $A - B$ is positive semidefinite, and $A \succeq 0$ if $A$ itself is positive semidefinite.
Proposition property of positive semidefinite matrices
Let $A$ be a symmetric $n \times n$-matrix,
and let $B$ be a symmetric $n \times n$-matrix.
(1) If $A \succeq B$, then $x^\top A x \ge x^\top B x$ for all $x \in \mathbb{R}^n$.
proof
(1)
Since $A \succeq B$ means $A - B \succeq 0$, we have $x^\top (A - B) x \ge 0$ for every $x \in \mathbb{R}^n$, which is exactly the claim.
Definition alpha strongly convex and beta smooth
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be convex and let $f : \mathcal{K} \to \mathbb{R}$ be differentiable.
$f$ is $\alpha$-strongly convex if $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\alpha}{2}\|y - x\|^2$ for all $x, y \in \mathcal{K}$, and $f$ is $\beta$-smooth if $f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{\beta}{2}\|y - x\|^2$ for all $x, y \in \mathcal{K}$.
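For example, the quadratic $f(x) = \frac{1}{2} x^\top A x$ with a symmetric matrix $A$ satisfying $\alpha I \preceq A \preceq \beta I$ is $\alpha$-strongly convex and $\beta$-smooth: indeed $f(y) = f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2}(y - x)^\top A (y - x)$, and the last term lies between $\frac{\alpha}{2}\|y - x\|^2$ and $\frac{\beta}{2}\|y - x\|^2$.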
Proposition
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a convex set,
and let $f : \mathcal{K} \to \mathbb{R}$ be twice differentiable.
Then the following conditions are equivalent:
(1) $f$ is convex.
(2) $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y \in \mathcal{K}$.
(3) $\nabla^2 f(x) \succeq 0$ for all $x \in \mathcal{K}$.
proof
(1) $\Rightarrow$ (2):
Let $x, y \in \mathcal{K}$ and $\lambda \in (0, 1]$. By convexity, $f(x + \lambda(y - x)) \le (1 - \lambda) f(x) + \lambda f(y)$, so $\frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y) - f(x)$; letting $\lambda \to 0$ gives $\nabla f(x)^\top (y - x) \le f(y) - f(x)$. The remaining implications follow by applying (2) at intermediate points and by the second-order Taylor expansion of $f$.
Definition gamma-well-conditioned
A function $f : \mathcal{K} \to \mathbb{R}$ is $\gamma$-well-conditioned if
- (1) $f$ is $\alpha$-strongly convex, and
- (2) $f$ is $\beta$-smooth and convex,
where $\gamma = \frac{\alpha}{\beta} \le 1$ is called the condition number of $f$.
2.1.1 Projections onto convex sets
Definition. projection onto convex sets
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a closed convex set,
and let $y \in \mathbb{R}^n$.
The projection onto the convex set $\mathcal{K}$ is defined as $\Pi_{\mathcal{K}}(y) = \arg\min_{x \in \mathcal{K}} \|x - y\|$.
Theorem 2.1
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a closed convex set, $y \in \mathbb{R}^n$, and $x^\star = \Pi_{\mathcal{K}}(y)$.
Then, for any $x \in \mathcal{K}$, $\|x^\star - x\| \le \|y - x\|$.
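As a concrete illustration, the following Python sketch computes the Euclidean projection onto a norm ball and numerically checks the inequality of Theorem 2.1; the chosen set, radius, and function names are illustrative assumptions, not part of the notes.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

rng = np.random.default_rng(0)
y = rng.normal(size=5) * 3.0          # a point that is (likely) outside the ball
x = project_ball(rng.normal(size=5))  # an arbitrary point inside the ball

p = project_ball(y)
# Theorem 2.1: the projection of y is at least as close to any x in K as y itself.
assert np.linalg.norm(p - x) <= np.linalg.norm(y - x) + 1e-12
print("||Pi_K(y) - x|| =", np.linalg.norm(p - x), "<= ||y - x|| =", np.linalg.norm(y - x))
```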
2.1.2 Introduction to optimality conditions
In this section, we assume that $f$ is differentiable on an open set containing $\mathcal{K}$ and that a minimizer $x^\star \in \arg\min_{x \in \mathcal{K}} f(x)$ exists.
Theorem 2.2 KKT
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be a convex set,
and let $x^\star \in \arg\min_{x \in \mathcal{K}} f(x)$.
Then, for all $y \in \mathcal{K}$, $\nabla f(x^\star)^\top (y - x^\star) \ge 0$.
proof. Suppose, for contradiction, that $\nabla f(x^\star)^\top (y - x^\star) < 0$ for some $y \in \mathcal{K}$. For $\lambda \in (0, 1]$ the point $x_\lambda = x^\star + \lambda (y - x^\star)$ lies in $\mathcal{K}$ by convexity, and $f(x_\lambda) = f(x^\star) + \lambda \nabla f(x^\star)^\top (y - x^\star) + o(\lambda) < f(x^\star)$ for $\lambda$ small enough, contradicting the optimality of $x^\star$.
2.2 Gradient / subgradient descent
In this section, we prove the convergence rate of gradient descent and accelerated gradient descent (AGD) in each of the following cases; the bounds give the order of the optimization error $h_T = f(x_T) - f(x^\star)$ after $T$ iterations.
Gradient descent
- general convex ($G$-Lipschitz): $O\!\left(\frac{DG}{\sqrt{T}}\right)$
- $\alpha$-strongly convex: $O\!\left(\frac{G^2}{\alpha T}\right)$
- $\beta$-smooth convex: $O\!\left(\frac{\beta D^2}{T}\right)$
- $\gamma$-well-conditioned: $O\!\left(e^{-\gamma T}\right)$
Accelerated GD
- $\beta$-smooth convex: $O\!\left(\frac{\beta D^2}{T^2}\right)$
- $\gamma$-well-conditioned: $O\!\left(e^{-\sqrt{\gamma}\, T}\right)$
2.2.1 Basic gradient descent
Algorithm 2 Gradient Descent Method
Input:
- convex function $f : \mathcal{K} \to \mathbb{R}$
- convex set $\mathcal{K} \subseteq \mathbb{R}^n$
- initial point $x_1 \in \mathcal{K}$
- sequence of step sizes $\{\eta_t\}_{t=1}^{T}$
Step 1. For $t = 1, 2, \ldots, T$:
Step 2. Let $y_{t+1} = x_t - \eta_t \nabla f(x_t)$ and $x_{t+1} = \Pi_{\mathcal{K}}(y_{t+1})$.
Step 3. Go to Step 2 until $t = T$.
Step 4. Return $x_{T+1}$.
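A minimal Python sketch of Algorithm 2 (projected gradient descent) follows; the quadratic objective, the unit-ball constraint, and the constant step size $1/\beta$ are illustrative choices, not specified in the notes.

```python
import numpy as np

def gradient_descent(grad, project, x1, step_sizes, T):
    """Algorithm 2: x_{t+1} = Pi_K(x_t - eta_t * grad f(x_t))."""
    x = x1
    for t in range(T):
        x = project(x - step_sizes[t] * grad(x))
    return x

# Example: f(x) = 0.5 * x^T A x - b^T x over the Euclidean unit ball.
A = np.diag([1.0, 10.0])            # alpha = 1 (smallest eigenvalue), beta = 10 (largest)
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b
project = lambda y: y if np.linalg.norm(y) <= 1 else y / np.linalg.norm(y)

beta = 10.0
T = 200
x_out = gradient_descent(grad, project, np.zeros(2), [1.0 / beta] * T, T)
print("approximate constrained minimizer:", x_out)
```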
Lemma
Let $f$ be $\beta$-smooth and let $x_{t+1} = x_t - \frac{1}{\beta} \nabla f(x_t)$ (the unconstrained update of Algorithm 2 with $\eta_t = \frac{1}{\beta}$). Then $f(x_{t+1}) - f(x_t) \le -\frac{1}{2\beta} \|\nabla f(x_t)\|^2$.
proof
By $\beta$-smoothness, $f(x_{t+1}) \le f(x_t) + \nabla f(x_t)^\top (x_{t+1} - x_t) + \frac{\beta}{2} \|x_{t+1} - x_t\|^2$.
Then, substituting $x_{t+1} - x_t = -\frac{1}{\beta} \nabla f(x_t)$, we get $f(x_{t+1}) - f(x_t) \le -\frac{1}{\beta} \|\nabla f(x_t)\|^2 + \frac{1}{2\beta} \|\nabla f(x_t)\|^2 = -\frac{1}{2\beta} \|\nabla f(x_t)\|^2$.
Theorem 2.3
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $\gamma$-well-conditioned function,
and run unconstrained gradient descent ($\mathcal{K} = \mathbb{R}^n$) with step sizes $\eta_t = \frac{1}{\beta}$, writing $h_t = f(x_t) - f(x^\star)$.
Then $h_{t+1} \le h_1 e^{-\gamma t}$.
proof
By strong convexity, for all $x$,
$f(x^\star) \ge f(x) + \nabla f(x)^\top (x^\star - x) + \frac{\alpha}{2} \|x^\star - x\|^2 \ge f(x) - \frac{1}{2\alpha} \|\nabla f(x)\|^2$,
where the second inequality minimizes the right-hand side over all possible values of $x^\star$.
Hence $\|\nabla f(x_t)\|^2 \ge 2\alpha h_t$.
Thus, by the lemma above, $h_{t+1} - h_t \le -\frac{1}{2\beta} \|\nabla f(x_t)\|^2 \le -\frac{\alpha}{\beta} h_t = -\gamma h_t$, so $h_{t+1} \le (1 - \gamma) h_t \le e^{-\gamma} h_t$, and iterating gives $h_{t+1} \le h_1 e^{-\gamma t}$.
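To illustrate the linear rate of Theorem 2.3, the snippet below runs unconstrained gradient descent with $\eta = 1/\beta$ on a well-conditioned quadratic and prints the per-step error ratio, which stays below $e^{-\gamma}$ as the theorem predicts; the test function and its spectrum are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 * x^T A x with eigenvalues in [alpha, beta]  ->  gamma = alpha / beta.
alpha, beta = 1.0, 20.0
gamma = alpha / beta
A = np.diag(np.linspace(alpha, beta, 5))

f = lambda x: 0.5 * x @ A @ x       # minimizer is x* = 0 with f(x*) = 0
grad = lambda x: A @ x

x = np.ones(5)
h_prev = f(x)
for t in range(5):
    x = x - (1.0 / beta) * grad(x)  # unconstrained GD with eta = 1/beta
    h = f(x)
    print(f"t={t+1}  h_t={h:.3e}  ratio={h / h_prev:.3f}  exp(-gamma)={np.exp(-gamma):.3f}")
    h_prev = h
```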
Theorem 2.4
Let $f : \mathcal{K} \to \mathbb{R}$ be a $\gamma$-well-conditioned function,
let $\mathcal{K} \subseteq \mathbb{R}^n$ be a closed convex set, and run Algorithm 2 with step sizes $\eta_t = \frac{1}{\beta}$, writing $h_t = f(x_t) - f(x^\star)$ for $x^\star \in \arg\min_{x \in \mathcal{K}} f(x)$.
Then $h_{t+1} \le h_1 e^{-\frac{\gamma t}{4}}$.
proof
From $\beta$-smoothness,
$h_{t+1} - h_t = f(x_{t+1}) - f(x_t) \le \nabla f(x_t)^\top (x_{t+1} - x_t) + \frac{\beta}{2} \|x_{t+1} - x_t\|^2$.
By the algorithm's definition, we have
$x_{t+1} = \Pi_{\mathcal{K}}\!\left(x_t - \tfrac{1}{\beta} \nabla f(x_t)\right) = \arg\min_{x \in \mathcal{K}} \left\{ \nabla f(x_t)^\top (x - x_t) + \tfrac{\beta}{2} \|x - x_t\|^2 \right\}$.
To see this, letting $\nabla_t = \nabla f(x_t)$ and expanding $\left\| x - \left(x_t - \tfrac{1}{\beta} \nabla_t\right) \right\|^2 = \|x - x_t\|^2 + \tfrac{2}{\beta} \nabla_t^\top (x - x_t) + \tfrac{1}{\beta^2} \|\nabla_t\|^2$, the last term does not depend on $x$, so minimizing the distance to $x_t - \tfrac{1}{\beta}\nabla_t$ over $\mathcal{K}$ is equivalent to the stated minimization.
Thus, restricting the minimum to points of the form $x = x_t + \eta (x^\star - x_t)$ with $\eta \in [0, 1]$ and using convexity of $f$ twice,
$h_{t+1} - h_t \le \min_{\eta \in [0,1]} \left\{ \eta \left( f(x^\star) - f(x_t) \right) + \frac{\beta \eta^2}{2} \|x^\star - x_t\|^2 \right\}$,
where the first term inside the minimum equals $-\eta h_t$.
For the second term, strong convexity together with Theorem 2.2 gives $h_t \ge \frac{\alpha}{2} \|x_t - x^\star\|^2$, i.e. $\|x^\star - x_t\|^2 \le \frac{2 h_t}{\alpha}$.
Combining the above results,
$h_{t+1} - h_t \le \min_{\eta \in [0,1]} \left\{ -\eta h_t + \frac{\beta \eta^2}{\alpha} h_t \right\}$.
Moreover, since $\gamma = \frac{\alpha}{\beta} \le 1$, the choice $\eta = \frac{\gamma}{2}$ lies in $[0, 1]$.
Thus, we can achieve the minimizer of the right-hand side by taking $\eta = \frac{\gamma}{2}$, which gives $h_{t+1} - h_t \le -\frac{\gamma}{4} h_t$.
Thus, $h_{t+1} \le \left(1 - \frac{\gamma}{4}\right) h_t \le e^{-\gamma/4} h_t$, and iterating yields $h_{t+1} \le h_1 e^{-\frac{\gamma t}{4}}$.
2.3 Reductions to non-smooth and non-strongly convex functions
2.3.1 Reductions to smooth and non-strongly convex functions
Algorithm 3 Gradient descent, reduction to beta-smooth functions
Input:
- $\beta$-smooth convex function $f$
- convex set $\mathcal{K}$
- number of iterations $T$
- initial point $x_1 \in \mathcal{K}$
- parameter $\tilde{\alpha}$
Step 1. Let $g(x) = f(x) + \frac{\tilde{\alpha}}{2} \|x - x_1\|^2$.
Step 2. Apply Algorithm 2 with parameters $g$, $\mathcal{K}$, $x_1$, $\{\eta_t\}$, and $T$, and let $x_{T+1}$ be its output.
Step 3. Return $x_{T+1}$.
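A Python sketch of Algorithm 3 is shown below, reusing a projected-gradient routine like the one given earlier; the test objective, the step size, and the scaling of $\tilde{\alpha}$ with $T$ are illustrative assumptions, not prescriptions from the notes.

```python
import numpy as np

def gradient_descent(grad, project, x1, eta, T):
    """Algorithm 2 with a constant step size."""
    x = x1
    for _ in range(T):
        x = project(x - eta * grad(x))
    return x

def reduce_to_smooth(grad_f, project, x1, beta, D, T):
    """Algorithm 3 (sketch): add a small strongly convex regularizer, then run Algorithm 2.
    The beta*log(T)/(D**2*T) scaling of alpha_tilde is an illustrative choice."""
    alpha_tilde = beta * np.log(T) / (D ** 2 * T)
    grad_g = lambda x: grad_f(x) + alpha_tilde * (x - x1)   # g(x) = f(x) + (alpha_tilde/2)||x - x1||^2
    return gradient_descent(grad_g, project, x1, 1.0 / (beta + alpha_tilde), T)

# Illustrative beta-smooth (but not strongly convex) objective: f(x) = 0.5 * (c^T x)^2.
c = np.array([1.0, 2.0])
beta = float(c @ c)                                          # smoothness constant of f
grad_f = lambda x: (c @ x) * c
project = lambda y: y if np.linalg.norm(y) <= 1 else y / np.linalg.norm(y)  # unit ball, diameter 2

x_out = reduce_to_smooth(grad_f, project, np.array([0.5, -0.5]), beta, D=2.0, T=500)
print("approximate minimizer:", x_out, "objective:", 0.5 * (c @ x_out) ** 2)
```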
Proposition property of norm
Let $R(x) = \frac{1}{2} \|x - x_1\|^2$ for a fixed $x_1 \in \mathbb{R}^n$.
- (1) $R$ is convex.
- (2) $R$ is $1$-smooth and $1$-strongly convex.
proof
(1)
The Hessian matrix of $R$ is $\nabla^2 R(x) = I \succeq 0$, so $R$ is convex by the proposition above.
(2)
Expanding directly, $R(y) = R(x) + \nabla R(x)^\top (y - x) + \frac{1}{2} \|y - x\|^2$ holds with equality, so the strong convexity and smoothness inequalities both hold with constant $1$.
Proposition alpha-strong convexification
Let $\mathcal{K} \subseteq \mathbb{R}^n$ be convex,
let $f : \mathcal{K} \to \mathbb{R}$ be a convex function,
and let $g(x) = f(x) + \frac{\alpha}{2} \|x - x_1\|^2$ for some $\alpha > 0$ and $x_1 \in \mathbb{R}^n$.
Then $g$ is $\alpha$-strongly convex.
proof
For all $x, y \in \mathcal{K}$, convexity of $f$ gives $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$, and the exact expansion from the previous proposition gives $\frac{\alpha}{2} \|y - x_1\|^2 = \frac{\alpha}{2} \|x - x_1\|^2 + \alpha (x - x_1)^\top (y - x) + \frac{\alpha}{2} \|y - x\|^2$.
Thus, adding the two relations, $g(y) \ge g(x) + \nabla g(x)^\top (y - x) + \frac{\alpha}{2} \|y - x\|^2$, i.e. $g$ is $\alpha$-strongly convex.
Proposition additivity of beta smooth
Let $f$ be $\beta_1$-smooth,
and let $g$ be $\beta_2$-smooth.
- (1) $f + g$ is $(\beta_1 + \beta_2)$-smooth.
proof
For all $x, y$, adding the inequalities $f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{\beta_1}{2} \|y - x\|^2$ and $g(y) \le g(x) + \nabla g(x)^\top (y - x) + \frac{\beta_2}{2} \|y - x\|^2$ gives the smoothness inequality for $f + g$ with constant $\beta_1 + \beta_2$.
Lemma 2.5
Let $f$ be a $\beta$-smooth convex function,
and let $D$ be an upper bound of the diameter in $\mathcal{K}$,
so that $\|x - x_1\| \le D$ for all $x \in \mathcal{K}$.
Then Algorithm 3 with parameter $\tilde{\alpha} = \frac{\beta \log t}{D^2 t}$ converges as $h_{t+1} = O\!\left(\frac{\beta \log t}{t}\right)$.
proof
By the previous proposition and the proposition above, $g$ is $\tilde{\alpha}$-strongly convex and $(\tilde{\alpha} + \beta)$-smooth, i.e. $\tilde{\gamma}$-well-conditioned with $\tilde{\gamma} = \frac{\tilde{\alpha}}{\tilde{\alpha} + \beta}$.
Thus, by Theorem 2.4 applied to $g$,
$h_{t+1} = f(x_{t+1}) - f(x^\star) \le \left( g(x_{t+1}) - g(x^\star_g) \right) + \frac{\tilde{\alpha}}{2} \|x^\star - x_1\|^2 \le h^{g}_{1} e^{-\frac{\tilde{\gamma} t}{4}} + \frac{\tilde{\alpha}}{2} D^2$,
where $x^\star_g$ is the minimizer of $g$ over $\mathcal{K}$ and $h^{g}_{1} = g(x_1) - g(x^\star_g)$ is the initial error for $g$; plugging in $\tilde{\alpha} = \frac{\beta \log t}{D^2 t}$ makes the second term $\frac{\beta \log t}{2 t}$ and yields the stated bound.
2.3.2 Reduction to strongly convex, non-smooth functions
Algorithm 4 Gradient descent, reduction to non-smooth functions
Input:
- $\alpha$-strongly convex, $G$-Lipschitz continuous function $f$
- initial point $x_1 \in \mathcal{K}$
- parameters $\delta > 0$ and $T$
Step 1. Apply Algorithm 2 on the smoothed function $\hat{f}_{\delta}(x) = \mathbb{E}_{v \sim \mathbb{B}}\left[ f(x + \delta v) \right]$, where $\mathbb{B} = \{ v \in \mathbb{R}^n : \|v\| \le 1 \}$ is the unit Euclidean ball, with parameters $x_1$, $T$, and $\{\eta_t\}$, and return its output.
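The smoothing step of Algorithm 4 can be sketched as follows. Since the smoothed gradient generally has no closed form, this illustration estimates it by Monte Carlo sampling of the sphere-based identity mentioned in the proof of Lemma 2.6; the test objective and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_grad(f, x, delta, n_samples=200):
    """Monte Carlo estimate of grad f_hat_delta(x) = (n/delta) * E_{u ~ unit sphere}[ f(x + delta*u) * u ].
    Subtracting f(x) is a baseline that leaves the expectation unchanged (E[u] = 0) but reduces variance."""
    dim = x.shape[0]
    u = rng.normal(size=(n_samples, dim))
    u /= np.linalg.norm(u, axis=1, keepdims=True)      # uniform directions on the unit sphere
    vals = np.array([f(x + delta * ui) - f(x) for ui in u])
    return (dim / delta) * (vals[:, None] * u).mean(axis=0)

# Illustrative strongly convex, Lipschitz (on bounded sets), non-smooth objective:
# f(w) = ||w - z||_1 + 0.5 * ||w||^2, whose minimizer is w = z when |z_i| <= 1.
z = np.array([0.3, -0.7, 0.2])
f = lambda w: np.abs(w - z).sum() + 0.5 * w @ w

# Algorithm 4 (sketch): run gradient descent on the smoothed function (unconstrained here).
w = np.zeros(3)
delta, eta, T = 0.05, 0.1, 300
for _ in range(T):
    w = w - eta * smoothed_grad(f, w, delta)
print("approximate minimizer:", w, "(true minimizer is z =", z, ")")
```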
Lemma 2.6
Let $\hat{f}_{\delta}$ be the smoothed function used in Algorithm 4.
Then:
- (1) If $f$ is $\alpha$-strongly convex, then so is $\hat{f}_{\delta}$.
- (2) $\hat{f}_{\delta}$ is $\frac{nG}{\delta}$-smooth.
- (3) $|\hat{f}_{\delta}(x) - f(x)| \le \delta G$ for all $x \in \mathcal{K}$. (TODO: we need to check whether the statement is correct or not.)
proof
(1) (TODO) For every $x, y \in \mathcal{K}$,
$\hat{f}_{\delta}(y) = \mathbb{E}_{v}\left[ f(y + \delta v) \right] \ge \mathbb{E}_{v}\left[ f(x + \delta v) + \nabla f(x + \delta v)^\top (y - x) + \frac{\alpha}{2} \|y - x\|^2 \right] = \hat{f}_{\delta}(x) + \nabla \hat{f}_{\delta}(x)^\top (y - x) + \frac{\alpha}{2} \|y - x\|^2$.
The inequality follows from the fact that $f$ is $\alpha$-strongly convex and that expectation is linear and monotone; hence $\hat{f}_{\delta}$ is $\alpha$-strongly convex.
(2) (TODO) By Stokes' theorem, $\nabla \hat{f}_{\delta}(x) = \frac{n}{\delta} \mathbb{E}_{u \sim \mathbb{S}}\left[ f(x + \delta u)\, u \right]$, where $\mathbb{S} = \{ u \in \mathbb{R}^n : \|u\| = 1 \}$ is the unit sphere.
By the proposition, together with the $G$-Lipschitz property of $f$, this implies $\|\nabla \hat{f}_{\delta}(x) - \nabla \hat{f}_{\delta}(y)\| \le \frac{nG}{\delta} \|x - y\|$ for all $x, y$, i.e. $\hat{f}_{\delta}$ is $\frac{nG}{\delta}$-smooth.
(3) By the $G$-Lipschitz property, $|\hat{f}_{\delta}(x) - f(x)| = \left| \mathbb{E}_{v}\left[ f(x + \delta v) - f(x) \right] \right| \le \mathbb{E}_{v}\left[ G \|\delta v\| \right] \le \delta G$.
Lemma 2.7
Algorithm 4, with the smoothing parameter chosen as $\delta = \Theta\!\left(\frac{n G \log t}{\alpha t}\right)$, converges as $h_{t+1} = O\!\left(\frac{n G^2 \log t}{\alpha t}\right)$.
proof
By Lemma 2.6, the function $\hat{f}_{\delta}$ is $\alpha$-strongly convex and $\frac{nG}{\delta}$-smooth, i.e. $\frac{\alpha \delta}{n G}$-well-conditioned, and it approximates $f$ to within $\delta G$ everywhere on $\mathcal{K}$.
Hence, by Theorem 2.4 applied to $\hat{f}_{\delta}$, $h_{t+1} \le \hat{h}_{1} e^{-\frac{\alpha \delta}{4 n G} t} + 2 \delta G$, and the stated choice of $\delta$ makes both terms $O\!\left(\frac{n G^2 \log t}{\alpha t}\right)$.
There is an algorithm for $\alpha$-strongly convex (non-smooth) functions that attains the better rate $O\!\left(\frac{G^2}{\alpha t}\right)$, without any dependence on the dimension; this is the subject of the next theorem.
Theorem 2.8
Let $f$ be $\alpha$-strongly convex and $G$-Lipschitz on $\mathcal{K}$.
Let $x_1, \ldots, x_t$ be the iterates of Algorithm 2 applied to $f$
with step sizes $\eta_s = \frac{2}{\alpha (s + 1)}$,
and let $\bar{x}_t = \sum_{s=1}^{t} \frac{2s}{t(t+1)} x_s$ be the weighted average of the iterates of Algorithm 2 applied to $f$.
Then $f(\bar{x}_t) - f(x^\star) \le \frac{2 G^2}{\alpha (t + 1)}$.
proof
2.3.3 Reduction to general convex functions
We can apply both techniques above to a general convex function: first add the strongly convex regularizer of Algorithm 3, and then smooth the resulting function as in Algorithm 4.
Then, by the propositions above, the combined function is well-conditioned, and applying Lemma 2.7 to it yields a convergence rate of order $\tilde{O}(1/\sqrt{t})$, up to logarithmic and dimension factors.
However, there is an algorithm which gives a better inequality than the inequality obtained from Lemma 2.7 in this way: plain (sub)gradient descent with step sizes $\eta_t \propto \frac{D}{G \sqrt{t}}$ attains $f(\bar{x}_t) - f(x^\star) = O\!\left(\frac{G D}{\sqrt{t}}\right)$, with no dependence on the dimension.
2.4 Example: Support Vector Machine training
The general linear classification problem is stated as follows.
- $d$ : the dimension of the feature/input vector
- $x_i \in \mathbb{R}^d$, $i = 1, \ldots, n$ : input vectors
- $y_i \in \{-1, +1\}$ : the label corresponding to input $x_i$
A lot of loss functions can be considered for this classification problem, but here we consider one of the simplest loss functions. The following is the problem to solve:
$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[ \operatorname{sign}(w^\top x_i) \ne y_i \right]$   (general classification)
The Support Vector Machine (SVM) is practically used for classification.
The soft-margin SVM relaxation replaces the 0/1 loss in the problem above with the convex hinge loss $\ell(w; x_i, y_i) = \max\{0,\ 1 - y_i\, w^\top x_i\}$.
Here we consider the hinge loss with $\ell_2$ regularization:
$\min_{w \in \mathbb{R}^d} \left\{ \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^{n} \max\{0,\ 1 - y_i\, w^\top x_i\} \right\}$.
Algorithm 5 SVM training via subgradient descent
Input:
- training data $\{(x_i, y_i)\}_{i=1}^{n}$
- number of iterations $T$
- initial point $w_1$
- step sizes $\{\eta_t\}$
- regularization parameter $\lambda$
Step 1. For $t = 1, 2, \ldots, T$:
Step 2. Update $w_{t+1} = w_t - \eta_t \nabla_t$, where $\nabla_t = \lambda w_t - \frac{1}{n} \sum_{i\,:\, y_i w_t^\top x_i < 1} y_i x_i$ is a subgradient of the regularized objective at $w_t$.
Step 3. end for
Step 4. Return $\bar{w}_T = \frac{1}{T} \sum_{t=1}^{T} w_t$.
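Below is a small Python sketch of Algorithm 5 on synthetic data; the data generation, the step-size schedule $\eta_t = \frac{1}{\lambda t}$, and the specific regularization value are illustrative assumptions consistent with the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, roughly linearly separable data (illustrative only).
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

def svm_subgradient_descent(X, y, lam=0.1, T=1000):
    """Algorithm 5 (sketch): subgradient descent on (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i w^T x_i)."""
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, T + 1):
        margins = y * (X @ w)
        active = margins < 1                      # examples with non-zero hinge loss
        subgrad = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        w -= (1.0 / (lam * t)) * subgrad          # eta_t = 1 / (lam * t)
        w_avg += w
    return w_avg / T                              # return the average iterate

w_hat = svm_subgradient_descent(X, y)
print("training accuracy:", np.mean(np.sign(X @ w_hat) == y))
```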
Theorem
In Algorithm 5, if the step sizes are chosen as $\eta_t = \Theta\!\left(\frac{1}{\lambda t}\right)$, then, since the regularized objective is $\lambda$-strongly convex, the analysis for strongly convex functions (Theorem 2.8) gives an optimization error of $O\!\left(\frac{G^2}{\lambda T}\right)$ after $T$ iterations, where $G$ is an upper bound on the norms of the subgradients.