At first blush, ADMM and Proximal Gradient Descent
(ProxGrad) appear to have very little in common. The convergence analyses for
these two methods are unrelated, and the former operates on an Augmented
Lagrangian while the latter directly minimizes the primal objective. In this
post, we'll show that after a slight modification to ADMM, we recover the
proximal gradient algorithm applied to the Lagrangian *dual* of the ADMM objective.

To be precise, we'll first make a slight modification to ADMM to construct another algorithm known as the Alternating Minimization Algorithm (AMA). We'll then show this algorithm is an instance of a more general technique for Variational Inequality problems called Forward-Backward Splitting (FOBOS). Finally, we'll show that ProxGrad is also an instance of FOBOS with the exact same form. We conclude that these two algorithms are equivalent.

# Alternating Minimization Algorithm

The Alternating Minimization Algorithm (AMA), originally proposed by
Paul Tseng in 1988, is an algorithm very similar to ADMM. In fact, the only
difference between these two methods is in the first step of each iteration.
Recall the pseudocode for ADMM; whereas ADMM's first step minimizes the
*Augmented* Lagrangian $L_{\rho}$ with respect to $x$, AMA minimizes the
*Non-Augmented* Lagrangian $L_{0}$,

**Input** Step size $\rho$, initial estimates $z^{(0)}$ and $y^{(0)}$

- For $t = 0, 1, \ldots$
    - Let $x^{(t+1)} = \underset{x}{\text{argmin}} \quad L_{0}( x , z^{(t)}, y^{(t)} )$
    - Let $z^{(t+1)} = \underset{z}{\text{argmin}} \quad L_{\rho}( x^{(t+1)}, z , y^{(t)} )$
    - Let $y^{(t+1)} = y^{(t)} + \rho ( Ax^{(t+1)} + Bz^{(t+1)} - c )$

Notice the only difference from ADMM: the first step minimizes the non-augmented Lagrangian $L_{0}$ instead of the augmented Lagrangian $L_{\rho}$. This change requires $f$ to be strongly convex for AMA to converge, but it is exactly what will let us interpret AMA as Proximal Gradient Descent on the dual.
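To make the iteration concrete, here is a minimal NumPy sketch of AMA on a small instance of the ADMM problem form. All concrete choices are illustrative, not from the original: $f(x) = \frac{1}{2}\|x - a\|_2^2$ (strongly convex, as AMA requires), $g(z) = \lambda \|z\|_1$, $A = B = I$, and a hand-picked step size, so that both inner argmins have closed forms.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: prox of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def ama(a, c, lam, rho, iters=100):
    """AMA for: minimize 0.5*||x - a||^2 + lam*||z||_1  s.t.  x + z = c."""
    y = np.zeros_like(a)
    x = z = np.zeros_like(a)
    for _ in range(iters):
        # Step A: minimize the NON-augmented Lagrangian over x:
        #   argmin_x 0.5*||x - a||^2 + <y, x>   =>   x = a - y
        x = a - y
        # Step B: minimize the augmented Lagrangian over z:
        #   argmin_z lam*||z||_1 + <y, z> + (rho/2)*||x + z - c||^2
        z = soft_threshold(c - x - y / rho, lam / rho)
        # Step C: dual ascent on the constraint residual.
        y = y + rho * (x + z - c)
    return x, z, y

x, z, y = ama(a=np.array([1.0, -2.0]), c=np.array([0.5, 0.5]),
              lam=0.3, rho=0.5)
print(x, z, x + z)  # at convergence x + z == c, i.e. the iterates are feasible
```

On this tiny problem the iterates reach the exact solution after a handful of steps; in general AMA's step size must be small relative to the strong-convexity constant of $f$.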

# Variational Inequalities

To show the similarity between AMA and ProxGrad, we'll show that both algorithms are instances of Forward-Backward Splitting (FOBOS). Unlike other algorithms we've considered, FOBOS isn't about minimizing a real-valued objective function subject to constraints. Instead, FOBOS solves Variational Inequality problems, which we'll now describe.

Variational Inequality (VI) problems involve a vector-to-vector function
$F : \mathbb{R}^n \rightarrow \mathbb{R}^n$. The goal is to find a point $w^{*}$ such that for all $w$,

$$
\langle F(w^{*}), w - w^{*} \rangle \geq 0
$$

If $F$ is the gradient of a differentiable convex function $f$, this is precisely the first-order optimality condition for minimizing $f$, so finding a minimizer of $f$ is a VI problem with $F = \nabla f$.

What is not covered in this setup, however, is the case when $F$ maps a point to a *subset* of $\mathbb{R}^n$, as the subdifferential of a non-smooth convex function does. To handle point-to-set functions, we extend the definition: $F$ is a *monotone operator* if,

$$
\langle u - v, w - w' \rangle \geq 0
\quad \forall \, u \in F(w), \; v \in F(w')
$$

Now if $f$ is a potentially non-smooth convex function, its subdifferential $\partial f$ is a monotone operator, and minimizing $f$ is equivalent to finding a $w^{*}$ such that $0 \in \partial f(w^{*})$.
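As a quick numerical illustration (not from the original), the monotonicity inequality is easy to check for a familiar operator: `np.sign` selects one element from the subdifferential of the absolute value, and the selected pairs always satisfy the inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

# F(w) = sign(w) is a selection from the subdifferential of |w|, which
# is a monotone operator: <F(w) - F(w'), w - w'> >= 0 for all w, w'.
w = rng.standard_normal(1000)
w_prime = rng.standard_normal(1000)
gaps = (np.sign(w) - np.sign(w_prime)) * (w - w_prime)
print(gaps.min() >= 0)  # True: monotonicity holds for every sampled pair
```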

# Forward-Backward Splitting

Forward-Backward Splitting (FOBOS) is an algorithm for finding
a $w^{*}$ such that $0 \in F(w^{*})$ for a monotone operator $F$ that decomposes as

$F(w) = \Psi(w) + \Theta(w)$ for monotone operators $\Psi$ and $\Theta$, where $\Psi(w)$ has exactly one value for each $w$ in its domain.

Given this, FOBOS will converge to a $w^{*}$ such that $0 \in \Psi(w^{*}) + \Theta(w^{*})$.

**Input** Step sizes $\{ \rho_t \}$, initial estimate $w^{(0)}$

- For $t = 0, 1, \ldots$
    - Let $w^{(t+1/2)} = w^{(t)} - \rho_t \Psi(w^{(t)})$ (the "forward" step)
    - Let $w^{(t+1)}$ be such that $w^{(t+1)} + \rho_t \Theta(w^{(t+1)}) = w^{(t+1/2)}$ (the "backward" step)

An equivalent, more concise way to describe FOBOS is with resolvent operators: $w^{(t+1)} = (I + \rho_t \Theta)^{-1} (I - \rho_t \Psi)(w^{(t)})$.
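As a concrete sketch (the choices here are illustrative, not from the original), take $\Psi(w) = w - b$, the gradient of the smooth term $\frac{1}{2}\|w - b\|_2^2$, and $\Theta(w) = \lambda \, \partial \|w\|_1$; the backward step's resolvent $(I + \rho_t \Theta)^{-1}$ is then elementwise soft-thresholding.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Resolvent (I + kappa * d||.||_1)^{-1}: elementwise soft-threshold."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def fobos(b, lam, rho, iters=100):
    """Find w with 0 in Psi(w) + Theta(w), where Psi(w) = w - b is
    single-valued and Theta(w) = lam * d||w||_1 is set-valued."""
    w = np.zeros_like(b)
    for _ in range(iters):
        w_half = w - rho * (w - b)             # forward (explicit) step on Psi
        w = soft_threshold(w_half, rho * lam)  # backward (implicit) step on Theta
    return w

w = fobos(b=np.array([2.0, -0.3, 1.0]), lam=0.5, rho=0.5)
print(w)  # a zero of Psi + Theta: the minimizer of 0.5||w-b||^2 + 0.5||w||_1
```

Note that only the forward step evaluates $\Psi$ at the current iterate; the backward step is implicit in $w^{(t+1)}$, which is why it needs a computable resolvent.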

# Reductions to FOBOS

We'll now show that for the specific optimization problem tackled by ADMM, AMA is the same as Proximal Gradient Descent on the dual problem. First, recall the problem ADMM is solving,

$$
\begin{aligned}
& \underset{x, z}{\text{minimize}} & & f(x) + g(z) \\
& \text{subject to} & & Ax + Bz = c
\end{aligned}
$$

The dual problem to this is then,

$$
\underset{y}{\text{maximize}} \quad - f^{*}(-A^{T} y) - g^{*}(-B^{T} y) - \langle y, c \rangle
$$

where $f^{*}$ and $g^{*}$ are the convex conjugates of $f$ and $g$, respectively.

## Proximal Gradient Descent to FOBOS

Suppose we want to minimize the negated dual objective,

$$
\underset{y}{\text{minimize}} \quad f^{*}(-A^{T} y) + g^{*}(-B^{T} y) + \langle y, c \rangle
$$

treating $f^{*}(-A^{T} y)$ as the differentiable part (guaranteed when $f$ is strongly convex) and $g^{*}(-B^{T} y) + \langle y, c \rangle$ as the non-smooth part, just as Proximal Gradient Descent requires.

Let's now define,

$$
\Psi(y) = -A \nabla f^{*}(-A^{T} y)
\qquad
\Theta(y) = -B \partial g^{*}(-B^{T} y) + c
$$

Clearly, $\Psi(y) + \Theta(y)$ is the subdifferential of the dual objective at $y$, so finding a zero of $\Psi + \Theta$ is the same as minimizing the dual. We'll now show that Proximal Gradient Descent's update matches the FOBOS update for this $\Psi$ and $\Theta$.

To do this, let's recall the definition of the prox operator,

$$
\text{prox}_{\rho h}(v) = \underset{y}{\text{argmin}} \quad h(y) + \frac{1}{2 \rho} \| y - v \|_2^2
$$

Since this is an unconstrained minimization problem, we know that $0$ must be in the subdifferential of the objective at the minimizer $y^{+}$,

$$
0 \in \rho \, \partial h(y^{+}) + y^{+} - v
\quad \Longleftrightarrow \quad
y^{+} = (I + \rho \, \partial h)^{-1}(v)
$$

Apply this with $h(y) = g^{*}(-B^{T} y) + \langle y, c \rangle$ (so that $\partial h = \Theta$) and $v = y^{(t)} - \rho_t \Psi(y^{(t)})$, the gradient step on the smooth part. Proximal Gradient Descent's update is then

$$
y^{(t+1)}
= \text{prox}_{\rho_t h} \left( y^{(t)} - \rho_t \Psi(y^{(t)}) \right)
= (I + \rho_t \Theta)^{-1} (I - \rho_t \Psi)(y^{(t)})
$$

which is exactly FOBOS.
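The prox operator's equivalence to the resolvent $(I + \rho \, \partial h)^{-1}$ can be sanity-checked numerically in one dimension (the choice of $h$ and the grid are illustrative, not from the original): compute $\text{prox}_{\rho h}(v)$ for $h(w) = \lambda |w|$ by brute-force minimization and compare against the closed-form resolvent, which is soft-thresholding.

```python
import numpy as np

def prox_bruteforce(v, lam, rho, grid):
    """argmin_w lam*|w| + (1/(2*rho)) * (w - v)^2, over a dense grid."""
    objective = lam * np.abs(grid) + (grid - v) ** 2 / (2 * rho)
    return grid[np.argmin(objective)]

def prox_closed_form(v, lam, rho):
    """The resolvent (I + rho*lam*d|.|)^{-1}(v): scalar soft-threshold."""
    return np.sign(v) * max(abs(v) - rho * lam, 0.0)

grid = np.linspace(-5, 5, 200001)  # grid spacing 5e-5
for v in [-3.2, -0.1, 0.0, 0.4, 2.7]:
    assert abs(prox_bruteforce(v, lam=1.0, rho=0.8, grid=grid)
               - prox_closed_form(v, lam=1.0, rho=0.8)) < 1e-4
print("prox and resolvent agree on all test points")
```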

## AMA to FOBOS

We'll now show that AMA as applied to the ADMM objective is
simply an instance of FOBOS. Similar to the ProxGrad
reduction, we'll use the same definitions for $\Psi$ and $\Theta$,

$$
\Psi(y) = -A \nabla f^{*}(-A^{T} y)
\qquad
\Theta(y) = -B \partial g^{*}(-B^{T} y) + c
$$

First, recall the subgradient optimality condition as applied to Step B of
ADMM (same as AMA). In particular, for $z^{(t+1)}$ to minimize $L_{\rho}( x^{(t+1)}, z, y^{(t)} )$, zero must be in the subdifferential,

$$
0 \in \partial g(z^{(t+1)}) + B^{T} y^{(t)} + \rho B^{T} ( Ax^{(t+1)} + Bz^{(t+1)} - c )
$$

Using Step C's definition of $y^{(t+1)}$, this is simply

$$
0 \in \partial g(z^{(t+1)}) + B^{T} y^{(t+1)}
\quad \Longleftrightarrow \quad
z^{(t+1)} \in \partial g^{*}( -B^{T} y^{(t+1)} )
$$

where the equivalence uses the fact that $u \in \partial g(z)$ if and only if $z \in \partial g^{*}(u)$.

We now left-multiply by $-B$ and add $c$ to both sides,

$$
-Bz^{(t+1)} + c \;\in\; -B \partial g^{*}( -B^{T} y^{(t+1)} ) + c \;=\; \Theta(y^{(t+1)})
$$

Now we multiply both sides by $\rho$ and add $y^{(t+1)}$,

$$
y^{(t+1)} + \rho ( -Bz^{(t+1)} + c ) \;\in\; (I + \rho \Theta)(y^{(t+1)})
$$

Using Step C once more, the left side equals $y^{(t)} + \rho A x^{(t+1)}$. We can invert $(I + \rho \Theta)$ to obtain

$$
y^{(t+1)} = (I + \rho \Theta)^{-1} ( y^{(t)} + \rho A x^{(t+1)} )
$$

Now, let's apply the same subgradient optimality to Step A of AMA. Since Step A minimizes the non-augmented Lagrangian $L_{0}( x, z^{(t)}, y^{(t)} )$ over $x$,

$$
0 \in \partial f(x^{(t+1)}) + A^{T} y^{(t)}
\quad \Longleftrightarrow \quad
x^{(t+1)} = \nabla f^{*}( -A^{T} y^{(t)} )
$$

Using the assumption that $f$ is strongly convex, $f^{*}$ is differentiable, so the right side is single-valued. Multiplying by $\rho A$ gives $\rho A x^{(t+1)} = \rho A \nabla f^{*}(-A^{T} y^{(t)}) = -\rho \Psi(y^{(t)})$.

Substituting in equation for the $y$-update above,

$$
y^{(t+1)} = (I + \rho \Theta)^{-1} ( y^{(t)} - \rho \Psi(y^{(t)}) )
= (I + \rho \Theta)^{-1} (I - \rho \Psi)(y^{(t)})
$$
Notice that this is exactly the same thing we concluded in the reduction from ProxGrad to FOBOS. Thus, we have shown that both AMA and ProxGrad are the same algorithm for the ADMM objective.
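The equivalence can be checked numerically on a small instance (all concrete choices are illustrative, not from the original): run AMA on the primal with $f(x) = \frac{1}{2}\|x - a\|_2^2$, $g(z) = \lambda \|z\|_1$, $A = B = I$, and run Proximal Gradient Descent on the dual, where the smooth part is $f^{*}(-y) = \frac{1}{2}\|y\|_2^2 - \langle a, y \rangle$ and the non-smooth part $g^{*}(-y) + \langle y, c \rangle$ is an $\ell_\infty$-ball indicator plus a linear term, whose prox is a shifted clip. The dual iterates coincide exactly, step for step.

```python
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def ama_step(y, a, c, lam, rho):
    """One AMA iteration on: min 0.5||x-a||^2 + lam||z||_1  s.t.  x + z = c."""
    x = a - y                                       # Step A: argmin_x of L_0
    z = soft_threshold(c - x - y / rho, lam / rho)  # Step B: argmin_z of L_rho
    return y + rho * (x + z - c)                    # Step C: dual update

def proxgrad_step(y, a, c, lam, rho):
    """One Proximal Gradient step on the dual of the same problem.
    The smooth part f*(-y) = 0.5||y||^2 - <a, y> has gradient y - a; the
    non-smooth part is the indicator of ||y||_inf <= lam plus <y, c>,
    whose prox is a shifted projection onto the box [-lam, lam]^n."""
    v = y - rho * (y - a)                   # forward (gradient) step
    return np.clip(v - rho * c, -lam, lam)  # backward (prox) step

a, c, lam, rho = np.array([1.0, -2.0]), np.array([0.5, 0.5]), 0.3, 0.5
y_ama = y_pg = np.zeros(2)
for _ in range(20):
    y_ama = ama_step(y_ama, a, c, lam, rho)
    y_pg = proxgrad_step(y_pg, a, c, lam, rho)
    assert np.allclose(y_ama, y_pg)  # identical dual iterates at every step
print(y_ama, y_pg)
```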

# References

**Proximal Gradient Descent and ADMM** I was first made aware of the
relationship between AMA and ADMM in Chi's article on
convex clustering via ADMM and AMA. The relationship between Proximal Gradient
Descent and FOBOS is taken from Berkeley's EE227a slides, and
the relationship between FOBOS and AMA from Goldstein et
al's work on Accelerated ADMM and AMA.
