Convex clustering is the reformulation of k-means clustering as a convex problem. While the two problems are not equivalent, the former can be seen as a relaxation of the latter that allows us to easily find globally optimal solutions (as opposed to only locally optimal ones).

Suppose we have a set of points $x_1, \ldots, x_n \in \mathbb{R}^d$ that we would like to group into clusters.

In this post, I'll talk about an algorithm to do just that.

# K-Means

The original objective for k-means clustering is as follows. Suppose we want
to find $k$ cluster centers $\mu_1, \ldots, \mu_k$ and assign each point to its
nearest center, so as to minimize,

$$
\min_{\mu_1, \ldots, \mu_k} \sum_{i=1}^{n} \min_{c \in \{1, \ldots, k\}} \| x_i - \mu_c \|_2^2
$$

In 2009, Aloise et al. proved that solving this problem is
NP-hard, meaning that short of enumerating every possible partition, we cannot
say whether or not we've found an optimal solution.

# Convex Clustering

Convex clustering sidesteps this complexity result by proposing a new
problem that we *can* solve quickly. The optimal solution for this new problem
need not coincide with that of k-means, but can be seen as the solution to
a convex relaxation of the original problem.

The idea of convex clustering is that each point $x_i$ is given its own cluster center $u_i$, and centers belonging to nearby points are pulled together by a fused penalty,

$$
\min_{u} \; \frac{1}{2} \sum_{i=1}^{n} \| x_i - u_i \|_2^2
+ \gamma \sum_{l} w_l \| u_{l_1} - u_{l_2} \|_p
$$

where $l = (l_1, l_2)$ ranges over pairs of points with weight $w_l > 0$.

Notice that the distance penalty $\| u_{l_1} - u_{l_2} \|_p$ is not squared: like the lasso, this encourages centers to coincide exactly, and as $\gamma$ grows, more and more centers merge until only a single cluster remains.
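To make the objective concrete, here is a minimal sketch (in NumPy, for the $p = 2$ case) of evaluating it; the function name and the dict-of-weights representation are my own choices, not a standard API:

```python
import numpy as np

def convex_clustering_objective(X, U, weights, gamma):
    """(1/2) sum_i ||x_i - u_i||^2 + gamma * sum_l w_l ||u_{l1} - u_{l2}||_2."""
    fit = 0.5 * np.sum((X - U) ** 2)
    penalty = sum(w * np.linalg.norm(U[i] - U[j])
                  for (i, j), w in weights.items())
    return fit + gamma * penalty

# Toy example: two points, with each center placed on its own point,
# so only the fused penalty contributes.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
U = X.copy()
print(convex_clustering_objective(X, U, {(0, 1): 1.0}, gamma=0.5))  # 0.5
```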

# Algorithms for Convex Clustering

As the convex clustering formulation is a convex problem, we automatically
get a variety of black-box algorithms capable of solving it. Unfortunately, the
number of variables in the problem is rather large -- if we have $n$ points in
$d$ dimensions, the primal problem has $nd$ variables, and the penalty can
involve up to $n(n-1)/2$ pairs.

Hocking et al. and Chi et al. were the first to design
algorithms specifically for convex clustering. The former designed one
algorithm for each of the $\ell_1$, $\ell_2$, and $\ell_{\infty}$ norms, while
the latter developed methods based on ADMM and AMA that apply to any norm.

Here, I'll describe another method for solving the convex clustering problem
based on coordinate ascent. The idea is to take the original formulation,
substitute a new primal variable $v_l = u_{l_1} - u_{l_2}$ for each pair of
points, and then perform coordinate ascent on the dual variables of the
resulting constrained problem.

# Problem Reformulation

To describe the dual problem being maximized, we first need to modify the
primal problem. First, let $l = (l_1, l_2)$ index the pairs of points with
$w_l > 0$, and introduce a variable $v_l = u_{l_1} - u_{l_2}$ for each such
pair,

$$
\min_{u, v} \; \frac{1}{2} \sum_{i=1}^{n} \| x_i - u_i \|_2^2
+ \gamma \sum_{l} w_l \| v_l \|_p
\quad \text{s.t.} \quad v_l = u_{l_1} - u_{l_2}
$$

Chi et al. show on page 6 that the dual of this problem is then,

$$
\max_{\lambda} \; - \frac{1}{2} \sum_{i=1}^{n} \| \Delta_i \|_2^2
- \sum_{l} \lambda_l^T (x_{l_1} - x_{l_2})
\quad \text{s.t.} \quad \| \lambda_l \|_{p^*} \le \gamma w_l
\tag{1}
$$

In this notation, $\lambda_l$ is the dual variable for the constraint $v_l = u_{l_1} - u_{l_2}$, $\Delta_i = \sum_{l : l_1 = i} \lambda_l - \sum_{l : l_2 = i} \lambda_l$, and $p^*$ is the dual norm of $p$ (for example, $p^* = 2$ when $p = 2$).

# Coordinate Ascent

Now let's optimize the dual problem (1) one block $\lambda_l$ at a time, holding all other dual variables fixed. Fix a pair $l = (i, j)$.

We can pull out the terms involving $\lambda_l$: only $\Delta_i$ and $\Delta_j$ depend on it, so the relevant part of the objective is,

$$
- \frac{1}{2} \| \Delta_i \|_2^2 - \frac{1}{2} \| \Delta_j \|_2^2
- \lambda_l^T (x_i - x_j)
$$

Let's define $\Delta_i^{-l} = \Delta_i - \lambda_l$ and $\Delta_j^{-l} = \Delta_j + \lambda_l$, the contributions to $\Delta_i$ and $\Delta_j$ from all dual variables except $\lambda_l$.

We can now factor the objective into a quadratic in $\lambda_l$,

$$
- \left\| \lambda_l + \frac{1}{2} \left( \Delta_i^{-l} - \Delta_j^{-l} + x_i - x_j \right) \right\|_2^2 + \text{const}
$$

so the unconstrained maximizer is $\lambda_l = -\frac{1}{2} ( \Delta_i^{-l} - \Delta_j^{-l} + x_i - x_j )$.

Subject to the constraint $\| \lambda_l \|_{p^*} \le \gamma w_l$, this problem is simply a Euclidean projection of the unconstrained maximizer onto the ball $\{ \lambda : \| \lambda \|_{p^*} \le \gamma w_l \}$.
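For $p = 2$, the dual norm is also the $\ell_2$ norm, so the projection has a particularly simple closed form: rescale the point onto the boundary if it lies outside the ball. A small sketch (the function name is mine):

```python
import numpy as np

def project_l2_ball(v, radius):
    """Euclidean projection of v onto {lam : ||lam||_2 <= radius}."""
    norm = np.linalg.norm(v)
    if norm <= radius:
        return v                  # already feasible
    return (radius / norm) * v    # rescale onto the boundary

print(project_l2_ball(np.array([3.0, 4.0]), 1.0))  # [0.6 0.8]
```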

**Input:** Initial dual variables $\lambda_l^{(0)}$, weights $w_l$, and regularization parameter $\gamma$

- Initialize $\Delta_i^{(0)} = \sum_{l: l_1 = i} \lambda_l^{(0)} - \sum_{l: l_2 = i} \lambda_l^{(0)}$ for each point $i$
- For each iteration $m = 0, 1, 2, \ldots$ until convergence
    - Let $\Delta^{(m+1)} = \Delta^{(m)}$
    - For each pair of points $l = (i, j)$ with $w_l > 0$
        - Let $\Delta_i^{(m+1)} \leftarrow \Delta_i^{(m+1)} - \lambda_l^{(m)}$ and $\Delta_j^{(m+1)} \leftarrow \Delta_j^{(m+1)} + \lambda_l^{(m)}$
        - Let $\lambda_l^{(m+1)} = \text{project}\left( -\frac{1}{2} \left( \Delta_i^{(m+1)} - \Delta_j^{(m+1)} + x_i - x_j \right), \; \{ \lambda : \| \lambda \|_{p^*} \le \gamma w_l \} \right)$
        - Let $\Delta_i^{(m+1)} \leftarrow \Delta_i^{(m+1)} + \lambda_l^{(m+1)}$ and $\Delta_j^{(m+1)} \leftarrow \Delta_j^{(m+1)} - \lambda_l^{(m+1)}$
- Return $u_i = \Delta_i + x_i$ for all $i$
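Putting the pieces together, here is a sketch of the full coordinate ascent loop in NumPy, assuming $p = 2$ (so that projection onto the dual-norm ball is a simple rescaling) and a fixed iteration count rather than a duality-gap stopping rule; all names here are my own:

```python
import numpy as np

def convex_cluster(X, weights, gamma, n_iters=100):
    """Coordinate ascent on the convex clustering dual (p = 2 case).

    X: (n, d) data; weights: dict {(i, j): w_l} with i < j; gamma: penalty.
    Returns the cluster centers u_i = Delta_i + x_i.
    """
    n, d = X.shape
    lam = {l: np.zeros(d) for l in weights}   # dual variables lambda_l
    delta = np.zeros((n, d))                  # Delta_i, running sums of lambdas
    for _ in range(n_iters):
        for (i, j), w in weights.items():
            # Remove lambda_l's contribution to Delta_i and Delta_j.
            delta[i] -= lam[(i, j)]
            delta[j] += lam[(i, j)]
            # Unconstrained maximizer of the quadratic in lambda_l ...
            v = -0.5 * (delta[i] - delta[j] + X[i] - X[j])
            # ... projected onto the l2 ball of radius gamma * w_l.
            radius = gamma * w
            norm = np.linalg.norm(v)
            if norm > radius:
                v = (radius / norm) * v
            lam[(i, j)] = v
            # Restore lambda_l's contribution.
            delta[i] += v
            delta[j] -= v
    return delta + X   # u_i = Delta_i + x_i

# Toy example: two points merge for sufficiently large gamma.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
U = convex_cluster(X, {(0, 1): 1.0}, gamma=1.0)
print(np.allclose(U[0], U[1]))  # True
```

In the two-point case, the centers coincide exactly when the unconstrained maximizer $-\frac{1}{2}(x_i - x_j)$ is already feasible, i.e. when $\frac{1}{2} \| x_i - x_j \|_2 \le \gamma w_l$, which the toy example above satisfies.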

Since we can easily construct the primal variables from the dual variables and can evaluate the primal and dual objectives in closed form, we can use the duality gap to determine when we have converged.
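As a sketch of that stopping criterion (again for $p = 2$, with names of my own choosing): recover $u = \Delta + x$ from the current dual variables, evaluate both objectives, and stop once their difference falls below a tolerance.

```python
import numpy as np

def duality_gap(X, lam, weights, gamma):
    """Primal objective at u = Delta + x minus the dual objective at lam."""
    delta = np.zeros_like(X)
    for (i, j), l in lam.items():
        delta[i] += l
        delta[j] -= l
    U = delta + X
    primal = 0.5 * np.sum((X - U) ** 2) + gamma * sum(
        w * np.linalg.norm(U[i] - U[j]) for (i, j), w in weights.items())
    dual = -0.5 * np.sum(delta ** 2) - sum(
        l @ (X[i] - X[j]) for (i, j), l in lam.items())
    return primal - dual

# At an optimal dual solution the gap is zero; e.g. for two points,
# lambda = -(x_0 - x_1)/2 is optimal whenever it is feasible.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
gap = duality_gap(X, {(0, 1): np.array([0.5, 0.0])}, {(0, 1): 1.0}, gamma=1.0)
print(abs(gap) < 1e-12)  # True
```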

# Conclusion

In this post, I introduced a coordinate ascent algorithm for convex clustering. Empirically, the algorithm is quite quick, but it doesn't share the parallelizability or convergence proofs of its siblings, ADMM and AMA. However, coordinate ascent has an upside: there are no parameters to tune, and every iteration is guaranteed to improve the objective function. Within each iteration, updates are quick asymptotically and empirically.

Unfortunately, like all algorithms based on the dual for this particular problem, the biggest burden is on memory. Whereas the number of primal variables grows linearly with the number of data points, the number of dual variables can grow quadratically. In addition, the primal variables allow centers to be merged, allowing for potential space savings as the algorithm runs. The dual seems to lack this property, requiring all dual variables to be fully instantiated.

# References

The original formulation for convex clustering was introduced by Lindsten et al. and Hocking et al. Chi et al. introduced ADMM- and AMA-based algorithms specifically designed for convex clustering.
