1. # Frank-Wolfe Algorithm

Sat 04 May 2013

In this post, we'll take a look at the Frank-Wolfe Algorithm also known as the Conditional Gradient Method, an algorithm particularly suited for solving problems with compact domains. Like the Proximal Gradient and Accelerated Proximal Gradient algorithms, Frank-Wolfe requires we exploit problem structure to quickly solve a mini-optimization problem. Our …

2. # Accelerated Proximal Gradient Descent

Thu 25 April 2013

$$\def\prox{\text{prox}}$$ In a previous post, I presented Proximal Gradient, a method for bypassing the $$O(1 / \epsilon^2)$$ convergence rate of Subgradient Descent. This method relied on assuming that the objective function could be expressed as the sum of 2 functions, $$g(x)$$ and $$h(x)$$, with …

Fri 19 April 2013

$$\def\prox{\text{prox}}  In a [previous post][subgradient_descent_usage], I mentioned that one cannot hope to asymptotically outperform the O(\frac{1}{\epsilon^2})$$ convergence rate of Subgradient Descent when dealing with a non-differentiable objective function. This is in fact only half-true; Subgradient Descent cannot be beat using only first-order …

In the mid-1980s, Yurii Nesterov hit the equivalent of an academic home run. At the same time, he established the Accelerated Gradient Method, proved that its convergence rate superior to Gradient Descent ($$O(1/\sqrt{\epsilon})$$ iterations instead of $$O(1/\epsilon)$$), and then proved that no other first-order (that …
Not far from Gradient Descent is another first-order descent algorithm (that is, an algorithm that only relies on the first derivative) is Subgradient Descent. In implementation, they are in fact identical. The only difference is on the assumptions placed on the objective function we wish to minimize, $$f(x)$$. If …