# Accelerated Proximal Gradient Descent

Thu 25 April 2013

$$\def\prox{\text{prox}}$$ In a previous post, I presented Proximal Gradient, a method for bypassing the $$O(1 / \epsilon^2)$$ convergence rate of Subgradient Descent. This method relied on assuming that the objective function could be expressed as the sum of two functions, $$g(x)$$ and $$h(x)$$, with $$g$$ differentiable and $$h$$ "simple" in the sense that its proximal operator $$\prox_{\alpha h}(x)$$ can be evaluated efficiently.
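
To make the setup concrete, here is a minimal sketch in Python of one such split (my own illustration, not code from the original post), assuming $$g(x) = \tfrac{1}{2}\|Ax - b\|^2$$ and $$h(x) = \lambda \|x\|_1$$ as in the LASSO, whose proximal operator is soft-thresholding; all function names here are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrink each coordinate toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient(grad_g, prox_h, x0, alpha, n_iters=100):
    """Minimize g(x) + h(x): alternate a gradient step on g with a prox step on h."""
    x = x0
    for _ in range(n_iters):
        x = prox_h(x - alpha * grad_g(x), alpha)
    return x

# Example: LASSO, where g(x) = 0.5 * ||A x - b||^2 and h(x) = lam * ||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(20, 5)), rng.normal(size=20), 0.1
grad_g = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, a: soft_threshold(v, a * lam)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L, with L = ||A||_2^2
x_star = proximal_gradient(grad_g, prox_h, np.zeros(5), alpha)
```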

In the mid-1980s, Yurii Nesterov hit the equivalent of an academic home run. At the same time, he established the Accelerated Gradient Method, proved that its convergence rate is superior to that of Gradient Descent ($$O(1/\sqrt{\epsilon})$$ iterations instead of $$O(1/\epsilon)$$), and then proved that no other first-order (that is, gradient-based) method can have an asymptotically faster convergence rate.
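
To show what acceleration adds on top of the proximal gradient loop above, here is a minimal sketch (again my own, not Nesterov's original presentation), assuming the commonly used simplified momentum coefficient $$(k-1)/(k+2)$$ in place of the original $$t_k$$ recursion; both choices achieve the accelerated rate:

```python
import numpy as np

def accelerated_proximal_gradient(grad_g, prox_h, x0, alpha, n_iters=100):
    """Nesterov-style acceleration: extrapolate, then take a proximal gradient step."""
    x_prev, x = x0, x0
    for k in range(1, n_iters + 1):
        # Momentum: look ahead past the current iterate along the last step taken.
        y = x + ((k - 1) / (k + 2)) * (x - x_prev)
        # Ordinary proximal gradient step, taken from the extrapolated point y.
        x_prev, x = x, prox_h(y - alpha * grad_g(y), alpha)
    return x

# Reusing grad_g, prox_h, and alpha from the LASSO example above:
# x_star = accelerated_proximal_gradient(grad_g, prox_h, np.zeros(5), alpha)
```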