1. Frank-Wolfe Algorithm

    Sat 04 May 2013

    In this post, we'll take a look at the Frank-Wolfe Algorithm, also known as the Conditional Gradient Method, an algorithm particularly suited for solving problems with compact domains. Like the Proximal Gradient and Accelerated Proximal Gradient algorithms, Frank-Wolfe requires that we exploit problem structure to quickly solve a mini-optimization problem. Our …
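
    A minimal Frank-Wolfe sketch (not the post's code): the "mini-optimization problem" is the linear minimization over the compact domain, shown here for the L1 ball, where it has a closed-form solution.

```python
# Minimal Frank-Wolfe sketch: minimize a smooth f(x) over the L1 ball
# {x : ||x||_1 <= 1}. The linear subproblem over this domain is solved in
# closed form by a signed coordinate vector (a vertex of the ball).
import numpy as np

def frank_wolfe(grad, x0, n_iters=100):
    x = x0.astype(float)
    for t in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: pick the vertex -sign(g_i) * e_i
        # for the coordinate with the largest |g_i|.
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -np.sign(g[i])
        # Standard step size 2 / (t + 2); no line search.
        gamma = 2.0 / (t + 2.0)
        x = x + gamma * (s - x)
    return x

# Example: f(x) = 0.5 * ||A x - b||^2, so grad f(x) = A^T (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_hat = frank_wolfe(lambda x: A.T @ (A @ x - b), x0=np.zeros(2))
```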

  2. Accelerated Proximal Gradient Descent

    Thu 25 April 2013

    \(\def\prox{\text{prox}}\) In a previous post, I presented Proximal Gradient, a method for bypassing the \(O(1 / \epsilon^2)\) convergence rate of Subgradient Descent. This method relied on assuming that the objective function could be expressed as the sum of 2 functions, \(g(x)\) and \(h(x)\), with …
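
    A minimal sketch of one accelerated proximal gradient variant (FISTA-style momentum), not necessarily the exact update used in the post; it assumes `grad_g` is the gradient of the smooth part \(g\) and `prox_h(v, step)` evaluates \(\prox_{\text{step} \cdot h}(v)\) in closed form.

```python
# Accelerated proximal gradient sketch (FISTA-style momentum).
import numpy as np

def accelerated_proximal_gradient(grad_g, prox_h, x0, step, n_iters=100):
    x_prev = x0.astype(float)
    y = x0.astype(float)
    t = 1.0
    for _ in range(n_iters):
        # Proximal gradient step taken at the extrapolated point y.
        x = prox_h(y - step * grad_g(y), step)
        # Momentum update on the extrapolation sequence.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

# Example use for the lasso, g(x) = 0.5 * ||A x - b||^2, h(x) = lam * ||x||_1:
#   grad_g = lambda x: A.T @ (A @ x - b)
#   prox_h = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s * lam, 0.0)
```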

  3. Proximal Gradient Descent

    Fri 19 April 2013

    \(\def\prox{\text{prox}}\) In a previous post, I mentioned that one cannot hope to asymptotically outperform the \(O(\frac{1}{\epsilon^2})\) convergence rate of Subgradient Descent when dealing with a non-differentiable objective function. This is in fact only half-true; Subgradient Descent cannot be beat using only first-order …
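
    A minimal proximal gradient sketch, assuming the objective splits as \(g(x) + h(x)\) with \(g\) smooth and \(h(x) = \lambda \|x\|_1\), whose prox is the closed-form soft-thresholding operator; this is an illustration, not the post's code.

```python
# Proximal gradient sketch for g(x) + lam * ||x||_1.
import numpy as np

def soft_threshold(v, threshold):
    # prox of threshold * ||.||_1: shrink each coordinate toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)

def proximal_gradient(grad_g, x0, step, lam, n_iters=100):
    x = x0.astype(float)
    for _ in range(n_iters):
        # Gradient step on the smooth part, then prox step on the
        # non-smooth part.
        x = soft_threshold(x - step * grad_g(x), step * lam)
    return x
```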

  4. Accelerated Gradient Descent

    Fri 12 April 2013

    In the mid-1980s, Yurii Nesterov hit the equivalent of an academic home run. At the same time, he introduced the Accelerated Gradient Method, proved that its convergence rate is superior to Gradient Descent's (\(O(1/\sqrt{\epsilon})\) iterations instead of \(O(1/\epsilon)\)), and then proved that no other first-order (that …
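
    A minimal sketch of Nesterov-style acceleration for a smooth objective, using the common \((k-1)/(k+2)\) momentum coefficient; the post's exact update may differ.

```python
# Accelerated gradient descent sketch for a smooth f with gradient grad_f.
import numpy as np

def accelerated_gradient_descent(grad_f, x0, step, n_iters=100):
    x_prev = x0.astype(float)
    y = x0.astype(float)
    for k in range(1, n_iters + 1):
        # Plain gradient step taken from the extrapolated point y.
        x = y - step * grad_f(y)
        # Extrapolate using the previous iterate (momentum).
        y = x + ((k - 1.0) / (k + 2.0)) * (x - x_prev)
        x_prev = x
    return x_prev
```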

  5. Subgradient Descent

    Thu 11 April 2013

    Not far from Gradient Descent is another first-order descent algorithm (that is, an algorithm that only relies on the first derivative): Subgradient Descent. In implementation, they are in fact identical. The only difference is in the assumptions placed on the objective function we wish to minimize, \(f(x)\). If …
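
    A minimal sketch of that identical loop: the only change from gradient descent is that `subgrad_f` may return any valid subgradient where \(f\) is not differentiable, shown here for \(f(x) = \|x\|_1\). The diminishing \(1/\sqrt{t+1}\) step size is one common choice; the post may use a different schedule.

```python
# Subgradient descent sketch: same loop as gradient descent, but with a
# subgradient in place of the gradient.
import numpy as np

def subgradient_descent(subgrad_f, x0, n_iters=100):
    x = x0.astype(float)
    for t in range(n_iters):
        x = x - (1.0 / np.sqrt(t + 1.0)) * subgrad_f(x)
    return x

# Example: f(x) = ||x||_1, for which sign(x) is a valid subgradient
# (taking 0 at coordinates where x_i = 0).
x_hat = subgradient_descent(np.sign, x0=np.array([2.0, -3.0]))
```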
