  1. Frank-Wolfe Algorithm

    Sat 04 May 2013

    In this post, we'll take a look at the Frank-Wolfe Algorithm, also known as the Conditional Gradient Method, an algorithm particularly suited for solving problems with compact convex domains. Like the Proximal Gradient and Accelerated Proximal Gradient algorithms, Frank-Wolfe requires that we exploit problem structure to quickly solve a mini-optimization problem. Our …
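
    The mini-optimization problem Frank-Wolfe solves each iteration is linear, which is what makes the method cheap on structured domains. As a rough illustration (not code from the post; the names `frank_wolfe` and `lmo`, and the least-squares-over-the-$L_1$-ball setup, are assumptions of mine), here is a minimal numpy sketch:

    ```python
    import numpy as np

    def frank_wolfe(grad, lmo, x0, iters=100):
        """Minimize a smooth convex function over a compact convex set D."""
        x = x0
        for t in range(iters):
            g = grad(x)
            s = lmo(g)                        # mini-problem: argmin_{s in D} <g, s>
            gamma = 2.0 / (t + 2.0)           # standard diminishing step size
            x = (1 - gamma) * x + gamma * s   # convex combination stays in D
        return x

    # Illustrative setup: least squares over the L1 ball of radius tau.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 50))
    b = rng.normal(size=20)
    tau = 1.0

    def grad(x):
        return A.T @ (A @ x - b)

    def lmo(g):
        # over the L1 ball, the minimizer is a signed, scaled basis vector
        i = np.argmax(np.abs(g))
        s = np.zeros_like(g)
        s[i] = -tau * np.sign(g[i])
        return s

    x_star = frank_wolfe(grad, lmo, np.zeros(50))
    ```

    Over the $L_1$ ball the linear subproblem is solved exactly by a single signed, scaled coordinate vector, so after $t$ iterations the iterate is a convex combination of at most $t+1$ such vectors.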

  2. Why does L1 produce sparse solutions?

    Mon 22 April 2013

    Supervised machine learning problems are typically of the form "minimize your error while regularizing your parameters." The idea is that while many choices of parameters may make your training error low, the goal isn't low training error -- it's low test-time error. Thus, parameters should minimize training error while remaining …
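
    As a quick empirical check of the titular effect (a sketch of my own, not from the post; it assumes scikit-learn's `Lasso` and `Ridge` estimators on synthetic data), an L1 penalty zeros out most coefficients while an L2 penalty merely shrinks them:

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    w_true = np.zeros(20)
    w_true[:3] = [2.0, -1.5, 1.0]           # only 3 informative features
    y = X @ w_true + 0.1 * rng.normal(size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)      # L1 penalty
    ridge = Ridge(alpha=0.1).fit(X, y)      # L2 penalty

    print(np.sum(lasso.coef_ != 0))         # a handful of nonzero coefficients
    print(np.sum(ridge.coef_ != 0))         # typically all 20 nonzero
    ```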

  3. Stochastic Gradient Descent and Sparse $L_2$ regularization

    Thu 10 May 2012

    Suppose you’re doing some typical supervised learning on a gigantic dataset where the total loss over all samples for parameter \(w\) is simply the sum of the losses of each sample \(i\), i.e.,

    $$ L(w) = \sum_{i} l(x_i, y_i, w) $$

    Basically any loss function you can think …
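
    The title hints at the trick: when each sample's gradient touches only a few coordinates, applying the L2 decay to every coordinate on every step would destroy the sparsity of the update, so the decay can instead be applied lazily, only when a coordinate is next touched. A minimal sketch under that assumption (the function `sgd_lazy_l2`, the dict-of-features sample format, and the squared loss are all illustrative choices of mine, not necessarily the post's):

    ```python
    import numpy as np

    def sgd_lazy_l2(samples, grad_i, dim, lr=0.1, lam=0.01, epochs=1):
        """SGD where the L2 decay factor (1 - lr*lam) is applied lazily,
        so each update only touches coordinates with nonzero gradient."""
        w = np.zeros(dim)
        last = np.zeros(dim, dtype=int)   # step at which each coord last decayed
        t = 0
        for _ in range(epochs):
            for x, y in samples:
                t += 1
                for j in x:               # coordinates this sample touches
                    # catch up on the decay steps skipped since last touch
                    w[j] *= (1.0 - lr * lam) ** (t - last[j])
                    last[j] = t
                for j, g in grad_i(x, y, w).items():
                    w[j] -= lr * g
        # final catch-up so untouched coordinates reflect all decay steps
        w *= (1.0 - lr * lam) ** (t - last)
        return w

    # toy usage: squared loss on sparse feature dicts {index: value}
    samples = [({0: 1.0}, 1.0), ({3: 1.0}, -1.0)]

    def grad_i(x, y, w):
        pred = sum(w[k] * v for k, v in x.items())
        return {j: 2.0 * (pred - y) * v for j, v in x.items()}

    w = sgd_lazy_l2(samples, grad_i, dim=5, epochs=10)
    ```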