1. # Subgradient Descent

Thu 11 April 2013

Not far from Gradient Descent is another first-order descent algorithm (that is, an algorithm that relies only on the first derivative): Subgradient Descent. In implementation, the two are in fact identical. The only difference lies in the assumptions placed on the objective function we wish to minimize, $$f(x)$$. If …
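To see just how identical the implementations are, here is a minimal sketch of the subgradient method. The objective, subgradient oracle, and diminishing step-size schedule below are illustrative choices, not from the post:

```python
import math

def subgradient_descent(f, subgrad, x0, iters=1000):
    """Same loop as gradient descent, except subgrad(x) may return any
    subgradient of a (possibly non-differentiable) convex f at x."""
    x, best = x0, x0
    for k in range(iters):
        # diminishing step sizes (e.g. 1/sqrt(k)) are standard for convergence
        x = x - (1.0 / math.sqrt(k + 1)) * subgrad(x)
        # f(x) need not decrease at every step, so keep the best iterate seen
        if f(x) < f(best):
            best = x
    return best

# minimize f(x) = |x - 2|; sign(x - 2) is a valid subgradient everywhere
x_min = subgradient_descent(lambda x: abs(x - 2),
                            lambda x: (x > 2) - (x < 2),
                            x0=0.0)
```

Note the one real difference from gradient descent: because the objective can fail to decrease on any given iteration, we return the best iterate rather than the last one.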

2. # Gradient Descent

Wed 10 April 2013

Gradient Descent is perhaps the most intuitive of all optimization algorithms. Imagine you're standing on the side of a mountain and want to reach the bottom. You'd probably do something like this,

1. Look around you and see which way points the most downwards
2. Take a step in that direction, then …
3. # Topic Models aren't hard

Mon 21 January 2013

In 2002, Latent Dirichlet Allocation (LDA) was published at NIPS, one of the most highly regarded conferences for research loosely labeled as "Artificial Intelligence". The next 5 or so years led to a flurry of incremental model extensions and alternative inference methods, though none have achieved the popularity of their …

4. # ADMM: parallelizing convex optimization

Sun 24 June 2012

In the previous post, we considered Stochastic Gradient Descent, a popular method for optimizing "separable" functions (that is, functions that are purely sums of other functions) in a large, distributed environment. However, Stochastic Gradient Descent is not the only algorithm out there.

So why consider anything else? First of all …

5. # Stochastic Gradient Descent and Sparse $L_2$ regularization

Thu 10 May 2012

Suppose you’re doing some typical supervised learning on a gigantic dataset where the total loss over all samples for parameter $$w$$ is simply the sum of the losses of each sample $$i$$, i.e.,

$$L(w) = \sum_{i} l(x_i, y_i, w)$$

Basically any loss function you can think …
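Because $$L(w)$$ is a plain sum over samples, we can update $$w$$ using one sample's gradient at a time rather than the full sum. A minimal sketch of that idea; the toy least-squares loss and hyperparameters below are illustrative, not from the post:

```python
import random

def sgd(grad_i, w0, data, epochs=50, step=0.01):
    """Stochastic gradient descent: each update uses the gradient of a
    single sample's loss l(x_i, y_i, w), not the gradient of the full sum."""
    w = w0
    for _ in range(epochs):
        random.shuffle(data)  # visit samples in a fresh random order each pass
        for x, y in data:
            w = w - step * grad_i(x, y, w)
    return w

# toy example: fit y = 2x with l(x, y, w) = (w*x - y)^2,
# whose per-sample gradient in w is 2*(w*x - y)*x
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0]]
w = sgd(lambda x, y, w: 2 * (w * x - y) * x, w0=0.0, data=data)
```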