In the past 50+ years of convex optimization research, a great many
algorithms have been developed, each with slight nuances to their assumptions,
implementations, and guarantees. In this article, I'll give a shorthand
comparison of these methods in terms of the number of iterations required
to reach a desired accuracy.
Below, methods are grouped according to what "order" of information they require about the objective function. In general, the more information you have, the faster you can converge; but beware, you will also need more memory and computation. Zeroth and first order methods are typically appropriate for large-scale problems, whereas second order methods are limited to small-to-medium-scale problems that require a high degree of precision.
At the bottom, you will find algorithms aimed specifically at minimizing supervised learning objectives, as well as meta-algorithms useful for distributing computation across multiple nodes.
Unless otherwise stated, all objectives are assumed to be Lipschitz continuous (though not necessarily differentiable) and the domain convex. The variable being optimized is $x$.
Zeroth Order Methods
Zeroth order methods are characterized by not requiring any gradients or subgradients for their objective functions. In exchange, however, it is assumed that the objective is "simple" in the sense that a subset of variables (a "block") can be minimized exactly while holding all other variables fixed.
| Algorithm | Problem Formulation | Convex | Strongly Convex | Per-Iteration Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Randomized Block Coordinate Descent | ... | ... | ... | ... | Applicable when ... |
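To make the exact block-minimization idea concrete, here is a minimal sketch (the coupled quadratic objective, constants, and iteration count below are my own illustrative choices, not from the table). Each coordinate plays the role of a "block" and can be minimized in closed form while the other is held fixed:

```python
import random

def f(x, y):
    # A simple smooth convex objective with coupled variables.
    return (x - 1.0) ** 2 + (y - 2.0) ** 2 + 0.5 * x * y

def randomized_block_coordinate_descent(iters=200, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(iters):
        if rng.random() < 0.5:
            # Minimize over x exactly, holding y fixed:
            # d/dx [(x - 1)^2 + 0.5*x*y] = 2(x - 1) + 0.5*y = 0
            x = 1.0 - 0.25 * y
        else:
            # Minimize over y exactly, holding x fixed.
            y = 2.0 - 0.25 * x
    return x, y
```

The exact per-block minimization is what distinguishes this from gradient-based coordinate updates: no step size is needed, only the ability to solve each block subproblem in closed form.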
First Order Methods
First order methods typically require access to an objective function's
gradient or subgradient. The algorithms typically take the form

$$x^{(t+1)} = x^{(t)} + \alpha^{(t)} d^{(t)}$$

where $d^{(t)}$ is a direction computed from gradient or subgradient information and $\alpha^{(t)}$ is a step size.
| Algorithm | Problem Formulation | Convex | Strongly Convex | Per-Iteration Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Subgradient Descent | ... | ... | ... | ... | Cannot be improved upon without further assumptions. |
| Mirror Descent | ... | ... | ... | ... | Different parameterizations result in gradient descent and exponentiated gradient descent. |
| Dual Averaging | ... | ... | ... | ... | Cannot be improved upon without further assumptions. |
| Gradient Descent | ... | ... | ... | ... | Applicable when ... |
| Accelerated Gradient Descent | ... | ... | ... | ... | Applicable when ... |
| Proximal Gradient Descent | ... | ... | ... | ... | Applicable when ... |
| Proximal Accelerated Gradient Descent | ... | ... | ... | ... | Applicable when ... |
| Frank-Wolfe Algorithm / Conditional Gradient Algorithm | ... | ... | ... | ... | Applicable when ... |
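For concreteness, here is a minimal gradient descent sketch. The quadratic objective, the step size $1/L$, and the iteration count are illustrative choices of mine, not prescriptions from the table:

```python
def gradient_descent(grad, x0, step, iters):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# f(x) = x_0^2 + 10 * x_1^2 is L-smooth with L = 20; the classical
# step size 1/L guarantees convergence to the minimizer (the origin).
grad_f = lambda x: [2.0 * x[0], 20.0 * x[1]]
x_min = gradient_descent(grad_f, [5.0, 5.0], step=1.0 / 20.0, iters=500)
```

The same loop implements subgradient descent if `grad` returns any subgradient, though the step size schedule and convergence rate then differ.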
Second Order Methods
Second order methods either use or approximate the Hessian ($\nabla^2 f(x)$) of the objective function.
| Algorithm | Problem Formulation | Convex | Strongly Convex | Per-Iteration Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Newton's Method | ... | ... | ... | ... | Only applicable when ... |
| Conjugate Gradient Descent | ... | ... | ... | ... | Converges in exactly ... |
| L-BFGS | ... | Between ... | ... | ... | Applicable when ... |
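A one-dimensional Newton's method sketch shows the basic use of second order information (the objective $f(x) = e^x - 2x$, with minimizer $x = \ln 2$, is my own illustrative choice):

```python
import math

def newton(grad, hess, x0, iters=10):
    # Newton's method: x <- x - f'(x) / f''(x).
    # Converges quadratically near the minimizer of a smooth,
    # strictly convex objective.
    x = x0
    for _ in range(iters):
        x = x - grad(x) / hess(x)
    return x

# f(x) = exp(x) - 2x is strictly convex and minimized at x = ln 2.
x_star = newton(lambda x: math.exp(x) - 2.0,
                lambda x: math.exp(x),
                x0=0.0)
```

Note the contrast with the first order loop above: the step is scaled by the inverse Hessian rather than a hand-chosen step size, which is what buys the fast local convergence at the cost of forming (or approximating) second derivatives.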
Stochastic Methods
The following algorithms are specifically designed for supervised machine learning, where the objective can be decomposed into independent "loss" functions and a regularizer,

$$\min_{x} \; \frac{1}{n} \sum_{i=1}^{n} f_{i}(x) + \lambda \Omega(x)$$
The intuition is that finding the exact optimum of this problem is unnecessary, as the real goal is to minimize the "risk" (read: error) with respect to the true distribution from which the samples are drawn. Thus, the following algorithms' convergence rates hold in expectation (as opposed to the algorithms above, whose rates upper bound the true rate of convergence).
| Algorithm | Problem Formulation | Convex | Strongly Convex | Per-Iteration Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Stochastic Gradient Descent (SGD) | ... | ... | ... | ... | Assumes objective is differentiable. |
| Stochastic Dual Coordinate Ascent (SDCA) | ... | ... | ... | ... | ... |
| Accelerated Proximal Stochastic Dual Coordinate Ascent (APSDCA) | ... | ... | ... | ... | ... |
| Stochastic Average Gradient (SAG) | ... | ... | ... | ... | Applicable when ... |
| Stochastic Variance Reduced Gradient (SVRG) | ... | ... | ... | ... | Applicable when ... |
| MISO | ... | ... | ... | ... | Applicable when ... |
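A minimal SGD sketch on the decomposable objective above (the squared losses $f_i(x) = (x - a_i)^2$, the sample values, and the $1/t$ step size are my own illustrative choices):

```python
import random

def sgd(samples, iters=20000, seed=0):
    """SGD on f(x) = (1/n) * sum_i (x - a_i)^2 with a ~1/t step size."""
    rng = random.Random(seed)
    x = 0.0
    for t in range(1, iters + 1):
        a = rng.choice(samples)   # sample one loss f_i uniformly at random
        g = 2.0 * (x - a)         # stochastic gradient: grad of f_i alone
        x -= g / (2.0 * t)        # decaying step (suits strong convexity);
                                  # here x reduces to a running sample mean
    return x

# The minimizer of the averaged objective is the mean of the samples.
x_hat = sgd([1.0, 2.0, 3.0, 4.0])
```

Each iteration touches a single loss term, so the per-iteration cost is independent of $n$; the price is that the iterates only approach the minimizer in expectation, which is exactly the trade-off described above.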
Other Methods
The following methods do not fit well into any of the preceding categories.
Included are meta-algorithms like ADMM, which are good for distributing computation across machines, and methods whose per-iteration complexity depends on the iteration count.
| Algorithm | Problem Formulation | Convex | Strongly Convex | Per-Iteration Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Alternating Direction Method of Multipliers (ADMM) | ... | ... | ... | ... | The stated convergence rate for "Strongly Convex" only requires ... |
| Bundle Method | ... | ... | ... | ... | ... |
| Center of Gravity Algorithm | ... | ... | ... | At least ... | Applicable when ... |
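As a toy illustration of ADMM's splitting structure (the one-dimensional lasso-style objective, the penalty `rho = 1.0`, and the iteration count are my own choices), consider minimizing $\tfrac{1}{2}(x - b)^2 + \lambda |z|$ subject to $x = z$:

```python
def soft(v, k):
    # Soft-thresholding: the proximal operator of k * |.|
    return max(v - k, 0.0) - max(-v - k, 0.0)

def admm_lasso_1d(b, lam, rho=1.0, iters=100):
    """ADMM for min_x 0.5*(x - b)^2 + lam*|x|, split as x = z."""
    x = z = u = 0.0  # u is the scaled dual variable
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)  # x-update: quadratic solve
        z = soft(x + u, lam / rho)             # z-update: prox of lam*|.|
        u = u + x - z                          # dual (consensus) update
    return z
```

The appeal for distributed computation is that the x-update and z-update each involve only one piece of the objective; in larger problems those pieces can live on different machines, with only the consensus variables exchanged.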