Welcome to OPTML++! This is the homepage for the research seminar and reading group on Optimization for Machine Learning (the "++" stands for related areas: statistics, signal processing, computer vision, robotics, information theory, engineering, among others).

OPTML++ covers topics from convex, nonconvex, continuous and combinatorial optimization. This reading group is a mixture of a research seminar, a journal club, and a problem solving group.

This is our second semester. While the aim of the first semester was to help participants gain a substantial foothold in OPTML++ or to enrich their existing expertise, this semester we will focus more on research papers, open questions, and the like, while covering important classical and contemporary material as needed.


We are resuming our weekly meetings beginning March 8th, 2016.

Next Meeting: Apr 13, 2016, 5:15 PM; Location: 32-D707

Administrative matters, announcements, and discussion related to this group take place on this mailing list. Please subscribe.


Date | Topic | Links
08 Mar 2016 | Suvrit: Introduction, Plans, Metric Nearness | Intro; Metric Nearness
13 Apr 2016 | Suvrit: Nonconvex variance reduced gradient methods | 1; 2

Suggested Papers

  1. Rodomanov, Kropotov (ICML 2016). Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums

  2. De Sa, Re, Olukotun (ICML 2016). Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

  3. F. Curtis (ICML 2016). A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization

  4. Reddi et al (Feb 2016). Fast Incremental Method for Nonconvex Optimization

  5. Reddi et al (Feb 2016). Stochastic Variance Reduction for Nonconvex Optimization

  6. Reddi et al (Jun 2015). Asynchronous variance reduced stochastic gradient

  7. Mania et al (2015). Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

  8. De Sa et al (Oct 2015). Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

  9. Wibisono et al (Mar 2016). A Variational Perspective on Accelerated Methods in Optimization

  10. Montanari (Mar 2016). A Grothendieck-type inequality for local maxima

  11. Agarwal, Bullins, Hazan (Mar 2016). Second order stochastic optimization in linear time

  12. Luo, Agarwal, Cesa-Bianchi, Langford (Feb 2016). Efficient Second Order Online Learning via Sketching

  13. H. Zhang, S. Sra (Feb 2016). First-order algorithms for geodesically convex optimization

  14. Y. Zhang, Lee, Jordan (Oct 2015). ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time

  15. Allen-Zhu, Orecchia. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

  16. Su, Boyd, Candes. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

  17. Taylor, Hendrickx, Glineur (2015). Exact Worst-case Performance of First-order Algorithms for Composite Convex Optimization

  18. Kim, Fessler (2014-16). Optimized first-order methods for smooth convex minimization
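
Several of the variance-reduction papers above (e.g. items 4–6) build on the SVRG idea: correct each stochastic gradient with a full gradient recomputed at a periodic snapshot point, so the gradient estimator's variance vanishes near the solution. Below is a minimal sketch on a toy scalar least-squares finite sum; the problem data, step size, and epoch length are all made up for illustration, not taken from any paper above.

```python
# Minimal SVRG sketch (illustrative toy, not any specific paper's method).
import random

random.seed(0)

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (a_i*x - b_i)^2, scalar x.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]          # every component is minimized at x = 2
n = len(a)

def grad_i(x, i):
    """Gradient of the i-th component function at x."""
    return a[i] * (a[i] * x - b[i])

def full_grad(x):
    return sum(grad_i(x, i) for i in range(n)) / n

def svrg(x0, step=0.05, epochs=50, m=20):
    x = x0
    for _ in range(epochs):
        snap = x                   # snapshot point
        mu = full_grad(snap)       # full gradient at the snapshot
        for _ in range(m):
            i = random.randrange(n)
            # Variance-reduced stochastic gradient: unbiased for
            # full_grad(x), with variance shrinking as x, snap -> x*.
            g = grad_i(x, i) - grad_i(snap, i) + mu
            x -= step * g
    return x

x_hat = svrg(x0=0.0)
```

Compared with plain SGD, the snapshot correction lets SVRG use a constant step size and still converge to the exact minimizer (here x* = 2).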


Topics

  • Basic convex analysis and optimization
    • Basics of convex sets and functions
    • Key convex functions
    • Constructing and recognizing convex functions
    • Some convex geometric problems
  • Scalable convex optimization
    • First-order methods
    • Parallel
    • Distributed
    • Online learning -- maybe Sasha Rakhlin can help!
    • Sublinear methods
    • Incremental gradient methods
    • Stochastic gradient methods
    • Large-scale semidefinite programming
  • Scalable nonconvex optimization
    • Online optimization
    • Dictionary learning models and algorithms
    • Deep learning algorithms
    • Nonconvex SGD, incremental gradient, and other algorithms
  • Non-Euclidean optimization
    • Hadamard manifolds
    • Riemannian manifolds
    • Metric spaces of nonpositive curvature
    • Wasserstein metric based problems (probability, geometry)
    • Explicit examples, such as large scale Gaussian mixtures
  • Interplay of convex and combinatorial optimization
    • Submodular problems -- potentially led by Stefanie Jegelka
    • Max-flow, graph algorithms -- potentially led by Aleksander Mądry
    • Convex relaxations
  • Applications
    • Large-scale regression problems
    • Optimization in deep learning
    • Dictionary learning
    • Applications from machine learning, statistics
    • Applications from finance, signal processing, control, information theory, ...
  • Software, design and implementation
    • Data Science platforms: AWS, Azure, Spark, MLBase, Parameter Server, Dato
    • Lessons in implementing asynchronous optimization algorithms
    • Multicore CPU considerations
    • GPU and GPGPU considerations
    • Delay sensitive algorithms and design
    • Other useful software and hardware ideas
  • Sublinear algorithms, randomized methods
    • Randomized linear algebra
    • Property testing
    • Hashing, SimHash, dimensionality reduction, etc.
    • Approximation algorithms
  • Exploiting symmetry in optimization
    • Optimizing over permutations
    • Schur-convexity, majorization
    • Optimizing over groups
    • Scattering transforms, application to deep learning
  • Large-scale sampling, MCMC
    • Parallel Gibbs sampling procedures
    • Large-scale multivariate Gaussians
    • MCMC via the optimization lens
    • Scalable determinantal sampling
  • Open problems in optimization, ML, statistics, information theory
    • Entropy inequalities
    • Convexity conjectures
    • Global optimization problems
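
For the large-scale sampling topics above, the basic sequential Gibbs update that parallel and asynchronous variants start from can be sketched on a toy bivariate standard Gaussian; the correlation, burn-in, and sample counts below are made up for illustration and this is not any specific algorithm from the reading list.

```python
# Minimal Gibbs sampler for a bivariate standard Gaussian with
# correlation rho (illustrative toy).
import math
import random

random.seed(1)
rho = 0.8
sd = math.sqrt(1.0 - rho * rho)   # std dev of each full conditional

def gibbs(n_samples, burn_in=500):
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        # Each full conditional of a bivariate standard Gaussian is
        # itself Gaussian: x | y ~ N(rho*y, 1 - rho^2), and symmetrically.
        x = random.gauss(rho * y, sd)
        y = random.gauss(rho * x, sd)
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs(20000)
mean_x = sum(s[0] for s in samples) / len(samples)
corr = sum(s[0] * s[1] for s in samples) / len(samples)
```

Because each coordinate update only reads the other coordinate's current value, this is exactly the dependency structure that parallel Gibbs schemes (and the asynchronous analysis in the De Sa et al. paper above) have to reason about.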