Geometry plays a deep role in optimization, be it for computational speed, algorithm design, or complexity theory. My research in geometric optimization comprises two main topics:

**Algorithms and complexity theory**for optimization on non-Euclidean spaces such as Riemannian manifolds, CAT(0) geometries, Wasserstein spaces, etc., inspired by usual theory of polynomial time convex optimization (which can be recovered for instance when the manifold curvature goes to zero).- New
**applications**of non-Euclidean optimization as well as scaling up old ones via geometric insights.

Selected publications

- S. Sra, R. Hosseini. Conic geometric optimization.
*SIAM J. Optimization,*2015. - H. Zhang, S. Sra. First-order methods for geodesically convex optimization.
*Conference on Learning Theory (COLT),*2016. - H. Zhang, S. Reddi, S. Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds.
*NIPS,*2016. - R. Hosseini, S. Sra. An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization.
*arXiv,*2017 - S. Sra. On the Matrix Square Root via Geometric Optimization
*Electronic J. Linear Algebra,*2016. - P. Zadeh, R. Hosseini, S. Sra. Geometric mean metric learning.
*ICML 2016.*

Ancient and modern machine learning rely on solving a variety of nonconvex optimization problems, for instance:
\begin{eqnarray*}
\mathrm{(ERM)}\quad &&\min_x\ f(x) := \frac{1}{n}\sum_i f_i(x),\\
\mathrm{(Stochastic)}\quad &&\min_x\ f(x) := \mathbb{E}_\xi[F(x,\xi)],
\end{eqnarray*}
where all the function involved can be nonconvex (e.g., in deep learning).

My work on solving these problems focuses on the following key aspects:

My work on solving these problems focuses on the following key aspects:

- Design, analysis, and implementation of algorithms that obtain an approximate critical points provably fast.
- Methods that can provably escape saddle points rapidly (e.g., by mixing first and second-order information)
- General nonconvex problems with other types of structure that enables global optimality
- Design and development of distributed and parallel methods for the above problems.

Selected publications

- S. Sra. Scalable nonconvex inexact proximal splitting.
*NIPS,*2012. - S. Reddi, S. Sra, B. Poczos, A. Smola. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization.
*NIPS,*2016. - S. Reddi, A. Hefny, S. Sra, B. Poczos, A. Smola. Stochastic variance reduction for nonconvex optimization.
*ICML,*2016. - S. Reddi, S. Sra, B. Poczos, A. Smola. Stochastic Frank-Wolfe Methods for Nonconvex Optimization.
*Allerton CC,*2016. - S. Reddi, S. Sra, B. Poczos, A. Smola. Fast incremental method for smooth nonconvex optimization.
*IEEE CDC,*2016.

- Interior point methods (new)
- Derivative free optimization (new)
- Stochastic convex optimization, constrained and unconstrained
- Parallel and distributed convex optimization
- Fast proximity solvers; Fenchel conjugates
- Semidefinite programming, and large scale conic programming.

Selected publications

- Á. Barbero, S. Sra. Modular proximal optimization with application to total variation regularization.
*arXiv,*2017. - Y. Wang, V. Sadhanala, W. Dai, W. Neiswanger, S. Sra, E. P. Xing Asynchronous Parallel Block-Coordinate Frank-Wolfe.
*ICML,*2016. - S. Sra, A. Wei Yu, M. Li, A. Smola. AdaDelay: Delay sensitive distributed stochastic convex optimization.
*AISTATS,*2016. - S. Reddi, A. Hefny, S. Sra, B. Poczos, A. Smola. Asynchronous variance reduced stochastic gradient descent.
*NIPS,*2015. - S. Reddi, A. Hefny, C. Downey, A. Dubey, S. Sra. Large-scale randomized-coordinate descent methods with non-separable linear constraints.
*UAI,*2015. - M. Wytock, S. Sra, Z. Kolter. Fast Newton methods for the group fused Lasso.
*UAI,*2014 - S. Azadi, S. Sra. Towards stochastic alternating direction method of multipliers.
*ICML,*2014 - S. Jegelka, F. Bach, S. Sra. Reflection methods for user-friendly submodular optimization.
*NIPS,*2013

Within the highly active field of deep learning, my research interests roughly split into three directions:

**Theory**: global optimality, stability, dynamics, and generalization.**Models**: representation learning, new training methods, generative deep models.**Applications**: in computer vision, materials science, healthcare, and other regimes.

Selected publications

- C. Yun, S. Sra, A. Jadbabie. Global optimality conditions for deep neural networks.
*arXiv,*2017. - C. Li, D. Alvarez-Melis, K. Xu, S. Jegelka, S. Sra. Distributional Adversarial Networks.
*arXIv,*2017. - A. Cherian, S. Sra, R. Hartley. Sequence Summarization Using Order-constrained Kernelized Feature Subspaces.
*arXiv,*2017. - Z. Mariet, S. Sra. Diversity Networks: Neural Network Compression Using Determinantal Point Processes.
*ICLR,*2016.

\[ f_\mu(z) := \sum_{S \subseteq \{1,\ldots,n\}} \mu(S)\prod_{i\in S}z_i.
\]
Important examples include: Determinantal Point Processes, Dual-Volume distribution, etc.
Our work here focuses on

Selected publications

- C. Li, S. Jegelka, S. Sra. Polynomial Time Algorithms for Dual Volume Sampling.
*arXiv,*2017. - C. Li, S. Jegelka, S. Sra. Fast DPP sampling for Nyström with application to kernel methods.
*ICML,*2016. - C. Li, S. Jegelka, S. Sra. Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling.
*NIPS,*2016. - Z. Mariet, S. Sra. Fixd-point algorithms for learning determinantal point processes.
*Inf. Conf. on Machine Learning (ICML),*2015. - Z. Mariet, S. Sra. Kronecker Determinantal Point Processes.
*NIPS,*2015. - Z. Mariet, S. Sra. Elementary Symmetric Polynomial based Optimal Design.
*arXiv,*2017.

- Probability theory, convex geometry, Brunn-Minkowski theory
- Algebraic combinatorics, positivity questions, symmetric polynomials
- Noncommutative algebra, matrix inequalities
- Stable polynomials, hyperbolic polynomials, conic geometries
- Differential geometry, metric geometry

Selected publications

- S. Sra. Positive Definite Matrices and the S-Divergence.
*Proc. American Math. Society (PAMS),*2016. - S. Sra. On inequalities for normalized Schur functions.
*Eur. J. Combinatorics,*2016. - L. Borisov, P. Neff, S. Sra, C. Thiel. The sum of squared logarithms inequality in arbitrary dimensions.
*LAA,*2017. - W. Berndt, S. Sra. Hlawka-Popoviciu inequalities on positive definite tensors.
*LAA,*2015. - S. Sra. Logarithmic inequalities under an elementary symmetric polynomial dominance order.
*arXiv,*2015.

Computer Vision; Graphics; Healthcare; Materials Design; Statistics; etc

Selected publications

- P. Zadeh, R. Hosseini, S. Sra. Geometric mean metric learning.
*ICML 2016.* - S. Sra. Directional Statistics in Machine Learning: a Brief Review.
*arXiv,*2016 - A. Cherian, S. Sra. Riemannian dictionary learning and sparse coding for positive definite matrices.
*IEEE TNNLS,*2016. - R. Hosseini, S. Sra, L. Theis, M. Bethge. Inference and mixture modelling with the Elliptical Gamma Distribution.
*CSDA,*2016 - A. Wei Yu, W. Ma, Y. Yu, J. Carbonell, S. Sra. Efficient Structured Matrix Rank Minimization.
*NIPS,*2014 - A. Cherian, S. Sra, V. Morellas, N. Papanikolopoulos. Efficient nearest neighbors via robust sparse hashing.
*IEEE TIP,*2014. - A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos. Jensen-Bregman LogDet Divergence for Efficient Similarity Computations on Positive Definite Tensors.
*IEEE TPAMI,*2013.