Stochastic Convex Optimization - Learnability, Stability and Gradient Descent

CSCI-GA. 3033 Advanced Machine Learning Course Project

Shortcuts | Report | Slides

Overview

We consider learning in the general framework of stochastic convex optimization (SCO). First, we study fundamental questions in this area: When is an SCO problem learnable? Is empirical risk minimization (ERM) enough to guarantee learnability in SCO, as it is in binary classification? What is the role of stability in learning algorithms for SCO? We cover the perhaps surprising answers to these questions provided by Shalev-Shwartz et al. (2010).
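
To fix notation, here is a minimal formulation of the SCO setting (standard definitions; the symbols $\mathcal{W}$, $\mathcal{D}$, $f$, and $S$ are generic placeholders):

$$
\min_{w \in \mathcal{W}} F(w) := \mathbb{E}_{z \sim \mathcal{D}}\big[f(w; z)\big],
\qquad
\hat{F}_S(w) := \frac{1}{n} \sum_{i=1}^{n} f(w; z_i),
$$

where $\mathcal{W} \subseteq \mathbb{R}^d$ is a convex domain, $f(\cdot\,; z)$ is convex (and suitably bounded or Lipschitz) for every $z$, the distribution $\mathcal{D}$ is unknown, and the learner only observes an i.i.d. sample $S = (z_1, \dots, z_n)$. ERM returns any minimizer of the empirical risk $\hat{F}_S$, and learnability asks whether some algorithm achieves small excess population risk $F(\hat{w}) - \min_{w \in \mathcal{W}} F(w)$.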

We then turn our attention to gradient descent (GD) and study its power as a learning algorithm within SCO. A recent result by Schliserman et al. (2024) shows that there are instances on which GD requires $\Omega(\sqrt{d})$ samples to reach a solution with non-trivial test error, where $d$ is the input dimension. A subsequent result by Livni (2024) strengthens this lower bound to $\Omega(d)$. Taken together, these results imply that GD offers no statistical benefit in worst-case SCO. See the report for more details.
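
For concreteness, below is a minimal sketch of the algorithm in question: full-batch gradient descent on the empirical risk with averaged iterates, as commonly analyzed for convex losses. The quadratic loss, step size, and iteration count are illustrative placeholders only and are unrelated to the hard instances constructed in the papers; projection onto a bounded domain is omitted for simplicity.

```python
import numpy as np

def gd_on_empirical_risk(grad_f, samples, d, eta=0.1, T=100):
    """Full-batch gradient descent on the empirical risk (1/n) * sum_i f(w; z_i).

    grad_f(w, z) should return the gradient of the per-sample loss f(w; z).
    Returns the averaged iterate, the output typically analyzed for convex losses.
    Projection onto a bounded domain W is omitted to keep the sketch short.
    """
    w = np.zeros(d)              # initialize at the origin
    iterates = [w.copy()]
    for _ in range(T):
        g = np.mean([grad_f(w, z) for z in samples], axis=0)  # empirical-risk gradient
        w = w - eta * g
        iterates.append(w.copy())
    return np.mean(iterates, axis=0)

# Illustrative (hypothetical) example: f(w; z) = 0.5 * ||w - z||^2, so grad f(w; z) = w - z.
rng = np.random.default_rng(0)
data = list(rng.normal(size=(50, 5)))        # n = 50 samples in dimension d = 5
w_hat = gd_on_empirical_risk(lambda w, z: w - z, data, d=5)
```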