Many statistical and learning procedures rely on iterative solvers, e.g. gradient descent steps for minimizing an empirical risk criterion. It turns out that stopping the iterates before convergence ("early stopping") usually regularizes the problem and is thus advisable from a statistical perspective, while also reducing computational costs considerably. The underlying statistical problem is to design early stopping rules that are completely data-driven and lead to optimal statistical estimation bounds. We clarify the basic ideas for spectral cut-off projection methods with exact oracle inequalities and matching lower bounds under sequential information constraints. We then turn to the nonlinear conjugate gradient or partial least squares algorithm for linear models and discuss how the basic ideas can be adapted to a modern greedy learning procedure.
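To illustrate the kind of data-driven rule the abstract refers to, here is a minimal sketch (not the exact procedure of the talk): gradient descent on a linear model, stopped by a discrepancy-type rule that halts once the residual norm falls to the noise level. The threshold sigma * sqrt(n), the step size, and the simulated data are illustrative assumptions.

```python
import numpy as np

# Simulated linear model y = X beta + noise (illustrative setup).
rng = np.random.default_rng(0)
n, p, sigma = 200, 50, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                                   # a simple "signal"
y = X @ beta_true + sigma * rng.standard_normal(n)

step = 1.0 / np.linalg.norm(X, 2) ** 2                # safe gradient step size
kappa = sigma * np.sqrt(n)                            # discrepancy-type threshold (assumes sigma known)

beta = np.zeros(p)
for t in range(10_000):
    residual = y - X @ beta
    if np.linalg.norm(residual) <= kappa:             # early stopping: residual norm
        break                                         # has dropped to the noise level
    beta += step * (X.T @ residual)                   # gradient descent update

print(f"stopped after {t} iterations, "
      f"estimation error {np.linalg.norm(beta - beta_true):.3f}")
```

The stopping decision uses only quantities observable along the iteration path (the residuals), which is the sequential information constraint the abstract alludes to.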