Talk from Archives

Making and Evaluating Point Forecasts

Tillmann Gneiting (U. Heidelberg) | 17.10.2011

Single-valued point forecasts continue to be issued and used in almost all realms of science and society. Typically, competing point forecasters or forecasting procedures are compared and assessed by means of an error measure or scoring function, such as the absolute error or the squared error, that depends both on the point forecast and on the realizing observation. The individual scores are then averaged over forecast cases, resulting in a summary measure of predictive performance such as the mean absolute error or the (root) mean squared error. I demonstrate that this common practice can lead to grossly misguided inferences unless the scoring function and the forecasting task are carefully matched.
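A minimal illustration of the mismatch, assuming a setup that is ours and not the talk's: observations come from a right-skewed Exp(1) distribution, and two forecasters who both know that distribution issue constant point forecasts, one reporting its mean (1.0) and the other its median (ln 2). The helper names and the simulation are hypothetical.

```python
import math
import random

# Hypothetical setup: observations drawn from a right-skewed Exp(1) distribution,
# whose mean (1.0) and median (ln 2, about 0.693) differ.
rng = random.Random(42)
observations = [rng.expovariate(1.0) for _ in range(100_000)]

# Two forecasters both know the true distribution; one reports its mean,
# the other its median, as a constant point forecast.
forecasts = {"mean": 1.0, "median": math.log(2.0)}


def mean_absolute_error(x, ys):
    """Average absolute error of the point forecast x over observations ys."""
    return sum(abs(y - x) for y in ys) / len(ys)


def mean_squared_error(x, ys):
    """Average squared error of the point forecast x over observations ys."""
    return sum((y - x) ** 2 for y in ys) / len(ys)


mae = {name: mean_absolute_error(x, observations) for name, x in forecasts.items()}
mse = {name: mean_squared_error(x, observations) for name, x in forecasts.items()}

# The "winner" depends entirely on which error measure is used.
best_by_mae = min(mae, key=mae.get)
best_by_mse = min(mse, key=mse.get)
print("MAE:", mae, "best:", best_by_mae)
print("MSE:", mse, "best:", best_by_mse)
```

Under the population distribution the expected absolute errors are about 0.736 (mean forecast) versus 0.693 (median forecast), and the expected squared errors are 1.000 versus about 1.094, so the mean absolute error favors the median forecaster while the mean squared error favors the mean forecaster. Neither ranking says who forecasts "better" until it is fixed which functional was requested.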

Effective point forecasting requires that the scoring function be specified a priori, or that the forecaster receive a directive in the form of a statistical functional, such as the mean or a quantile of the predictive distribution. If the scoring function is specified a priori, the forecaster can issue an optimal point forecast, namely the Bayes rule, which minimizes the expected loss under the forecaster's predictive distribution. If the forecaster receives a directive in the form of a functional, it is critical that the scoring function be consistent for it, in the sense that the expected score is minimized when the directive is followed. Any consistent scoring function induces a proper scoring rule for probabilistic forecasts, and a duality principle links Bayes rules and consistent scoring functions.
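A minimal sketch of the Bayes rule, assuming the predictive distribution is represented by Monte Carlo draws and the expected score is minimized over a finite grid of candidate forecasts (both simplifications are ours). The helper names `bayes_rule` and `pinball` and the log-normal predictive distribution are hypothetical. With squared error the Bayes rule tracks the mean of the draws, with absolute error their median, and with the piecewise linear score at level 0.9 their 0.9-quantile, which is what it means for each score to be consistent for the corresponding directive.

```python
import random
import statistics


def bayes_rule(score, predictive_sample, grid):
    """Approximate argmin_x E_F[S(x, Y)]: average the score over draws from the
    predictive distribution F and minimize over a finite grid of candidate x."""
    def expected_score(x):
        return sum(score(x, y) for y in predictive_sample) / len(predictive_sample)
    return min(grid, key=expected_score)


def squared_error(x, y):
    return (x - y) ** 2


def absolute_error(x, y):
    return abs(x - y)


def pinball(alpha):
    """Piecewise linear score (1{y <= x} - alpha) * (x - y) for the alpha-quantile."""
    def score(x, y):
        return ((1.0 if y <= x else 0.0) - alpha) * (x - y)
    return score


# Hypothetical predictive distribution: log-normal, represented by Monte Carlo draws.
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 0.75) for _ in range(2000)]
grid = [i / 100 for i in range(1, 501)]  # candidate forecasts 0.01, ..., 5.00

x_sq = bayes_rule(squared_error, sample, grid)    # should track the mean
x_abs = bayes_rule(absolute_error, sample, grid)  # should track the median
x_q90 = bayes_rule(pinball(0.9), sample, grid)    # should track the 0.9-quantile
print(x_sq, statistics.fmean(sample))
print(x_abs, statistics.median(sample))
print(x_q90, sorted(sample)[1799])
```

Scoring a forecaster who was told to report the 0.9-quantile with squared error would reward abandoning the directive in favor of the mean, which is the kind of mismatch the talk warns about.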

A functional is elicitable if there exists a scoring function that is strictly consistent for it. Expectations, ratios of expectations and quantiles are elicitable. For example, subject to weak regularity conditions, a scoring function is consistent for the mean functional if and only if it is a Bregman function, and it is consistent for a quantile if and only if it is generalized piecewise linear. Similar characterizations apply to ratios of expectations and to expectiles. Weighted scoring functions are consistent for the original functional applied to a correspondingly reweighted distribution, so the target they reward can shift with the weighting in unexpected ways. Not all functionals are elicitable; for instance, conditional value-at-risk is not, despite its popularity in quantitative finance.
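The two characterizations can be spot-checked numerically in their easy direction, namely that members of each family are consistent; a simulation cannot verify the "only if" part. This sketch, with hypothetical helper names `bregman`, `generalized_piecewise_linear` and `empirical_minimizer`, uses the Bregman form S(x, y) = phi(y) - phi(x) - phi'(x)(y - x) with the convex choice phi = exp, and the generalized piecewise linear form S(x, y) = (1{y <= x} - alpha)(g(x) - g(y)) with the nondecreasing choice g = log. The Gamma-distributed data and the grid search are our assumptions.

```python
import math
import random
import statistics


def bregman(phi, dphi):
    """Bregman-type score S(x, y) = phi(y) - phi(x) - phi'(x) * (y - x), phi convex."""
    def score(x, y):
        return phi(y) - phi(x) - dphi(x) * (y - x)
    return score


def generalized_piecewise_linear(alpha, g):
    """GPL score S(x, y) = (1{y <= x} - alpha) * (g(x) - g(y)), g nondecreasing."""
    def score(x, y):
        return ((1.0 if y <= x else 0.0) - alpha) * (g(x) - g(y))
    return score


def empirical_minimizer(score, sample, grid):
    """Grid point minimizing the average score over the sample (approximate Bayes rule)."""
    return min(grid, key=lambda x: sum(score(x, y) for y in sample) / len(sample))


# Hypothetical positive-valued data: Gamma(shape=2, scale=0.5), with mean 1.0.
rng = random.Random(1)
sample = [rng.gammavariate(2.0, 0.5) for _ in range(2000)]
grid = [i / 100 for i in range(1, 401)]  # 0.01, 0.02, ..., 4.00

# phi(t) = t**2 would give squared error; phi = exp is a less familiar Bregman choice.
exp_bregman = bregman(math.exp, math.exp)
# g(t) = t would give the usual pinball loss; g = log is valid for positive data.
log_gpl = generalized_piecewise_linear(0.25, math.log)

x_mean = empirical_minimizer(exp_bregman, sample, grid)
x_q25 = empirical_minimizer(log_gpl, sample, grid)
print(x_mean, statistics.fmean(sample))
print(x_q25, sorted(sample)[499])
```

Both unfamiliar scores still point at the same targets as their textbook counterparts: the exp-Bregman score at the sample mean and the log-GPL score at the empirical 0.25-quantile.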
