This paper evaluates the forecasting performance of Large Language Models (LLMs) in nowcasting French GDP growth. We compare their forecasts, obtained through zero-shot prompting without any external input data, to those produced by the econometric models currently used at the Banque de France. Our results indicate that traditional models outperform LLMs during stable periods, while LLMs prove more effective during exceptional episodes, such as the Covid-19 pandemic. We assess the influence of prompt design, language, model version, and temperature on forecast accuracy, and introduce novel indicators of forecast confidence and probability of GDP contraction derived from LLM outputs. Extensive robustness checks, including tests for information leakage and in-sample comparisons, confirm the validity of our findings. Overall, the results suggest that while standard LLMs are not yet a substitute for econometric models in routine forecasting, they offer useful complementary insights in periods of heightened uncertainty or structural change.
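The abstract does not spell out the exact prompts, API, or settings used. As a rough illustration only, the sketch below shows what zero-shot GDP nowcasting with an LLM could look like in Python, assuming the OpenAI chat client, an invented prompt, and "gpt-4o" as a stand-in model version; the sampling-based contraction probability at the end is one plausible construction of such an indicator, not necessarily the one introduced in the paper.

```python
# Minimal sketch of zero-shot GDP nowcasting with an LLM (no external input data).
# Model name, prompt wording, temperature, and the uncertainty indicators are
# illustrative assumptions, not the paper's actual settings or methodology.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are an economic forecaster. Without using any data I provide, "
    "give your best point estimate of quarter-on-quarter French real GDP "
    "growth (in percent) for 2023Q2. Answer with a single number."
)

def nowcast(temperature: float = 0.0) -> float:
    """Query the model once and parse the numeric point forecast (naive parsing)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the paper compares several model versions
        messages=[{"role": "user", "content": PROMPT}],
        temperature=temperature,
    )
    return float(resp.choices[0].message.content.strip().replace("%", ""))

# One plausible (assumed) way to derive uncertainty indicators of the kind the
# abstract mentions: sample repeatedly at a positive temperature, then use the
# mean as the point forecast and the share of negative draws as a rough
# probability of GDP contraction.
draws = [nowcast(temperature=0.7) for _ in range(20)]
point_forecast = sum(draws) / len(draws)
prob_contraction = sum(d < 0 for d in draws) / len(draws)
print(f"point forecast: {point_forecast:.2f}%  P(contraction): {prob_contraction:.0%}")
```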
