# Big data in macroeconomics

Lucrezia Reichlin presented “Big Data and Macro Econometrics” yesterday at the EEA’s annual meetings. (Here are some older slides from a similar talk.)

She recommends using a large number of macroeconomic series together with dimension reduction or shrinkage methods such as Lasso and Ridge regression. These methods are intuitively appealing and work well. Packages such as glmnet fit a mix of the two penalties (the elastic net) and choose the penalty strength by cross-validation; the mixing weight itself has to be set by the user or tuned separately.
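For reference, this is the elastic net objective as glmnet parameterizes it for a linear model. It is my addition, not something taken from the talk. Here $N$ is the number of observations, $x_i$ the vector of predictors, $\lambda \ge 0$ the penalty strength and $\alpha \in [0, 1]$ the mixing weight:

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^\top \beta \right)^2 + \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
$$

Setting $\alpha = 1$ gives the Lasso and $\alpha = 0$ gives Ridge regression. Cross-validation in `cv.glmnet` searches over $\lambda$ for a given $\alpha$.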

Unfortunately, there was no discussion of the difficulties of applying resampling methods to aggregate time series. In macro, the time series dimension of the data is always shorter than we would like. You might have 50 years of data, and if you’re lucky it comes at quarterly or monthly frequency, which gives about 200 or 600 observations. And even if you extend your series back a couple of decades or across countries, the number of observations doesn’t become very large.

Instead, it’s becoming easier to find more variables to describe the same economy. We can use consumer surveys, scanner data or scrape the web for a more detailed view of the economy, but our number of observations grows only slowly. And frankly, the opposite would be better: I would rather observe only 10 or 20 variables from one economy over a really long time (or equivalently from many similar economies) than hundreds or thousands of variables about only one economy.

The fact that our number of observations grows slowly limits the scope for slicing samples into training, cross-validation and test sets. Thus, the focus in macroeconometrics is a lot more on dimension reduction than it is on an unguided search for patterns.
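To make the sample-slicing problem concrete, here is a minimal sketch of expanding-window (rolling-origin) splits, one common way to respect time order when validating on a single series. It was not part of the talk; the function name `expanding_window_splits` and the fold sizes are hypothetical choices for illustration only.

```python
def expanding_window_splits(n_obs, n_folds, test_size, min_train=1):
    """Yield (train_indices, test_indices) pairs that respect time order.

    Each fold trains on all observations before the test block, so no
    future data leaks into the training sample.
    """
    first_test_start = n_obs - n_folds * test_size
    if first_test_start < min_train:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        train = list(range(0, start))
        test = list(range(start, start + test_size))
        yield train, test


# Illustrative numbers: 50 years of quarterly data is 200 observations.
n_obs = 50 * 4
splits = list(expanding_window_splits(n_obs, n_folds=5, test_size=8, min_train=100))

for train, test in splits:
    print(f"train: 0..{train[-1]} ({len(train)} obs), test: {test[0]}..{test[-1]}")
```

With these assumed settings, five folds of two years each already use up the last 40 quarters of the sample, and the earliest fold trains on only 160 observations. That is the sense in which a short time dimension leaves little room for separate training, validation and test sets.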