Models and surprise

27 Feb 2018

Hadley Wickham is a statistician and programmer and the creator of popular R packages such as ggplot2 or dplyr. His status in the R community has risen to such mythical levels that the set of packages he created were called the hadleyverse (renamed to tidyverse).

In a talk, he describes what he considers a sensible workflow and explains the following dichotomy between data visualization and quantitative modeling:

But visualization fundamentally is a human activity. This is making the most of your brain. In visualization, this is both a strength and a weakness […]. You can see something in a visualization that you did not expect and no computer program could have told you about. But because a human is involved in the loop, visualization fundamentally does not scale.

And so to me the complementary tool to visualization is modeling. I think of modeling very broadly. This is data mining, this is machine learning, this is statistical modeling. But basically, whenever you’ve made a question sufficiently precise, you can answer it with some numerical summary, summarize it with some algorithm, I think of this as a model. And models are fundamentally computational tools which means that they can scale much, much better. As your data gets bigger and bigger and bigger, you can keep up with that by using more sophisticated computation or simply just more computation.

But every model makes assumptions about the world and a model – by its very nature – cannot question those assumptions. So that means: on some fundamental level, a model cannot surprise you.

That definition excludes many economic models. I think of the insights of models such as Akerlof’s Lemons and Peaches, Schelling’s segregation model or the “true and non-trivial” theory of comparative advantage as surprising.