Lukas Püttmann    About    Blog

Collected links

  1. Google Maps’s Moat, by Justin O’Beirne (who runs one of the best blogs I know):

    At the rate it’s going, how long until Google has every structure on Earth?

  2. John Horton simulates the new Goldsmith-Pinkham, Sorkin and Swift paper on Bartik instruments.
  3. Thomas Leeper:

    What have I learned from this? First, everything takes time. Coming up with compelling research takes time. Data collection takes time. Data analysis takes time. Writing takes time. Peer review takes time. Rejection takes time. Recovery from rejection takes time. Responding to reviewers takes time. Typesetting takes time. Email takes time. Writing this blog post takes time. Everything takes time. It’s been eight years but that time has brought a publication I’m quite proud of.

  4. Brian Hayes provides a readable introduction to net neutrality:

    In round numbers, the web has something like a billion sites and four billion users—an extraordinarily close match of producers to consumers. […] Yet the ratio for the web is also misleading. Three fourths of those billion web sites have no content and no audience (they are “parked” domain names), and almost all the rest are tiny. Meanwhile, Facebook gets the attention of roughly half of the four billion web users. Google and Facebook together, along with their subsidiaries such as YouTube, account for 70 percent of all internet traffic.

  5. 100 Jahre DIN - Ein Urgestein der deutschen Wirtschaft (in German)
  6. Where did all my probability go?

Growing and shrinking

Stephen Broadberry and John Joseph Wallis have written an interesting new paper (pdf). They argue that much of modern economic growth is due to not only a higher trend growth rate, but also due to fewer periods of shrinking.

It’s easy to reproduce some of their results using the nice maddison package.

Get some packages:


Let’s concentrate on some countries with good data coverage, keep only data since 1800 and calculate real GDP growth rates. We also add a column for decades, to be able to look at some statistics for those separately.

df <- maddison %>% 
  filter(aggregate == 0,
         iso2c %in% c("DE", "FR", "IT", "JP", "NL", "ES", "SE", "GB", "US",
                             "AU", "DK", "ID", "NO", "GR", "ZA", "BE", "CH", "FI", 
                             "PT", "AT", "CA"),
         year >= as.Date("1800-01-01")) %>%
  group_by(iso3c) %>% 
  mutate(gr = 100*(gdp_pc - dplyr::lag(gdp_pc, 1)) / dplyr::lag(gdp_pc, 1)) %>% 
  ungroup() %>% 
  drop_na(gr) %>% 
  mutate(decade = paste0((year(year) %/% 10) * 10, "-", 
                         (year(year) %/% 10) * 10 + 9))

Define a data frame with annotations for the World Wars:

df_annotate <- tibble(
  xmin = as.Date(c("1914-01-01", "1939-01-01")),
  xmax = as.Date(c("1918-01-01", "1945-01-01")),
  ymin = -Inf, ymax = Inf,
  label = c("WW1", "WW2")

Check out data for some countries:

df %>% 
  filter(iso2c %in% c("DE", "FR", "IT", "JP", "NL", "ES", "SE", "GB", "US")) %>%
  ggplot() + 
  geom_hline(yintercept = 0, size = 0.3, color = "grey80") +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            data = df_annotate, fill = "grey50", alpha = 0.25) +
  geom_line(aes(year, gr), size = 0.4) +
  facet_wrap(~country, scales = "free_y") +
  ggthemes::theme_tufte(base_family = "Helvetica", ticks = FALSE) +
  labs(x = NULL, 
       y = "Percentage points\n", 
       title = "Growth rate of real GDP per capita",
       subtitle = "1800-2010")
Real GDP growth rates for nine countries since 1800 with Maddison data

Growth rates wobble around a mean slightly larger than zero, as expected. There are some extreme events around the World Wars. But it’s not immediately apparent from looking at these figures if shrinking has become less. If anything, it’s macroeconomic volatility that has decreased for most of these countries.

Next, for each decade we calculate the number of periods that countries were growing and shrinking and the average growth rates in those periods.

stats <- df %>% 
  group_by(iso3c, decade) %>% 
  mutate(category = ifelse(gr > 0, "grow", "shrink"),
         gr_av = mean(gr, na.rm = TRUE)) %>% 
  group_by(decade, iso3c, country_original, country, category, gr_av) %>% 
  summarise(prds = n(),
            delta = mean(gr, na.rm = TRUE)) %>% 

stats <- stats %>% 
  left_join(stats %>% 
              group_by(iso3c, decade) %>% 
              mutate(totper = sum(prds))) %>% 
  mutate(frq = prds / totper) %>% 
  select(-prds, -totper) %>% 
  mutate(comp = frq * delta)

We’re interested in how much growing and shrinking years contributed to overall growth in a decade. So this is just the absolute value of changes by either shrinking and growing years divided by the total variation.

stats <- stats %>% 
  full_join(stats %>% 
              group_by(decade, iso3c, country_original, country, gr_av) %>% 
              summarise(tvar = sum(abs(comp)),
                        gr_comp = sum(comp))) %>% 
  mutate(contr = abs(comp) / tvar)
  mutate(dfac = factor(decade)) %>% 
  filter(decade != "2010-2019")

ggplot(stats, aes(dfac, comp, color = category, 
                  shape = category, fill = category)) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey80") +
  geom_jitter(alpha = 0.5) +
  coord_flip() +
  ggthemes::theme_tufte(base_family = "Helvetica", ticks = FALSE) +
  scale_shape_manual(values = c(17, 25)) +
  scale_colour_manual(values = c("#00BFC4", "#F8766D")) +
  scale_fill_manual(values = c("#00BFC4", "#F8766D")) +
  labs(title = "Contributions of growing and shrinking",
       subtitle = "1800-2009, every point is a country-decade observation",
       y = "Contribution to real GDP growth",
       x = NULL,
       caption = "Source: Maddison Project and own calculations.") +
  scale_x_discrete(limits = rev(levels(stats$dfac)))

When we plot those contributions, we get the following picture:

Contribution of growing and shrinking

It looks as if the contributions of shrinking have clustered closer to zero since WW2.

Let’s check out the trends in the contribution of shrinking periods:

r <- stats %>% 
  mutate(trend = row_number()) %>% 
  filter(category == "shrink") %>% 
  group_by(iso3c, country_original, country) %>% 
  do(fitDecades = lm(contr ~ trend, data = .)) %>% 
  tidy(fitDecades, = TRUE)

And plot the regression coefficients of the trend line:

r %>% 
  filter(term == "trend") %>% 
  arrange(estimate) %>% 
  ggplot(aes(reorder(country, -estimate), estimate)) +
  geom_hline(yintercept = 0, size = 0.8, color = "grey10", linetype = "dotted") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.3) +
  geom_point(size = 1.0) + 
  coord_flip() +
  ggthemes::theme_tufte(base_family = "Helvetica", ticks = FALSE) +
  labs(title = "Trend of the contribution of shrinking episodes",
       subtitle = "1800-2009 by decades",
       x = NULL,
       y = "OLS estimate and 95% confidence bands",
       caption = "Source: Maddison Project and own calculations.")
Contribution of growing and shrinking, regression coefficients

The trend was negative for many countries and for some there was no significant trend. None of the trends was significantly positive. It certainly looks as if episodes of growing output have become more important than episodes of shrinking episodes.

The question remains whether this might not just be mechanically driven by the fact that a higher trend real GDP growth rate reduces the probability of the growth rate hitting zero. And a fall in macroeconomic volatility would also make shrinking episodes less likely.


Broadberry, S. and J. J. Wallis (2017). “Growing, Shrinking, and Long Run Economic Performance: Historical Perspectives on Economic Development”, NBER Working Paper No. 23343. doi: 10.3386/w23343

Collected links

  1. John Coltrane – Alabama
  2. Is AI Riding a One-Trick Pony?” (through Kaiser Fung)
  3. British Brexit Secretary, David Davis:

    The assessment of that effect … is not as straightforward as many people think. And I’m not a fan of economic models, because they have all proven wrong. (link)

  4. Consumer Inflation Uncertainty
  5. Why Are Some Good Old Ideas Buried in the History of Statistical Graphics?
  6. Tom Breloff:

    Instead of rushing to Software 2.0, lets view neural networks in proper context: they are models, not magic.

  7. Roman Cheplyaka: “Explained variance in PCA
  8. Project-oriented workflow or how not to “SET YOUR COMPUTER ON FIRE 🔥.”
  9. Cat Person”, by Kristen Roupenian in The New Yorker. About which was written:

    This past weekend, the biggest story on social media was not about a powerful man who had sexually assaulted someone, or something the president said on Twitter. Charmingly, as if we were all at a Paris salon in the 1920s, everyone had an opinion about a short story.

  10. Beautiful Testing (seen here)
  11. Sam Altman thinks he can speek more freely in Beijing than in San Francisco. And Tyler Cowen has a good explanation.

VoxEU gobbledygook

We’ve written a VoxEU column for our patent paper, which you can find here.

Overview statistics

When I started writing this article I became curious how a typical VoxEU column looks like. So I scraped the archives and looked at some statistics. Here they are (as of November 15 2017):

  • After some cleaning, there are 5633 columns from January 2008 to November 2017.
  • The mean number of page reads of columns is 20,600 (median: 16,600).
  • The mean number of authors is 2.1 (median: 2). There are about 1800 single-authored columns.
  • The teaser text at the top of the columns contained 68 words on average (median: 66).
  • The main part of the column is a little harder to count, because it also contains tables, figure captions and references. When I just count all words before the first appearance of “References” in the text, I get a mean of 1383 words (median: 1327). That seems well within the recommended range of 1000-1500 words.
  • The most prolific writers have written up to 50 columns and the mean number of columns per author is 2.2 (median: 1).

Every column is assigned to one topic and several tags. I aggregated the 49 topics to one of 19 categories (e.g., I counted “EU institutions” and “EU policies” as “Europe” and “Microeconomic regulation” and “Competition policy” as “Industrial organisation”). This produces the following figure:

Categories of economic fiels in VoxEU columns, quarterly 2008-2017

Some observations:

  • The graph reflects the focus of The top categories are “International economics” (950), “Europe” (930), “Development” (590), “Financial markets” (580).
  • Microeconomic theory and econometrics are only rarely covered.
  • The spike of the “Europe” category around 2012 might be related to the euro area sovereign debt crisis around that time.
  • The topic “Frontiers of economic research” is a bit more vague.
  • “Labor” and “Economic history” columns have become more important and columns with the topic “Global crisis” have become rarer.

Measuring complexity of text in columns

One fun exercise I’ve run is inspired by this blog post by Julia Silge. She explains how to use a “Simple Measure of Gobbledygook” (SMOG) by McLaughlin (1969) to find out which texts are hard to read. This works by counting the average length of syllables per words that people write. Words with fewer syllables are seen as easier to understand. The SMOG value is meant to show how many years of education somebody needs to understand a text.

I’m running this analysis separately on the columns teaser texts and their main body. Our own teaser text has 16 polysyllable words in four sentences and we calculate the SMOG value like this:

The rest of the column has 251 polysyllables in 65 sentences, which yields a SMOG of 14.4.

The winner of the VoxEU teaser text with the lowest SMOG count is this column by Jeffrey Frankel. It has a SMOG of 6.4, so taking the measure literally we would expect a kid fresh out of primary school to be able to understand it.

The column with the lowest SMOG value in its main column text is this column by James Andreoni and Laura Gee. It has 147 polysyllables spread out over 79 sentences, which yields a SMOG of 10.9.

I won’t name any offenders, but the highest SMOG score is 26.8. Understanding that text would require the substantial amount of education such as: 12 (school) + 3 (undergrad) + 1 (master) + 5 (PhD) + 6 (assistant professor) to understand.

The overall average SMOG value is 14.8 on teasers and 16.0 on main columns texts. So it seems that economists write on a level that college graduates can understand. SMOG doesn’t vary much by field, but it takes the highest value (on full columns) in “Industrial organisation” (16.9), “Monetary economics” (16.4) and lowest in “Economic History” (15.8) and “Global Crisis” (15.7).

The SMOG on the two column parts has a correlation of 0.27.

SMOG values of teaser vs. main part of VoxEU column

The OLS line is flatter than the 45 degree line which is probably a sign of the more accessible language in the teasers.

Interestingly, when we compare articles’ SMOG values with the number of times the page was read, we get the following negative relationship:

Number of page reads against SMOG values of VoxEU columns

This also holds in a regression of log(page reads) on the SMOG values of both main text and teaser text, the number of authors, number of authors squared and dummies for the day of the week, quarter, year and – most importantly – the literature category (e.g. “Taxation”, “Financial markets” or “Innovation”). It’s not driven by outliers either and there is also a significantly negative relationship if I measure SMOG on the teasers only.

Writing columns that take an additional year of schooling to understand (SMOG + 1) is associated with 3 percent fewer page reads. Maybe that’s a reason to use fewer big words in our papers!

One explanation might be that more complex papers require the use of more big words. And that users on prefer clicking on articles that don’t sound too complicated. But better written papers might also just be inherently better in other dimensions. And because they’re more important, people read them more often.


McLaughlin, G. H. (1969). “SMOG Grading - a New Readability Formula”. Journal of Reading. 12(8): 639—646.

My 10 favorite books 2017

Here are the books that I most enjoyed reading in the last 12 months in reverse order:

  1. Just Mercy: A Story of Justice and Redemption”, by Bryan Stevenson. A book about the weakest in society and the weaknesses of society.
  2. Hitler’s Soldiers: The German Army in the Third Reich”, by Ben Shepherd. It find it hard to say I enjoyed this book, but it impressed me. Facts really stand out when Shepherd is able to put numbers on them. For example, did you know that, “During the winter of 1941-2, 360,000 Greeks died of famine”?
  3. Submission”, by Michel Houellebecq. What I found eerie is the psychological plausibility of the decisions in this story.
  4. Rules for Radicals”, by Saul Alinsky. “All issues are controversial.” More here and here.
  5. Folding Beijing”, by Hao Jingfang. A fantastic (in both senses of the word) short story about reality and inequality by a Chinese macroeconomist.
  6. Hans Fallada: Die Biographie”, by Peter Walther (in German). I really enjoyed reading Fallada’s book “Alone in Berlin” and Fallada’s life was equally interesting. This superb biography spares nothing by simplifying too much.
  7. Commonwealth”, by Ann Patchett. We accompany the lives of six siblings over several decades. I started not expecting to finish, but after the first chapter I couldn’t stop.
  8. Gorbachev: His Life and Times”, by William Taubman. So much was new to me which mostly just reveils my ignorance about the topic.
  9. At the Existentialist Café: Freedom, Being, and Apricot Cocktails”, by Sarah Bakewell. Beautifully written and insightful.
  10. Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter”, by Joseph Henrich. I found this book mind-boggling and impressive with things to say about anything from culture, evolution, causal reasoning to religion.

Related posts:

Collected links

  1. Miles Kimball:

    At this point I know Jerome Powell only from news reports, though I was pleased to realize a few days ago that the locked-down Twitter account @jeromehpowell created in 2011 is following me, as well as several of my friends in the blogosphere. If he is reading this, let me say that I would be glad to craft a blog post on any topic he would like to hear my opinion on, and would be happy to do so while keeping his query confidential.

  2. Antibiotics
  3. Rediscovering Ancient Greek Music
  4. The cash comeback: Evidence and possible explanations

Phillips curve in the United States

Joseph Gagnon has written a blog post at the Peterson Institute about the Phillips curve in the United States.

Some economists have observed that the employment gap turned positive this year, but inflation has not increased. Gagnon argues that we should not be too quick to infer from this that the Phillips curve relationship has broken down, as high employment may take a while to raise prices. And he adds that this relationship is weaker in low inflation environments due to hesitancies to lower prices (i.e., nominal rigidities).

He kindly provides data and codes at the bottom of his codes, but let’s try to reproduce his figure using only public data from FRED.

In R, first install some packages:


Load the packages:


Now get some series on core (no food, no energy) personal consumption expenditures and on the unemployment rate from FRED.

To access data from FRED, you need to register an API key and insert below.

api_key <- "yourkeyhere"
fred    <- FredR(api_key)

mt <- fred$series.observations(series_id = "PCEPILFE") %>%
  mutate(date = as.Date(date)) %>% 
  select(date, pce = value)

mt <- fred$series.observations(series_id = "UNRATE") %>%
  mutate(date = as.Date(date)) %>% 
  select(date, unrate = value) %>% 
  full_join(., mt)

These are monthly series, so aggregate them to quarterly values:

qt <- mt %>% 
  mutate(yq     = as.yearqtr(date),
         pce    = as.numeric(pce),
         unrate = as.numeric(unrate)) %>% 
  group_by(yq) %>% 
  summarize(pce    = mean(pce),
            unrate = mean(unrate))

Download quarterly CBO estimates for the natural rate of unemployment from FRED and calculate the employment gap as current employment rate minus natural rate.

df  <- fred$series.observations(series_id = "NROU") %>%
  mutate(yq = as.yearqtr(paste0(year(date), " Q", quarter(date)))) %>% 
  filter(yq <= "2017 Q4") %>% 
  select(yq, nru = value) %>% 
  mutate(nru = as.numeric(nru)) %>% 
  full_join(qt, nru) %>% 
  mutate(egap = nru - unrate)

Make a plot:

ggplot(df, aes(yq, egap)) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey80") +
  theme_tufte(base_family="Helvetica", ticks=FALSE) +
  geom_line() +
  labs(title    = "US employment gaps",
       subtitle = "1949Q1–2017Q4",
       x        = "Employment gap",
       y        = "Quarters")
Employment gap in the United States, 1949-2017

We see that the employment gap was negative for a long time after the financial crisis but has recently turned positive. That’s the bit of data that Gagnon and others are arguing about.

Now we create the variables that Gagnon uses. We lag the employment gap by four quarters and calculate the inflation rate as the year-on-year change in the price level:

df <- df %>% 
  mutate(egap_l = dplyr::lag(egap, 4),
         cpi = 100*(pce - dplyr::lag(pce, 4)) / dplyr::lag(pce, 4))

Take the moving average of the last three years and detrend the inflation rate to get the “change in inflation”:

df <- df %>% 
  ungroup() %>% 
  mutate(cpi_ma = 
           rollapply(cpi, width=12, FUN=function(x) mean(x, na.rm=TRUE), 
                     by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right"),
         cpi_ma = ifelse(is.nan(cpi_ma), NA, cpi_ma),
         cpi_change = cpi - cpi_ma)

Last, we check in which periods inflation (four quarters before) was above three percent. And we truncate the data to the same sample as in Gagnon’s analysis.

df <- df %>% 
  mutate(cpi_l = dplyr::lag(cpi, 4)) %>% 
  mutate(high_inflation = (cpi_l >= 3))

df <- df %>% 
  filter((yq >= "1963 Q1") & (yq <= "2017 Q3"))

Now we can reproduce the scatterplots:

ggplot(df, aes(egap_l, cpi_change, color = high_inflation, 
               shape = high_inflation)) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey50", linetype = "dotted") +
  geom_vline(xintercept = 0, size = 0.3, color = "grey50", linetype = "dotted") +
  geom_point(alpha = 0.5, stroke = 0, size = 3) +
  theme_tufte(base_family="Helvetica", ticks=FALSE) +
  geom_smooth(method = "lm", se = FALSE, size = 0.4, show.legend = FALSE) +
  labs(title    = "US Phillips curves with low and high inflation",
       subtitle = "1963Q1–2017Q3",
       caption  = "Source: FRED and own calculations following\nGagnon (2017) \"There Is No Inflation Puzzle\".",
       x        = "employment gap",
       y        = "change in inflation") +
  geom_text(data=subset(df, yq == "2017 Q3"),
            aes(egap_l, cpi_change, label=yq), vjust = -10, hjust = 2,
            show.legend = FALSE) +
  geom_segment(aes(x = -1.2, xend=df$egap_l[df$yq == "2017 Q3"], 
                   y = 1.85, yend=df$cpi_change[df$yq == "2017 Q3"]), 
               arrow = arrow(length = unit(0.1, "cm")),
               show.legend = FALSE) +
  scale_colour_manual(name = "Inflation:",
                      labels = c("< 3%", "> 3%"),
                      values = c("#F8766D", "#00BFC4")) +
  scale_shape_manual(name = "Inflation:",
                     labels = c("< 3%", "> 3%"),
                     values = c(17, 19))
US Phillips curves with low and high inflation 1963-2017

We see indeed that there’s a relationship, but it’s more pronounced in the high inflation regime. And the most recent datapoint “2017 Q3” shows that – a year ago – the employment gap was still slighty negative with -0.16 and the change in inflation in 2017 Q3 was also negative with -0.20.

Collected links

  1. The Gentzkow-Shapiro Lab is on Github. Their RA manual is interesting and includes bits on PhD applications and a writing style guide.
  2. Soviet mosaics
  3. Greg Mankiw: “Getting People to Get Along, Even When They Disagree” (reading list)
  4. The Umbrelly Man
  5. Jeff Attwood (co-founder of Stack Overflow): “The Existential Terror of Battle Royale”:

    I would generally characterize my state of mind for the last six to eight months as … poor. Not just because of current events in the United States, though the neverending barrage of bad news weighs heavily on my mind, and I continue to be profoundly disturbed by the erosion of core values […].

    In times like these, I sometimes turn to video games for escapist entertainment. One game in particular caught my attention because of its unprecedented rise in player count over the last year.


    I absolutely believe that huge numbers of people will still be playing some form of this game 20 years from now.


    It’s hard to explain why Battlegrounds is so compelling, but let’s start with the loneliness. […] PUBG is, in its way, the scariest zombie movie I’ve ever seen, though it lacks a single zombie. It dispenses with the pretense of a story, so you can realize much sooner that the zombies, as terrible as they may be, are nowhere as dangerous to you as your fellow man.


    Battle Royale is not the game mode we wanted, it’s not the game mode we needed, it’s the game mode we all deserve. And the best part is, when we’re done playing, we can turn it off.