Lukas Püttmann    About    Blog

Phillips curve in the United States

Joseph Gagnon has written a blog post at the Peterson Institute about the Phillips curve in the United States.

Some economists have observed that the employment gap turned positive this year, but inflation has not increased. Gagnon argues that we should not be too quick to infer from this that the Phillips curve relationship has broken down, as high employment may take a while to raise prices. He adds that this relationship is weaker in low inflation environments because firms are reluctant to lower prices (i.e., nominal rigidities).

He kindly provides data and codes at the bottom of his post, but let’s try to reproduce his figure using only public data from FRED.

In R, first install some packages:


Load the packages:
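The package listings are not shown here. Judging from the functions called further down (`FredR()`, `as.yearqtr()`, `rollapply()`, `year()`, `quarter()`, `theme_tufte()`), something like the following should cover both steps. The exact package set and the GitHub location of FredR are assumptions, not taken from the post:

```r
# Assumed package set, inferred from the functions used below
install.packages(c("devtools", "dplyr", "zoo", "lubridate", "ggplot2", "ggthemes"))

# FredR is assumed to be installed from GitHub rather than CRAN
devtools::install_github("jcizel/FredR")

library(FredR)      # FRED API client
library(dplyr)      # %>%, mutate(), full_join(), ...
library(zoo)        # as.yearqtr(), rollapply()
library(lubridate)  # year(), quarter()
library(ggplot2)    # plotting
library(ggthemes)   # theme_tufte()
```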


Now get some series on the core (excluding food and energy) personal consumption expenditures price index and on the unemployment rate from FRED.

To access data from FRED, we need to register an API key and insert it at api_key below.

api_key <- "yourkeyhere"
fred    <- FredR(api_key)

mt <- fred$series.observations(series_id = "PCEPILFE") %>%
  mutate(date = as.Date(date)) %>% 
  select(date, pce = value)

mt <- fred$series.observations(series_id = "UNRATE") %>%
  mutate(date = as.Date(date)) %>% 
  select(date, unrate = value) %>% 
  full_join(., mt)

These are monthly series, so aggregate them to quarterly values:

qt <- mt %>% 
  mutate(yq     = as.yearqtr(date),
         pce    = as.numeric(pce),
         unrate = as.numeric(unrate)) %>% 
  group_by(yq) %>% 
  summarize(pce    = mean(pce),
            unrate = mean(unrate))

Download quarterly CBO estimates for the natural rate of unemployment from FRED and calculate the employment gap as the natural rate minus the current unemployment rate.

df  <- fred$series.observations(series_id = "NROU") %>%
  mutate(yq = as.yearqtr(paste0(year(date), " Q", quarter(date)))) %>% 
  filter(yq <= "2017 Q4") %>% 
  select(yq, nru = value) %>% 
  mutate(nru = as.numeric(nru)) %>% 
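  full_join(qt, by = "yq") %>%   # join on the quarter
  arrange(yq) %>%                # keep rows in time order so the lags below are correct
  mutate(egap = nru - unrate)    # positive gap: unemployment below its natural rate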

Make a plot:

ggplot(df, aes(yq, egap)) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey80") +
  theme_tufte(base_family="Helvetica", ticks=FALSE) +
  geom_line() +
  labs(title    = "US employment gaps",
       subtitle = "1949Q1–2017Q4",
       x        = "Quarters",
       y        = "Employment gap")
Employment gap in the United States, 1949-2017

We see that the employment gap was negative for a long time after the financial crisis but has recently turned positive. That’s the bit of data that Gagnon and others are arguing about.

Now we create the variables that Gagnon uses. We lag the employment gap by four quarters and calculate the inflation rate as the year-on-year change in the price level:

df <- df %>% 
  mutate(egap_l = dplyr::lag(egap, 4),
         cpi = 100*(pce - dplyr::lag(pce, 4)) / dplyr::lag(pce, 4))

Take the moving average of the last three years and detrend the inflation rate to get the “change in inflation”:

df <- df %>% 
  ungroup() %>% 
  mutate(cpi_ma = 
           rollapply(cpi, width=12, FUN=function(x) mean(x, na.rm=TRUE), 
                     by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right"),
         cpi_ma = ifelse(is.nan(cpi_ma), NA, cpi_ma),
         cpi_change = cpi - cpi_ma)
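
To make the “change in inflation” concrete, here is a minimal base-R sketch of the same idea: a trailing mean that uses shorter windows at the start of the sample, subtracted from the current value. The helper `trailing_mean()` and the numbers are made up for illustration. The sketch is not meant to reproduce `rollapply()` exactly.

```r
# Hypothetical helper: mean of the current and up to (width - 1) previous values
trailing_mean <- function(x, width = 12) {
  sapply(seq_along(x), function(i) {
    window <- x[max(1, i - width + 1):i]
    if (all(is.na(window))) NA_real_ else mean(window, na.rm = TRUE)
  })
}

# Made-up inflation rates in percent, for illustration only
infl        <- c(2.0, 2.2, 2.4, 2.6, 3.0, 3.4)
infl_ma     <- trailing_mean(infl, width = 4)
infl_change <- infl - infl_ma   # positive when inflation is above its recent average

print(round(infl_change, 2))
```

With a window of 12 quarters, as in the post, the first three years of the sample are averaged over fewer observations.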

Last, we check in which periods inflation (four quarters before) was above three percent. And we truncate the data to the same sample as in Gagnon’s analysis.

df <- df %>% 
  mutate(cpi_d = dplyr::lag(cpi, 4)) %>% 
  mutate(high_inflation = (cpi_d >= 3))

df <- df %>% 
  filter((yq >= "1963 Q1") & (yq <= "2017 Q3"))

Now we can reproduce Gagnon’s scatterplots:

ggplot(df, aes(egap_l, cpi_change, color = high_inflation)) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey60", linetype = "dotted") +
  geom_vline(xintercept = 0, size = 0.3, color = "grey60", linetype = "dotted") +
  geom_point(alpha = 0.5, stroke = 0, size = 3) +
  theme_tufte(base_family="Helvetica", ticks=FALSE) +
  geom_smooth(method = "lm", se = FALSE, size = 0.4, show.legend = FALSE) +
  guides(color=guide_legend(title="Inflation >3%?")) +
  labs(title    = "US Phillips curves with low and high inflation",
       subtitle = "1963Q1–2017Q3",
       caption  = "Source: FRED and own calculations following Gagnon (2017) \"There Is No Inflation Puzzle\".",
       x        = "employment gap",
       y        = "change in inflation") +
  geom_text(data=subset(df, yq == "2017 Q3"),
            aes(egap_l, cpi_change, label=yq), vjust = -10, hjust = 2,
            show.legend = FALSE) +
  geom_segment(aes(x = -1.2, xend=df$egap_l[df$yq == "2017 Q3"], 
                   y = 1.85, yend=df$cpi_change[df$yq == "2017 Q3"]), 
               arrow = arrow(length = unit(0.1, "cm")),
               show.legend = FALSE)
US Phillips curves with low and high inflation 1963-2017

We see indeed that the relationship is positive, and more pronounced in the high inflation regime. The most recent datapoint, “2017 Q3”, shows that a year earlier the employment gap was still slightly negative at -0.16, and the change in inflation in 2017 Q3 was also negative at -0.20.
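
The two fitted lines in the scatterplot are separate linear fits for each regime. One way to put numbers on them is to run the regressions directly. This is a sketch and not part of Gagnon’s code: `slopes_by_regime()` is a hypothetical helper that expects the columns `egap_l`, `cpi_change` and `high_inflation` created above, and the toy data below are invented so that the slopes are known in advance.

```r
# Hypothetical helper: OLS slope of cpi_change on egap_l, separately per regime
slopes_by_regime <- function(data) {
  sapply(split(data, data$high_inflation), function(d) {
    coef(lm(cpi_change ~ egap_l, data = d))[["egap_l"]]
  })
}

# Invented data: slope 0.5 in the low regime, slope 2 in the high regime
toy <- data.frame(
  egap_l         = rep(c(-2, -1, 0, 1, 2), times = 2),
  high_inflation = rep(c(FALSE, TRUE), each = 5)
)
toy$cpi_change <- ifelse(toy$high_inflation, 2, 0.5) * toy$egap_l

slopes <- slopes_by_regime(toy)
print(slopes)
```

Calling `slopes_by_regime(df)` on the data from this post should give the slopes of the two lines drawn by `geom_smooth()`.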

Collected links

  1. The Gentzkow-Shapiro Lab is on GitHub. Their RA manual is interesting and includes bits on PhD applications and a writing style guide.
  2. Soviet mosaics
  3. Greg Mankiw: “Getting People to Get Along, Even When They Disagree” (reading list)
  4. The Umbrella Man
  5. Jeff Atwood (co-founder of Stack Overflow): “The Existential Terror of Battle Royale”:

    I would generally characterize my state of mind for the last six to eight months as … poor. Not just because of current events in the United States, though the neverending barrage of bad news weighs heavily on my mind, and I continue to be profoundly disturbed by the erosion of core values […].

    In times like these, I sometimes turn to video games for escapist entertainment. One game in particular caught my attention because of its unprecedented rise in player count over the last year.


    I absolutely believe that huge numbers of people will still be playing some form of this game 20 years from now.


    It’s hard to explain why Battlegrounds is so compelling, but let’s start with the loneliness. […] PUBG is, in its way, the scariest zombie movie I’ve ever seen, though it lacks a single zombie. It dispenses with the pretense of a story, so you can realize much sooner that the zombies, as terrible as they may be, are nowhere as dangerous to you as your fellow man.


    Battle Royale is not the game mode we wanted, it’s not the game mode we needed, it’s the game mode we all deserve. And the best part is, when we’re done playing, we can turn it off.

"Hidden Skewness"

In a recent paper (pdf), Ludwig Ensthaler, Olga Nottmeyer, Georg Weizsäcker and Christian Zankiewicz ask the following question:

  • Consider an asset that’s worth 10,000 euros. For 12 periods its value either rises by 70% or falls by 60% with equal probability. Now consider these five buckets: 1) below 6,400, 2) 6,400-12,800, 3) 12,800-19,200, 4) 19,200-25,600 or 5) above 25,600.
  • The asset will be simulated for you and if you have guessed right and the asset ended up in your bucket, you get 20 euros. Else, you get nothing. Which bucket would you bet on?

My intuition would have been to go for the third bucket. But the right answer is the first bucket. The reason is that this question asks you what the most likely price for such an asset is, not for the average price. Ensthaler et al. offer the clue that one increase can’t compensate for one fall (0.4 · 1.7 = 0.68 < 1).
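
Because the final price depends only on how many of the 12 moves are up, the bucket probabilities can also be computed exactly from the binomial distribution. The following is a small sketch of that calculation (the variable names are mine, not from the paper):

```r
start <- 10000
n     <- 12
k     <- 0:n                              # possible numbers of up-moves
final <- start * 1.7^k * 0.4^(n - k)      # final price after k ups and n - k downs
prob  <- dbinom(k, size = n, prob = 0.5)  # probability of exactly k ups

# Assign each possible final price to one of the five buckets
breaks      <- c(0, 6400, 12800, 19200, 25600, Inf)
bucket      <- cut(final, breaks = breaks, labels = 1:5)
bucket_prob <- sapply(split(prob, bucket), sum)

expected_value <- sum(prob * final)

print(round(bucket_prob, 3))
print(expected_value)
```

The first bucket gets about 81% of the probability, the third about 12% and the fifth about 7%, while the second and fourth cannot be reached at all. The expected final price is still about 17,959 euros, which is what makes the third bucket look tempting.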

Let’s simulate this in R for many assets to see what this process is up to:


library(tidyverse)  # provides %>%, gather() and ggplot() used below

N     <- 5000 # number of assets
T_per <- 12   # number of periods

# Define matrix of asset prices
ap      <- matrix(NA, nrow = (T_per + 1), ncol = N)
ap[1, ] <- rep(100, N)  # initial asset price

# Simulation
for (i in 1:N) {
  for (t in 1:T_per) {
    is_up      <- runif(1, min=0, max=1) > 0.5
    ap[1+t, i] <- ap[t, i] * (1.7 * is_up + 0.4 * (1 - is_up))
  }
}

Next we extract some tidy statistics about our assets:

colnames(ap) <- paste("Asset", 1:N)  # name the columns before converting
ap           <- as_tibble(ap)

ap <- ap %>% 
  gather(key = asset, value = price) 
ap$period <- rep(1:(T_per + 1), N)

stats <- ap %>% 
  group_by(period) %>% 
  summarize(mean   = mean(price),
            median = median(price)) 

stats <- stats %>% 
  gather(key = series, value, -period)

Now we can plot how prices develop over the 12 periods:

ggplot() +
  geom_line(data = stats, 
            aes(x = period, y = value, color = series), 
            size = 0.8) +
  geom_hline(yintercept = 100, size = 0.5, 
             color = "black", linetype = "dashed") +
  labs(x        = "Periods",
       y        = "Asset value",
       title    = paste("Performance of", N, "assets after 12 periods"),
       subtitle = "Price increases by 70% or falls by 60% with the same probability.",
       caption  = "See: Ensthaler et al. (2016).")

ggsave("hidden-skewness.jpg", width = 8, height = 4)

Which produces:

Hidden skewness: Simulating asset prices for several periods

All the assets start out at 100. The mean price of assets rises, but the median falls fast.
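
The diverging lines follow from the difference between the arithmetic and the geometric mean of the two growth factors. Here is a short sketch of the theoretical paths (my own calculation, not taken from the paper):

```r
up   <- 1.7
down <- 0.4

arith_factor <- 0.5 * up + 0.5 * down  # expected growth factor per period: 1.05
geom_factor  <- sqrt(up * down)        # typical growth factor per period: about 0.82

periods     <- 0:12                        # number of elapsed periods
mean_path   <- 100 * arith_factor^periods  # theoretical mean price
median_path <- 100 * geom_factor^periods   # median price, exact for even periods

print(round(mean_path[13], 1))
print(round(median_path[13], 1))
```

After 12 periods the mean is about 180 while the median is about 10. A few assets with many up-moves pull the mean up, but the typical asset loses most of its value.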

The authors conclude that people are bad at calculating compound interest rates and that we tend to neglect skewness.

Our paper: "Benign Effects of Automation: New Evidence from Patent Texts"

We’ve updated our paper on automation patents which you can find here. In short, we identify automation patents based on their patent texts and create this new dataset:

Animated map of automation patents in US commuting zones, 1976-2014


Reading Erik Brynjolfsson and Andrew McAfee’s “Second Machine Age” three years ago got us interested in the topic. We were looking for data on how automation has advanced over time and across industries, but none of the existing proxies quite satisfied us. The idea to use patents came after reading Acemoglu et al. (2014) for a class which made us aware of patents as a data source.

It was striking that (almost) no researchers so far had made use of the actual patent texts. Instead, people use patent metadata such as citations (see e.g. here and here).

We were lucky that Google provides a bulk download page for patents (also see these codes). One of the tasks that took us the longest time was to write a parser to extract and clean the text sections for all patents from the titles, abstracts and text bodies of the patents. That was tricky as those are 336 GB of text which makes storing and retrieving the documents an issue. The parsing step took a week on a server where eight cores ran in parallel.

We then trained a naive Bayes algorithm on a sample of patents that we classified ourselves (using these guidelines).
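
To give a flavor of how such a classifier works, here is a minimal base-R sketch of a multinomial naive Bayes with add-one smoothing. The variant, the tokenization and the four training “patents” are all assumptions made up for illustration. They are not the implementation or the data from the paper.

```r
# Split a text into lowercase word tokens
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[tokens != ""]
}

# Estimate class priors and smoothed word likelihoods from labelled texts
train_nb <- function(texts, labels, alpha = 1) {
  tokens  <- lapply(texts, tokenize)
  vocab   <- sort(unique(unlist(tokens)))
  classes <- sort(unique(labels))
  counts  <- matrix(0, nrow = length(classes), ncol = length(vocab),
                    dimnames = list(classes, vocab))
  for (cl in classes) {
    tab <- table(unlist(tokens[labels == cl]))
    counts[cl, names(tab)] <- as.numeric(tab)
  }
  list(
    vocab     = vocab,
    log_prior = sapply(classes, function(cl) log(mean(labels == cl))),
    log_lik   = log((counts + alpha) / (rowSums(counts) + alpha * length(vocab)))
  )
}

# Pick the class with the highest posterior log score; unknown words are ignored
predict_nb <- function(model, text) {
  tokens <- tokenize(text)
  tokens <- tokens[tokens %in% model$vocab]
  scores <- model$log_prior + rowSums(model$log_lik[, tokens, drop = FALSE])
  names(scores)[which.max(scores)]
}

# Invented training texts, hand-labelled for this example
texts  <- c("automatic machine with processor and sensor",
            "automated control system dispenses without operator",
            "folding chair with removable cushion",
            "chemical compound for treating skin")
labels <- c("automation", "automation", "other", "other")

model <- train_nb(texts, labels)
print(predict_nb(model, "automatic dispensing machine with processor"))
print(predict_nb(model, "removable chair cushion"))
```

Each word shifts the score toward the class in which it appeared more often in the training sample. That is the sense in which individual words are more or less important for the classification.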

Here’s an example of such an automation patent:

Example of a patent document: The Automatic Taco Machine from 1996

So the patent with the name “Automatic Taco Machine” was invented by one Barry Brummet from California and is assigned to (owned by) Taco Bell. He applied for the patent in 1994 and was granted it in 1996. It cites other patents such as this one. Using this data, we can also check who cited the “Automatic Taco Machine”. We find that up until 2010 it was cited a total of 11 times (for example by this, this and this patent).

Even in the title of the “Automatic Taco Machine” you have the word “automatic”, so this is an easy patent to classify. Other words in the patent text that were important in classifying it as automation are “removable”, “storage”, “acceptable”, “support arm”, “assist”, “communicate”, “measures”, “processor” and 179 others.

Next, we checked where every patent is likely to be used (not invented). For this, we used Brian Silverman’s concordance tables. For our example, we find:

Industries in which the example patent can be used

So with 23% probability it’s used in “Eating places”, with 7% in “Household appliances …” and with 6% in “Department stores”. Looks plausible.

We then matched our industries to US commuting zones (with the CBP) and got the picture at the top of this blog post. It shows the number of automation patents that can be used by a single worker in each of these commuting zones. In the earlier years, the rust belt saw a lot of automation patents, but this has become much more spread out and diffuse over the years.

Our empirical results are the following:

  • Between 1976 and 2014, about 2 million automation patents (out of 5 million patents in total) were granted.
  • The share of innovation concerned with automation rose from 25% in 1976 to 67% in 2014.
  • There was more investment in robots and computers in industries with more automation patents. Those were also industries where in 1960 more people worked in routine tasks.
  • Local labor markets (commuting zones) in the US where more new automation patents could be used experienced increases in employment.
  • Automation led to a loss in manufacturing employment, but this was more than compensated by a rise in service sector employment.

If you’ve become curious about the paper, you can find it here.

Collected links

  1. A Fine Theorem on “Resetting the Urban Network”, by Michaels and Rauch (2017):

    So today, let’s discuss a new paper by Michaels and Rauch which uses a fantastic historical case to investigate this debate: the rise and fall of the Roman Empire.

    The Romans famously conquered Gaul – today’s France – under Caesar, and Britain in stages up through Hadrian (and yes, Mary Beard’s SPQR is worthwhile summer reading; the fact that Nassim Taleb and her do not get along makes it even more self-recommending!).

  2. “How Reggaeton Became a Global Phenomenon on Spotify”. Contains a kind of event study on what happens to the number of listeners when an influential playlist includes a song.

  3. Chris Blattman: “Why I am not blogging anymore”

  4. Philip Guo: “Programming as a Professor”

  5. I enjoyed this: “How obsessive artists colorize old photos”


I’m currently working with macroeconomic forecasts by banks and research institutes (CEF). When I checked out inflation expectations, the following figure caught my attention:

Inflation expectations in Japan, influenced by Abenomics

The black solid line is quarterly Japanese consumer price inflation (CPI). The gray areas are two standard errors above and below the mean forecast. (Forecaster dispersion is a reasonable proxy of how uncertain forecasters are.) I overlaid the 12-month ahead forecast with the subsequent realizations.

The jump in inflation expectations in 2013 (Abe won in December 2012) is quite striking. And that effect seems to have abated somewhat since then.