
Schelling's segregation model

I had a try at Schelling’s segregation model, as described on quant-econ.

In the model, agents are one of two types and live on (x, y) coordinates. They’re happy if at least half of their closest 10 neighbors are of the same type; otherwise, they move to a new location.

My code is simpler than the solutions at the link, but I actually like it this way. In my version, agents just move to a random new location if they’re not happy. In the quant-econ example, they keep moving until they’re happy. And I just simulate this for a fixed number of cycles, not until everyone is happy.

In Matlab:

n     = 1000;          % number of agents of one type
N     = 2*n;           % total number of agents
T     = 10;            % number of cycles

locs  = rand(N, 2);    % initial location
types = [ones(n, 1);   % generate two types
         zeros(n, 1)];
     
figure
set(gca, 'FontSize', 16)
scatter(locs((types == 1), 1), locs((types == 1), 2))
hold on
scatter(locs((types == 0), 1), locs((types == 0), 2))
title('Cycle 0')
print('test0', '-dpng')
hold off

for t = 1:T
    for i = 1:N
        % All other agents
        others = [locs(1:(i-1),:);          
                  locs((i+1):end,:)];

        % Distance to other agents
        dist = pdist2(locs(i,:), others)';  

        % Indices (into "others") of the 10 nearest agents
        [~, ix] = sort(dist);
        nearestAgents = ix(1:10);

        % Types of the other agents, aligned with "others"
        typesOthers = [types(1:(i-1));
                       types((i+1):end)];

        % Neighbors of same type
        sameNeighbors = sum(types(i) == typesOthers(nearestAgents));
        
        % Happy if at least 5 of neighbors are same type
        isHappy = (sameNeighbors >= 5);

        % If not happy, then move to random new location
        if ~isHappy
            locs(i,:) = rand(1, 2);         
        end
    end
    fprintf('Finished cycle %d/%d.\n', t, T)
    
    figure
    set(gca, 'FontSize', 16)
    scatter(locs((types == 1), 1), locs((types == 1), 2))
    hold on
    scatter(locs((types == 0), 1), locs((types == 0), 2))
    title(['Cycle ', num2str(t)])
    print(['test', num2str(t)], '-dpng')
    hold off
end
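
A note on dependencies: pdist2 comes from the Statistics and Machine Learning Toolbox. If you don’t have it, the distance line can be replaced by a plain-Matlab equivalent (a sketch, assuming implicit expansion, i.e. Matlab R2016b or newer):

% Euclidean distances from agent i to all other agents, without pdist2:
dist = sqrt(sum((others - locs(i,:)).^2, 2));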

Running the script yields the following sequence of images:

Schelling segregation model animated simulation

The two groups separate quickly. Most of the action takes place in the first few cycles; afterwards, the remaining minority agents slowly move into their own type’s area.
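
To put a number on “separate quickly”, here is a small helper function that isn’t part of the original script: it measures segregation as the average share of same-type agents among everyone’s 10 nearest neighbors. Under random locations it sits near 0.5 and it moves towards 1 as the groups sort themselves (save it as segregationIndex.m):

function s = segregationIndex(locs, types)
    % Average share of same-type agents among each agent's 10 nearest
    % neighbors: about 0.5 under full mixing, near 1 under segregation.
    N = size(locs, 1);
    shares = zeros(N, 1);
    for i = 1:N
        others      = [locs(1:(i-1),:); locs((i+1):end,:)];
        typesOthers = [types(1:(i-1)); types((i+1):end)];
        [~, ix]     = sort(pdist2(locs(i,:), others)');
        shares(i)   = mean(types(i) == typesOthers(ix(1:10)));
    end
    s = mean(shares);
end

Calling segregationIndex(locs, types) at the end of each cycle traces out how fast the sorting happens.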

In the paper, Schelling emphasizes the importance of where agents draw their boundaries:

In spatial arrangements, like a neighborhood or a hospital ward, everybody is next to somebody. A neighborhood may be 10 percent black or white; but if you have a neighbor on either side, the minimum nonzero percentage of neighbors of either opposite color is fifty. If people draw their boundaries differently, we can have everybody in a minority: at dinner, with men and women seated alternately, everyone is outnumbered two to one locally by the opposite sex but can join a three-fifths majority if he extends his horizon to the next person on either side.

New working paper

We have a new working paper out with the title “Benign Effects of Automation: New Evidence from Patent Texts”. You can find it here. Any comments are much appreciated.

Heckman on Econtalk

James Heckman was recently interviewed by Russ Roberts on Econtalk, which I quite enjoyed. Some bits:

(37:35) Heckman: […] What I worry about is what I think is more general, not just even about empirical work, is kind of the non-cumulative nature of a lot of work in economics.

[...]

In macroeconomics and other parts of economics there’s a practice called calibration. The calibrated models are models that are kind of looking at some old stylized facts that are putting together different pieces of data that are not mutually consistent. I mean, literally: you take estimates of this area, estimates of that area, and you assemble something that’s like a Frankenstein that then stalks the planet and stalks the profession, walking around. It’s got a labor supply parameter from labor economics and it’s got an output analysis study from Ohio, and on and on and on. And then out comes something–and sometimes a compelling story is told. But it’s a story. It’s not the data. And I think there’s a lack of discipline in some areas where people just don’t want to go to primary data sources.

[...]

But back in the 1940s at Chicago, there was a debate that broke out; and it was a debate really between Milton Friedman and Tjalling Koopmans. Although it wasn’t quite stated that way, it ended up that way. And that was this idea of measurement without theory. […] And so, it’s very appealing to say, ‘Let’s not let the theory get in the way. We have all the facts. We should look at facts. We should basically have a structure that is free of a lot of arbitrary theory and a lot of arbitrary structure.’ That’s very appealing. I would like it. The idea that we have is this purely inductive, Francis Bacon-like style–not the painter but the original philosopher. So, but the problem with that is, as Koopmans pointed out, and as people pointed out: that every fact is subject to multiple interpretations. You’ve got to place it in context.

[...]

So, people will say, ‘Let the facts speak for themselves.’ But in fact, the facts almost never fully speak for themselves. But they do speak.

(48:47) Heckman: Well, it’s–I think that’s a general process of aging. If you do empirical work as I do and you get into issues, you inevitably are confronted with your own failures of perception and your own blind sides. And I think–I think the profession as a whole is probably better, much better, now. I mean the whole enterprise is bigger to start with. You are getting a lot of diverse points of view. And the whole capacity of the profession to replicate, to simulate, to check other people’s studies, has become much greater than it was in the past. I think the big development that’s occurred inside economics, and it’s in economics journals and in the professional–that if people put out a study, except for having those studies based on proprietary data–that many studies essentially have to be out there and to be replicated. And it’s literally been the kiss of death for people not to allow others to replicate their data.

[...]

And I think that–yes, I think we’ve all come to recognize the limits of the data. But on the other hand, I think we should also be amazed at how much richer the data base is these days–how much more we can actually investigate. […] So I think the empirical side of economics is much healthier than it was, before–I mean long before, going back to the 1920s and 1930s. That was just a period with no data. So I think we have a better understanding of the economy than we did. And I think that’s still there. And I think we have better interpretive frameworks than we had out there. […]. I think these are things that we shouldn’t underlook, overlook, here, understate where we’ve come from. We’ve come a long way.

I found it interesting that Milton Friedman was apparently more on the “let the data speak” reduced-form side of the spectrum.

For a different perspective on similar issues, I also recommend the podcast with Joshua Angrist.

German incomes in 2014

Here’s a booklet by the German Statistical Office on incomes in Germany in 2014:

  • Mean gross income was 3,441 euros for full-time employees. I couldn’t find the median anywhere, but eyeballing the graph it looks to be about 2,500 euros.

    Income distribution Germany 2014

  • Income differences between East and West are still quite pronounced. Compare Hessen and Thüringen, for example. The following shows hourly gross incomes by state:

    Hourly gross pay by German state 2014

  • The minimum wage is the same across Germany, so how binding it is varies depending on the local income level. Here’s the minimum wage relative to mean income across states:

    Relative size minimum wage across German states 2014

  • A 6% gap in gross hourly incomes between men and women remains unexplained after controlling for observable characteristics.
  • Incomes for women flatten after childbirth. The following are gross hourly incomes (blue for men, yellow for women, the black line is the average age of the mother at the birth of the first child):

    Income age profiles by Gender Germany 2014

  • Germany taxes households, not individuals, which subsidizes families where only one parent works. Singles keep about 60% of their gross income, while families with two children and one working parent keep about 70%.


Collected links

  1. A Fine Theorem on David Donaldson winning the John Bates Clark Medal:

    Donaldson’s CV is a testament to how difficult this style of work is. He spent eight years at LSE before getting his PhD, and published only one paper in a peer reviewed journal in the 13 years following the start of his graduate work. “Railroads of the Raj” has been forthcoming at the AER for literally half a decade, despite the fact that this work is the core of what got Donaldson a junior position at MIT and a tenured position at Stanford. Is it any wonder that so few young economists want to pursue a style of research that is so challenging and so difficult to publish? Let us hope that Donaldson’s award encourages more of us to fully exploit both the incredible data we all now have access to, but also the beautiful body of theory that induces deep insights from that data.

  2. Jonathan Taplin in the New York Times: “Is It Time to Break Up Google?”:

    At a minimum, these companies should not be allowed to acquire other major firms, like Spotify or Snapchat.

  3. Hunter Clark, Maxim Pinkovskiy, and Xavier Sala-i-Martin: “Is Chinese Growth Overstated?”
  4. John J. Horton: “A Way to Potentially Harm Many People for Little Benefit”:

    I spent 5 years in the Army as a tank platoon leader & company executive officer, after 4 years at West Point. Of my active duty time, 15 months were spent in Iraq (Baghdad and Karbala). It was, without a doubt, the worst experience of my life—nothing else even comes close, and I got off easy.

  5. Nate Silver on whether polling errors have become more common and differences between Trump and Le Pen:

    Ironically, the same type of sloppy thinking that led people to underestimate the chances for the Trump and Brexit victories may lead them to overestimate Le Pen’s odds.

  6. Philip Guo: “Five Years After My Ph.D. Thesis Defense”

Roy model

In David Autor’s lecture notes on the Roy model, he walks us through the migration choice model by Borjas (1987). In this model, agents decide between staying in the source country or migrating to a host country. The log wages in the source country ($w_0$) and in the host country ($w_1$) are given by:

$$w_0 = \mu_0 + \varepsilon_0, \qquad w_1 = \mu_1 + \varepsilon_1$$

The wage shocks $\varepsilon_0$ and $\varepsilon_1$ are drawn from a multivariate normal distribution and are correlated. The agents know all of these values and wages don’t adjust.

In Matlab, let’s simulate a number of agents:

N = 5000;  % number agents
c = 0.5;   % correlation between wage shocks

% Draw wage shock (correlated across countries)
SigmaInd = [1 c; c 1];          
z = mvnrnd([0 0], SigmaInd, N);

sigma0 = 30; % standard deviation wages source country
eps0 = z(:,1) * sigma0;

sigma1 = 100; % standard deviation wages host country
eps1 = z(:,2) * sigma1;

% Wages in source country
mu0 = 100;
w0 = mu0 + eps0;

% Wages in host country
mu1 = mu0;
w1 = mu1 + eps1;

We leave the two means $\mu_0$ and $\mu_1$ equal and concentrate on the effect of the relative standard deviations and the correlation. Next, we impose a cost of emigrating that rises in the source-country wage and then check which agents want to emigrate:

cost = 0.3 * w0; % cost rises in home country wages

% Choice
ixMigrate = (w1 - w0 > cost);

We can then make the following plot:
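
The plotting code isn’t shown in the post; a minimal sketch that produces this kind of figure, with stayers in blue and migrants in red, could look like this:

figure
hold on
scatter(w0(~ixMigrate), w1(~ixMigrate), 10, 'b')  % stayers
scatter(w0(ixMigrate),  w1(ixMigrate),  10, 'r')  % migrants
xlabel('Wage in source country')
ylabel('Wage in host country')
legend('stay', 'migrate', 'Location', 'northwest')
hold off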

Roy model, positive hierarchical sorting

Every dot is one agent. The x-axis shows their source country wages and the y-axis their host country wages. The cloud of dots is centered on (100, 100).

Agents marked red choose to emigrate and agents marked blue choose to stay. The higher we set the cost of moving, the steeper the line separating the red and blue dots.

Autor shows that there are three cases for migration. With the current settings in the simulation, we get positive hierarchical sorting. This comes about if the wage shocks are sufficiently positively correlated across countries and the wage distribution is more dispersed in the host country than in the source country. Then, only the most productive will migrate. Those who migrate have above-average wages in both the source and the host country.
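
A quick way to verify this claim in the simulation (my addition, not from the lecture notes) is to compare the migrants’ mean wages with the overall means of 100:

% Migrants' mean wages vs. the population means
fprintf('Source country: migrants %.1f vs. all %.1f\n', ...
        mean(w0(ixMigrate)), mean(w0))
fprintf('Host country:   migrants %.1f vs. all %.1f\n', ...
        mean(w1(ixMigrate)), mean(w1))

Under the positive-sorting parameters, both migrant means come out above 100.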

We get negative hierarchical sorting if we instead set sigma0 = 100 and sigma1 = 30:

Roy model, negative hierarchical sorting

The wage shocks still need to be positively correlated across countries, but now wages in the host country are more compressed than in the source country. Now, only the less productive agents migrate and emigration acts as insurance. In this case, the mean wage of those who choose to emigrate is below the average of 100 in both countries.

The last case is refugee sorting, where the wage shocks are negatively correlated, so migrants earn below the mean income in the source country but above the mean income in the host country. Set c = -0.5, sigma0 = 100 and sigma1 = 100 to get:

Roy model, refugee sorting

Here, migrants go from below-average wages in the source country to above-average wages in the host country. This could be the case if highly productive people are suppressed in their home countries.
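
To summarize, the three cases correspond to these parameter choices in the simulation above:

% Parameter choices (c, sigma0, sigma1) for the three sorting cases:
%   positive hierarchical sorting:  c =  0.5, sigma0 =  30, sigma1 = 100
%   negative hierarchical sorting:  c =  0.5, sigma0 = 100, sigma1 =  30
%   refugee sorting:                c = -0.5, sigma0 = 100, sigma1 = 100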

Autor concludes with:

The growing focus of empirical economists on applying instrumental variables to causal estimation is in large part a response to the realization that self-selection (i.e., optimizing behavior) plagues interpretation of ecological relationships. […] But instrumental variables are not the only answer to testing cause and effect with observed data. Self-selection also points to the existence of equilibrium relationships that should be observed in ecological data […], and these can be tested without an instrument. In fact, there are some natural sciences that proceed almost entirely without experimentation — for example, astrophysics. How do they do it? Models predict nonobvious relationships in data. These implications can be verified or refuted by data, and this evidence strengthens or overturns the hypotheses. Many economists seem to have forgotten this methodology.

Collected links

  1. A question from Chris Blattman’s midterm:

    Suppose, in 1900, Nate Silver wanted to build a model for predicting autocracy—that is, which countries in the world would end up more or less democratic in 2000. Knowing everything you know today, what do you think would be the five most influential variables that would help Nate predict dictatorship versus democracy? These can be historical, geographic, cultural, political, economic, or something else—it is entirely up to you. They just have to be 1900 or pre-1900 measures. And you must justify your choice of these five variables and link them to the readings or lecture material.

  2. Rachel Laudan: “I’m a Happy Food Waster”:

    It would be wonderful if the “don’t waste” value never clashed with other values such as safety, health, taste, choice, respect, and financial sense.

    Life’s not like that. Values clash all the time. Behaving well as an adult means making choices about which values are most important.

  3. Michael Nielsen on the tradeoff between accuracy and desirability

  4. Ricardo Reis:

    On top of this, asking an active researcher in macroeconomics to consider what is wrong with macroeconomics today is sure to produce a biased answer. The answer is simple: everything is wrong with macroeconomics. […] Researchers are experts at identifying the flaws in our current knowledge and in proposing ways to fix these. That is what research is.

    [...]

    There is something wrong with a field when bright young minds no longer find its questions interesting, or just reproduce the thoughts of close-minded older members. There is something right with it when the graduate students don’t miss the weekly seminar for work in progress, but are oblivious of the popular books in economics that newspapers and blogs debate furiously and tout as revolutionizing the field.

Joseph Henrich on modern causal reasoning

[E]ducated Westerners are trained their entire lives to think that behaviors must be underpinned by explicable and declarable reasons, so we are more likely to have them at the ready and feel more obligated to supply “good” reasons upon request. Saying “it’s our custom” is not considered a good reason. The pressure for an acceptable, clear, and explicit reason for doing things is merely a social norm common in Western populations, which creates the illusion (among Westerners) that humans generally do things based on explicit causal models and clear reasons. They often do not.

This is by Joseph Henrich in his book “The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter”.

He contrasts this world view with traditional societies that follow rituals derived from cultural evolution; the rituals serve a purpose, but people don’t know what it is. Henrich discusses the example of a Fijian island, where women avoid eating sharks and eels when pregnant. This makes sense, as it avoids toxins in the fish that could threaten the baby. But when researchers asked the women why, they came up with various reasons and none were right.

Coding interviews

When interviewing for programmer positions, why do candidates have to solve algorithmic questions on whiteboards that bear little resemblance to what they will do on the job?

Here is Gayle Laakmann McDowell in “Cracking the Coding Interview”:

  1. False negatives are acceptable, but false positives aren’t.

    [The firm is] far more concerned with false positives: people who do well in an interview but are not in fact very good.

  2. Problem-solving skills are valuable.
  3. Basic data structure and algorithm knowledge is useful.

    Other interviewers justify the reliance on data structures and algorithms by arguing that it’s a good “proxy.”

  4. Whiteboards let you focus on what matters.

    Whiteboards also tend to encourage candidates to speak more and explain their thought process. When a candidate is given a computer, their communication drops substantially.

  5. But it’s not for everyone or every company or every situation.

This actually reminds me of consulting interviews.

I picked up the book as I was curious what a programmer is expected to know. You might also like it if you enjoy solving coding puzzles, and the general interviewing advice is good. The book also has a nice introduction to big-O notation for computational complexity.

The examples are usually written in Java, but even if you only know Matlab or some other language, it’s easy to understand what the code does.