Playing with Medians

Why overly focussing on medians is dangerous.

I want to do a short post to gently remind people of the pitfalls of relying too heavily on medians for understanding complex issues. Medians are useful because they take a complex distribution and break it down into a single, simple-to-understand number. This works well as long as it does not mask other aspects of the distribution that are important in the context where it is used.

A good example of the dangers of overly relying on medians is the “median multiple” metric that gets used a lot: the median dwelling value divided by the median household income in an area. I have talked about this before but want to give some more general context to illustrate some of the pitfalls more directly.
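To make the definition concrete, here is a minimal Python sketch of the median multiple. The function name and the numbers are made up for illustration (in thousands of dollars) and are not taken from any census data.

```python
from statistics import median

def median_multiple(dwelling_values, household_incomes):
    """Median dwelling value divided by median household income."""
    return median(dwelling_values) / median(household_incomes)

# Hypothetical values in thousands of dollars, for illustration only.
dwelling_values = [600, 800, 1000, 1200, 1500]
household_incomes = [40, 60, 80, 100, 120]

multiple = median_multiple(dwelling_values, household_incomes)
print(multiple)
```

Note that the metric collapses two entire distributions into two numbers before dividing them, so anything that shifts either median, including who lives in the area, shifts the ratio.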

Median Household Income

When comparing median household incomes across regions it is very important to pay attention to the significant confounding variables involved. Median income varies hugely between household types, as can easily be observed when interactively filtering by various household types. And those differences are persistent across geographic regions. What that means is that a significant determinant of the overall median household income in a region is the composition of households in that region.

A Tale of Two Cities

A good example of this is median household incomes in the City of Vancouver vs the City of Toronto. As has been reported repeatedly, the City of Toronto has a higher median household income than the City of Vancouver. And while this was a reasonable way to summarize the income situation in 2005, it is misleading in the context of 2015. People who have been paying close attention to these issues will be nodding their heads and don’t need to read on. For everyone else, this needs a more detailed explanation.

To start off, let me explain what the problem is. Median household incomes are higher in Toronto than they are in Vancouver, so how can saying this be misleading?

One-person households and two-or-more person households have very different income profiles. We can compare median incomes separately for these groups. If we do that we find that the City of Vancouver has a higher median household income for one-person households as well as for two-or-more person households when compared to the City of Toronto. How can that be? Vancouver has a much higher proportion of one-person households, which tend to have significantly lower incomes, and that pulls its overall median down. This is the classical Simpson’s Paradox.
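The mechanics of the paradox are easy to reproduce with a toy example. The Python sketch below uses entirely made-up incomes (in thousands of dollars) for two hypothetical cities; the names city_a and city_b and all the numbers are assumptions for illustration, not Vancouver or Toronto data.

```python
from statistics import median

# Made-up incomes in thousands of dollars; NOT census data.
# City A has the higher median within each household type,
# but a larger share of (lower-income) one-person households.
city_a = {
    "one_person": [30, 40, 50, 60, 70],
    "two_plus": [80, 90, 100, 110, 120],
}
city_b = {
    "one_person": [25, 45, 65],
    "two_plus": [70, 80, 90, 95, 100, 110, 120],
}

def summarize(city):
    """Median income per household type, overall median, and one-person share."""
    all_incomes = [income for group in city.values() for income in group]
    summary = {name: median(incomes) for name, incomes in city.items()}
    summary["overall"] = median(all_incomes)
    summary["one_person_share"] = len(city["one_person"]) / len(all_incomes)
    return summary

summary_a = summarize(city_a)
summary_b = summarize(city_b)

for key in ["one_person", "two_plus", "overall", "one_person_share"]:
    print(f"{key:>17}: A = {summary_a[key]:>6}  B = {summary_b[key]:>6}")
```

In this toy setup city A comes out ahead for one-person households and for two-or-more person households, yet city B has the higher overall median, purely because half of A's households are one-person households compared to 30% in B.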

Taking a more comprehensive look, let’s use cancensus to pull a number of relevant median income metrics for 2005-2006, 2010-2011 and 2015-2016 and compare how they changed over time.

Some metrics aren’t available for 2005-2006, but the overall picture becomes very clear. We see that in 2005-2006 Toronto had higher median incomes in all of these metrics, but by 2015-2016 only the median household income is higher in Toronto, and Vancouver scores higher in all finer income groups.

Different household composition seems to be the missing confounding factor. Let’s pull these numbers to confirm this.

The difference is not huge, but enough to flip the overall median income ranking and cause the paradox.