Mountain Doodles

spare time data, analysis, visualization

Millennials Redux


Catching up on my local news reading last night, I stumbled upon another new report on millennials.

The notion that millennials are fleeing Vancouver is a recurring theme in the Vancouver press, and we have addressed some of the problems in the data used to support that claim before.

Sadly, this new article’s use of data is no less problematic, and both the topic and the data misrepresentations are serious enough that I felt they needed addressing, so as not to distract from the real problems that millennials are facing: problems that are quite different from those the 25 to 39 year old age cohort faced 20 years ago. Groups like Generation Squeeze have done a good job nailing some of that down in the data.

The Data Rabbit Hole Trap

To the data-minded reader, a number of red flags go off throughout the article. Many of these can be attributed to today’s typical data-averse journalism, but usually the hard numbers in such articles hold up and are merely misrepresented to varying degrees. What tripped me up in this article was the chart at the bottom claiming that the 25 to 39 year old age cohort in the “UEL” grew by 5% between 1996 and 2016. The UEL, of course, is a quasi-municipality wedged between the City of Vancouver and UBC, but many people less attuned to data and administrative details use the term to refer to various portions of the region west of Pacific Spirit Park, sometimes including “Little Australia”, which is east of the Park but an actual part of the UEL, and sometimes excluding it.

I took it to mean some version of the UEL and UBC/UNA combined, and the 5% number looked suspicious to me. The population in that area more than tripled during that period, so one would expect the change in that age cohort to be much larger. I started digging into the numbers.

The first step was to look up the numbers for the City of Vancouver, which had no administrative boundary issues between 1996 and 2016, just to make sure that the data was labelled correctly and really represented the percentage change in the number of millennials between 1996 and 2016. But the number I got differed from the one in the article. The article lists a 10% increase; I calculated an 11% increase, 11.2% to be precise, so there was no chance that this was a rounding issue.

And the data rabbit hole opens up, sucks me in and the trap closes.

The Data

The data is, for the most part, reasonably straightforward. I just grabbed the 1996 count of 25 to 39 year olds, then the 2016 count, and compared them. One problem is boundary changes. Administrative boundaries don’t stay fixed, and boundary changes don’t always show up when just looking at non-geographic data: names, or even the unique geographic identifiers, don’t necessarily change when census boundaries change. Looking at the geographic data for both censuses, one immediately notices that the UEL/UBC/UNA area changed a lot (and also got new geographic identifiers), and that Coquitlam changed too. That complicates things a little. The UEL/UBC/UNA part is easy enough to deal with: in 1996 that area was called the “University Endowment Area”, and in 2016 the same area can be obtained by adding two census tracts. Coquitlam is a little trickier, and I wasn’t interested enough in figuring out the details, so I decided to ignore it.
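The UEL/UBC/UNA fix boils down to summing the 2016 tracts that together cover the 1996 region before taking the difference. A minimal sketch, where the tract identifiers and all counts are hypothetical placeholders rather than actual census values:

```python
# Hypothetical 2016 counts of 25-39 year olds by census tract (placeholder IDs and values).
cohort_2016_by_ct = {
    "CT_A": 1200,  # hypothetical tract covering part of the 1996 area
    "CT_B": 2300,  # hypothetical tract covering the rest of it
    "CT_C": 900,   # unrelated tract, not part of the comparison
}

# 1996 count for the single "University Endowment Area" region (placeholder value).
cohort_1996_uea = 1500

# The 2016 tracts assumed to add up to the 1996 region's boundary.
uea_equivalent_cts = ["CT_A", "CT_B"]


def harmonized_count(counts_by_region, region_ids):
    """Sum counts over the newer regions that together reproduce an older boundary."""
    return sum(counts_by_region[r] for r in region_ids)


cohort_2016_uea = harmonized_count(cohort_2016_by_ct, uea_equivalent_cts)
pct_change = (cohort_2016_uea - cohort_1996_uea) / cohort_1996_uea
print(f"2016: {cohort_2016_uea}, 1996: {cohort_1996_uea}, change: {pct_change:.1%}")
```

The key point is that the comparison only makes sense once both years describe the same piece of ground; the arithmetic afterwards is the ordinary percentage change.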

Step one, reproducing the graph in the newspaper, is shown below in blue bars, with the graph from the newspaper in red bars for reference.

There is definitely a correspondence between the graphs, but the numbers don’t quite match up. I have no idea how the “UEL” numbers were derived for the article. But I do have an explanation for the difference in the other municipalities’ numbers. The graphs suggest that the article used a larger denominator, and indeed the numbers match up perfectly if I divide the change in the 25-39 year old cohort by the 2016 count instead of the 1996 count. An embarrassing data mistake to be sure, but nothing out of the ordinary for today’s news stories.
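Why the wrong denominator shrinks the numbers in a predictable way: if the true growth is g, dividing the same difference by the later count gives g / (1 + g). A small sketch with hypothetical counts chosen to give the 11.2% growth I found for the City of Vancouver:

```python
def pct_change(old, new):
    """Change relative to the earlier (1996) count: the conventional base."""
    return (new - old) / old


def pct_change_later_base(old, new):
    """The same difference divided by the later (2016) count: the suspected mix-up."""
    return (new - old) / new


# Hypothetical counts chosen so that the true growth is 11.2%.
old, new = 1000, 1112
growth = pct_change(old, new)
growth_later_base = pct_change_later_base(old, new)

# For positive counts, dividing by the later count always gives g / (1 + g).
assert abs(growth_later_base - growth / (1 + growth)) < 1e-12
print(f"1996 base: {growth:.1%}, 2016 base: {growth_later_base:.1%}")
```

Rounded to a whole percent, 0.112 / 1.112 comes out at 10%, which is exactly the figure in the article.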

I can’t explain why Surrey, the city with the largest gain in millennials, was dropped from the data used for the story.

Data Representation

But these data problems are really only a sideshow to the real issue. The most important question is what data to use for what purpose. The article chose the change in the total number of millennials to support the notion that the 25 to 39 year old cohort is shunning the City of Vancouver for some of the more outlying regions. The obvious issue is that this measure is confounded by population growth: if the population grows, so will the number of millennials, even if the share of millennials in the population does not change. For this story, this is clearly a very poor choice of data representation.
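To make the confounding concrete, here is a sketch with made-up numbers for a fast-growing region: the cohort count jumps by 60% even though the cohort’s share of the population falls.

```python
def share(cohort, total):
    """Fraction of the total population that falls into the cohort."""
    return cohort / total


# Hypothetical region (placeholder numbers): population doubles while the
# cohort's share of it falls from 25% to 20%.
pop_1996, cohort_1996 = 100_000, 25_000
pop_2016, cohort_2016 = 200_000, 40_000

count_growth = (cohort_2016 - cohort_1996) / cohort_1996
share_change = share(cohort_2016, pop_2016) - share(cohort_1996, pop_1996)

print(f"cohort count change: {count_growth:+.0%}")
print(f"cohort share change: {share_change * 100:+.1f} percentage points")
```

Judged by counts alone, this region would look like a millennial magnet; judged by shares, it is becoming relatively less millennial.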

As a first approximation to understanding where that age cohort settled in 2016 compared to 1996, one can look at the share of the population in that age cohort in each area. The only problem: the premise of the story goes away when the data is represented this way.

What stands out is that the share of 25-39 year olds dropped in all areas. Some of that is just part of the changing makeup of the population in general. And the City of Vancouver not only has the highest share of 25-39 year olds, but also experienced the smallest drop.

One should probably also look at other age cohorts to better understand how the population is changing, and compare this to other regions to try to distinguish Vancouver-specific trends from more general Canada-wide ones.

Framing of the Data

The other part of the story that irks me is the deep confusion and free mixing of two different concepts. One is migration, that is, (the same) people moving from one area to another over some time period. The other is the number or share of (different) people in a specific age cohort at two distinct points in time. Nathan Lauster has added some very good analysis on this topic and has followed up with a series of blog posts, which were picked up in various news articles too.

This article not only fails to appreciate this important distinction, talking about “migration” when really comparing age cohorts, but takes it to the next level by calling 25-39 year olds in 1996 (as well as in 2016) “millennials”, which is comically absurd.
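The arithmetic behind “comically absurd” is short. A person aged a on census day was born in the census year minus a, or one year earlier if their birthday hadn’t come around yet. The sketch below turns the 25-39 age band into approximate birth years for both censuses. Millennial birth years are commonly placed from the early 1980s onward (definitions vary), so the 1996 cohort is entirely too old, while the 2016 cohort overlaps substantially:

```python
def birth_year_range(age_low, age_high, census_year):
    """Approximate range of birth years for people aged age_low..age_high at a census.

    A person aged a on census day was born in census_year - a or
    census_year - a - 1, depending on whether their birthday had passed yet,
    so the earliest possible year is one year before census_year - age_high.
    """
    return census_year - age_high - 1, census_year - age_low


for year in (1996, 2016):
    first, last = birth_year_range(25, 39, year)
    print(f"25-39 year olds in {year}: born {first}-{last}")
```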

Not sure what to make of the author’s assertion that BC Assessment is in the business of enumerating 25 to 39 year olds between 1996 and 2016. I wonder how people get stuff like this past their editors.

The larger storyline is still important here, as Vancouver grows up from a city with surrounding suburbs into an integrated metropolitan area, and a new generation, spurred on by new challenges including housing affordability, is accelerating that transformation and redefining some of these former suburbs as hip local centres tied together by a growing transit system.



Ever since that Bloomberg article, whose claims nobody could reproduce and whose author refused to disclose what data was used, got recycled all across the local press, there has been heightened interest in migration patterns in Vancouver. Nathan Lauster took it upon himself to dig deeper and look into whether Vancouver’s lifeblood was really leaving. He kept elaborating on this as better data became available, up to the most recent iteration, which uses 2016 census data to compare Metro Vancouver to other Canadian metropolitan areas, as well as the City of Vancouver to other cities within Metro Vancouver.

This is seriously good work, and we thought it would be helpful to reproduce Lauster’s methods in CensusMapper. The result is a series of maps, one for each five-year age cohort, that visualize the net migration of that cohort geographically, while hovering over a region reproduces Lauster’s net migration bar graph for that region.
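The core idea behind cohort-based net migration is to follow the same people as they age: those in one five-year age group at the first census are in the next group up five years later, so the change in that group approximates net migration. The sketch below is a simplified illustration of that idea, not a reproduction of Lauster’s exact method. The counts are placeholders, and mortality is reduced to a single assumed survival ratio (1.0 ignores deaths entirely):

```python
# Hypothetical counts by five-year age group for one region (placeholders).
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39"]
counts_2011 = {"15-19": 1000, "20-24": 1300, "25-29": 1500, "30-34": 1400, "35-39": 1300}
counts_2016 = {"15-19": 950, "20-24": 1250, "25-29": 1700, "30-34": 1450, "35-39": 1350}


def net_migration(start, end, age_groups, survival=1.0):
    """Estimate net migration for each cohort as it ages by one five-year group.

    People in age_groups[i] at the first census are in age_groups[i + 1] five
    years later, so any change in that same cohort beyond the expected
    survivors is attributed to net migration. `survival` is an assumed
    uniform survival ratio (1.0 ignores deaths).
    """
    estimates = {}
    for younger, older in zip(age_groups, age_groups[1:]):
        expected = start[younger] * survival
        estimates[f"{younger} -> {older}"] = end[older] - expected
    return estimates


for cohort, value in net_migration(counts_2011, counts_2016, AGE_GROUPS).items():
    print(f"{cohort}: {value:+.0f}")
```

Contrast this with comparing the 25-29 group in 2016 directly to the 25-29 group in 2011: that difference (+200 here) compares different people and says nothing about who moved where.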

Surprise Maps


At CensusMapper we like building models based on census data. We now have a common tiling for 2011 and 2016 geographies that allows us to easily model changes over time. After building a model we often want to see how well it performs. An easy way to do this is to simply map the difference between observations and model predictions.

Those maps are great, and it is easy to understand what is mapped, but they are difficult to interpret properly. In many cases a better metric to map is how consistent the observations in each region are with the model, which brings us to Bayesian surprise maps.
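One common way to formalize “surprise” is the KL divergence between the posterior and the prior over a small set of candidate models: a region is surprising if its observation shifts our belief about which model is right. The sketch below is an assumption-laden illustration of that idea, not CensusMapper’s implementation. The Poisson likelihood, the two candidate models (a fitted model and a naive “no change” baseline) and the counts are all hypothetical:

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k events under a Poisson model with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def bayesian_surprise(observed, expected_by_model, prior):
    """KL divergence (in nats) from the prior to the posterior over candidate models.

    observed: observed count in one region.
    expected_by_model: expected count for this region under each candidate model.
    prior: prior probability of each model (positive, summing to 1).
    """
    log_unnorm = [poisson_logpmf(observed, lam) + math.log(p)
                  for lam, p in zip(expected_by_model, prior)]
    # Normalise in log space to avoid underflow for large counts.
    top = max(log_unnorm)
    weights = [math.exp(x - top) for x in log_unnorm]
    total = sum(weights)
    posterior = [w / total for w in weights]
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)


# Hypothetical region: the fitted model expects 100 people, a naive
# "no change" baseline expects 50, and 100 were observed.
print(bayesian_surprise(100, [100, 50], [0.5, 0.5]))
```

Note the design consequence: if all candidate models predict the same value for a region, the observation cannot discriminate between them and the surprise is zero, no matter how large the raw difference is. Which models go into the set therefore matters a lot.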

Marine Gateway and Joyce-Collingwood


There has been some recent confusion, since compounded further, about transit-oriented development in Vancouver harbouring a large number of non-primary-residence homes. Good data is important for moving forward in Vancouver’s crazy housing market. Without proper context, a couple of data points can paint a very misleading picture of what is happening. So I decided to fill in some gaps on the very narrow question of understanding the CT-level numbers that get tossed around. No deep analysis, just looking into the CTs in question to see where the numbers the census picked up came from.


To understand the overall rate of 24.4% non-primary-residence dwelling units in the Joyce census tract, one should split the area into the Wall Centre Central Park development (99.2% non-primary-residence units) and the rest of the CT (3.4% non-primary-residence units).

To understand the Marine Gateway CT (24% non-primary-residence dwellings), one should split it into the block with the Marine Gateway development (13.7%), the block containing the MC2 development (67.4%), and the rest (10.1%).
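The CT rate is simply a dwelling-weighted average of the sub-area rates. With only two sub-areas, as in the Joyce case, the quoted rates are enough to back out what share of the tract’s dwellings sits in the development. A small sketch, assuming the two parts make up the whole tract and using the rounded percentages above (the function names are ours):

```python
def combined_rate(parts):
    """Dwelling-weighted average rate over sub-areas given as (dwellings, rate) pairs."""
    total_dwellings = sum(n for n, _ in parts)
    return sum(n * rate for n, rate in parts) / total_dwellings


def implied_share(overall_rate, rate_a, rate_b):
    """Share of dwellings in sub-area A implied by the overall rate,
    assuming sub-areas A and B together make up the whole tract."""
    return (overall_rate - rate_b) / (rate_a - rate_b)


# Joyce CT figures quoted above (rounded percentages from the text).
share_wall_centre = implied_share(0.244, 0.992, 0.034)
print(f"implied share of dwellings in the development: {share_wall_centre:.1%}")

# Sanity check: recombining with that share reproduces the overall rate.
print(combined_rate([(share_wall_centre, 0.992), (1 - share_wall_centre, 0.034)]))
```

Up to rounding, about 22% of the tract’s dwellings being in one almost entirely non-primary-residence development is enough to lift the whole CT to 24.4%, while the rest of the tract sits at 3.4%.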

Comparing any of these very recent developments to the much older Coal Harbour makes no sense. Coal Harbour is still “filling in”, although at a stubbornly slow rate. It will be interesting to see if the new vacancy tax can help speed that up.

Comparing Censuses


It’s great to have fresh census data to play with. Right now we only have three variables: population, dwellings and households. Still, there is lots of interesting information that can be extracted.

We started exploring in our last post, and things get really interesting when looking at change between censuses. But as we noted, there are several technical difficulties that need to be overcome.

So we at CensusMapper took that as an invitation to do what we love most: breaking down barriers.
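One generic way to get a common tiling is to treat the regions of both censuses as nodes, link every 2011 region to every 2016 region it overlaps, and merge each connected group into a single tile. Both censuses can then be aggregated to those tiles exactly. The sketch below does this with a union-find; it assumes the overlap pairs were already computed (for example by a spatial intersection with tiny sliver overlaps filtered out), uses hypothetical region names, and is not necessarily how CensusMapper builds its tiling:

```python
def common_tiling(regions_2011, regions_2016, overlaps):
    """Group regions from two censuses into tiles shared by both.

    overlaps: (region_2011, region_2016) pairs whose areas intersect.
    Regions linked through any chain of overlaps end up in the same tile.
    Nodes are tagged with their census year so identical IDs don't collide.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for r in regions_2011:
        find((2011, r))
    for r in regions_2016:
        find((2016, r))
    for r11, r16 in overlaps:
        union((2011, r11), (2016, r16))

    tiles = {}
    for node in list(parent):
        tiles.setdefault(find(node), []).append(node)
    return list(tiles.values())


# Hypothetical example: region "X" was split into "X1" and "X2" in 2016.
tiles = common_tiling(["X", "Y"], ["X1", "X2", "Y"],
                      [("X", "X1"), ("X", "X2"), ("Y", "Y")])
print(tiles)
```

The trade-off is resolution: wherever boundaries were redrawn, tiles get coarser than either census geography, but in exchange every tile has a clean count in both years.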

RS Population Change


With reporting on the new census numbers gaining traction, and now Mayor Robertson picking up on single family neighbourhoods losing population, we thought it was time to crunch some numbers.

Why does it need number crunching? All the reporting so far is based on CT (Census Tract) aggregates, e.g. the map shown and linked to the right. But there is actually no single CT in the City of Vancouver that contains only RS zoning. Deducing results by looking only at CT aggregates can lead to misleading reporting, as we have seen with unoccupied dwellings in the “Marine Gateway Neighbourhood”. Given how prominent this topic has become, it is high time to dig into the details.


In summary, we can confirm that RS (single family), RT (duplex) and FSD neighbourhoods have been losing population. Slightly. Looking separately at the east and west sides, population in these neighbourhoods dropped by about 1% on the west side and increased slightly on the east side.

In all groupings that we looked at, household size dropped and the rate of unoccupied dwellings increased. This was counteracted by growth in dwelling units, mostly confined to RS zones where laneway houses and suites were added (or newly discovered in the 2016 census).
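These three effects combine multiplicatively: population = dwellings × share of dwellings occupied × persons per occupied dwelling (ignoring people outside private dwellings, a simplifying assumption here). Taking logs turns this into an exact additive split of the population change. A sketch with hypothetical numbers, where added dwellings almost, but not quite, offset falling occupancy and household size:

```python
import math


def decompose_population_change(before, after):
    """Split the log change in population into three multiplicative factors.

    Each argument is a (dwellings, occupied_share, persons_per_occupied_dwelling)
    tuple, so population = dwellings * occupied_share * household_size.
    The three log ratios add up exactly to the log ratio of population.
    """
    names = ("dwellings", "occupancy", "household size")
    parts = {name: math.log(a / b) for name, b, a in zip(names, before, after)}
    total = math.log(math.prod(after) / math.prod(before))
    return parts, total


# Hypothetical neighbourhood: more dwellings, fewer occupied, smaller households.
parts, total = decompose_population_change((10_000, 0.95, 3.0), (10_500, 0.93, 2.9))
for name, value in parts.items():
    print(f"{name}: {value:+.2%}")
print(f"population: {total:+.2%}")
```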

We split the analysis into core regions, blocks that lie completely within RS, RT and FSD zoning, and fringe regions, blocks that have RS, RT or FSD zoning as well as other zoning. Fringe regions grew in population and had overall lower rates of unoccupied units when compared to core regions.
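The core/fringe split only needs to know which zoning districts each block touches (for example from a spatial join of blocks with the zoning map, assumed done beforehand). A sketch with hypothetical blocks and placeholder population counts; zoning codes are grouped into families by their prefix, so RS-1 and RS-5 both count as RS:

```python
RESIDENTIAL_FAMILIES = {"RS", "RT", "FSD"}


def zone_family(code):
    """Map a zoning code such as 'RS-1' or 'RT-5' to its family ('RS', 'RT')."""
    return code.split("-")[0]


def classify_block(zoning_codes):
    """Return 'core', 'fringe' or None for a block, given the zoning codes it touches."""
    families = {zone_family(c) for c in zoning_codes}
    if not families & RESIDENTIAL_FAMILIES:
        return None  # no RS/RT/FSD zoning at all: not part of this analysis
    return "core" if families <= RESIDENTIAL_FAMILIES else "fringe"


def summarize(blocks):
    """Total population at both censuses for core and fringe blocks."""
    totals = {"core": [0, 0], "fringe": [0, 0]}
    for block in blocks:
        group = classify_block(block["zones"])
        if group is not None:
            totals[group][0] += block["pop_2011"]
            totals[group][1] += block["pop_2016"]
    return {g: tuple(t) for g, t in totals.items()}


# Hypothetical blocks with placeholder zoning and population numbers.
blocks = [
    {"zones": ["RS-1"], "pop_2011": 100, "pop_2016": 98},
    {"zones": ["RS-1", "C-2"], "pop_2011": 120, "pop_2016": 126},
    {"zones": ["RT-5"], "pop_2011": 80, "pop_2016": 79},
    {"zones": ["C-2"], "pop_2011": 50, "pop_2016": 60},
]
print(summarize(blocks))
```

Working at the block level rather than the CT level is what makes this split possible at all, since no CT in the city is purely RS.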

Transit Explorer

| Comments

I have played with Mapzen’s Isochrone service in the past, with a simple visualization of walksheds.

Recently Mapzen updated the isochrone API to allow a more fine-grained selection of exactly which transit services to include or exclude in transit routing, and they created an amazing mobility explorer based on that.

Partially motivated by chatting with two TransLink planners, I decided to riff off of that and take a look at how well TransLink serves different parts of Vancouver, at different times of day, and how susceptible TransLink’s network is to SkyTrain service disruptions.
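The disruption question can be illustrated on a toy network: an isochrone is the set of stops reachable within a time budget, which Dijkstra’s algorithm computes directly, and a disruption amounts to removing a link and recomputing. The stop names and travel times below are made up, real transit routing (schedules, waiting and transfer times) is far more involved, and this does not use the Mapzen API:

```python
import heapq


def isochrone(graph, origin, budget):
    """Nodes reachable from origin within `budget` minutes (Dijkstra's algorithm).

    graph maps each node to a list of (neighbour, minutes) links.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best[node]:
            continue  # stale queue entry, a faster route was already found
        for nxt, minutes in graph.get(node, []):
            arrival = t + minutes
            if arrival <= budget and arrival < best.get(nxt, float("inf")):
                best[nxt] = arrival
                heapq.heappush(queue, (arrival, nxt))
    return set(best)


def without_link(graph, a, b):
    """Copy of graph with the a<->b link removed, simulating a service disruption."""
    return {node: [(nxt, m) for nxt, m in links if {node, nxt} != {a, b}]
            for node, links in graph.items()}


# Hypothetical network: A-B-C is a fast rail line, D is only reachable by slow buses.
graph = {
    "A": [("B", 4), ("D", 15)],
    "B": [("A", 4), ("C", 5)],
    "C": [("B", 5), ("D", 20)],
    "D": [("A", 15), ("C", 20)],
}
print(sorted(isochrone(graph, "A", 10)))
print(sorted(isochrone(without_link(graph, "A", "B"), "A", 10)))
```

Comparing the two sets for every origin is one way to map how much a single broken link shrinks the area reachable within a given time.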

More on Teardowns


A little over a year ago we ran some analysis on teardowns of single family homes in the City of Vancouver. We used City of Vancouver open data to understand why some single family homes got torn down and others didn’t.

Relying entirely on open data, we could not answer some important questions. So together with Joe Dahmen at UBC’s School of Architecture and Landscape Architecture, we came back to the question and folded in transaction data from BC Assessment to add more detail and rigor.

The result turned out quite similar to what our initial, cruder methods came up with, but it led to some important refinements.

We won’t go into the details of the findings here; you can read the online data story if you are interested. Instead we will go into a little more detail on how the analysis was done and what is still missing.

2016 Census Data - Part 1


Finally, the first batch of 2016 census data arrived on Tuesday morning, and CensusMapper was updated with the new census numbers by mid-morning.

Dissemination Block data was a little harder to find, but with the help of some friendly StatCan people I finally managed to locate it and add it as well this afternoon.

Time to write up some observations. I am hoping to find time to do this regularly as more data gets released.

Jane Jacobs’ Vancouver


Some time ago I saw Geoff Boeing’s excellent package to generate Jane Jacobs style street grid images. It’s lots of fun to compare different cities that way.

It can be hard to represent one city by one square mile, so I thought it would be neat to use this to compare different parts of Vancouver. Some common themes emerge for the central parts, while the more outlying areas display very different patterns.