Just downloaded the 2011 ABS (Australian Bureau of Statistics) data pack the other day. I first heard about it on Slashdot, where they mentioned that downloading the data directly was a pain in the ass. The alternative is to fork out $200 to get a DVD delivered!! Fortunately, someone was being a true Aussie and packaged it all up into a single 4.9 GB torrent. Decompressed, it expands to a whopping 22 GB of CSV files plus some sort of map file.
Navigating the CSV files is a bit tricky because they make heavy use of acronyms and ID codes that have to be looked up in a separate file. Nonetheless, after 30 minutes or so I thought I'd compile some simple stats. For fun I made a list of the top 20 Viet suburbs in Victoria, Australia. Why? Coz I'm Viet.
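For anyone wanting to do the same, the basic job is a join: read the counts file, translate each region code into a name using the lookup file, keep one state, then sort. Here's a minimal Python sketch of that idea. The column names (`region_id`, `born_vietnam`, `region_name`, `state`), the codes, the suburb names and the numbers are all made up for illustration; the real DataPack files use their own naming, so you'd have to adapt it.

```python
import csv
import io
from typing import Dict, List, Tuple

# Toy stand-ins for two DataPack files. Column names, codes, names and
# counts are all hypothetical; real ABS files look different.
COUNTS_CSV = """region_id,born_vietnam
R01,120
R02,45
R03,300
R04,80
"""

LOOKUP_CSV = """region_id,region_name,state
R01,Suburb A,VIC
R02,Suburb B,NSW
R03,Suburb C,VIC
R04,Suburb D,VIC
"""


def load_lookup(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each region code to its (name, state) pair."""
    return {
        row["region_id"]: (row["region_name"], row["state"])
        for row in csv.DictReader(io.StringIO(text))
    }


def top_regions(counts_text: str, lookup_text: str,
                state: str, n: int) -> List[Tuple[str, int]]:
    """Join counts to names via the lookup, keep one state, return the n largest."""
    lookup = load_lookup(lookup_text)
    rows = []
    for row in csv.DictReader(io.StringIO(counts_text)):
        name, region_state = lookup[row["region_id"]]
        if region_state == state:
            rows.append((name, int(row["born_vietnam"])))
    # Largest counts first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    top = top_regions(COUNTS_CSV, LOOKUP_CSV, "VIC", 20)
    for rank, (name, count) in enumerate(top, start=1):
        print(rank, name, count)
```

With the real 22 GB of files you'd point `csv.DictReader` at the actual file objects instead of the toy strings; reading row by row keeps memory use low.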
| Rank | Suburb | 2011 count (possibly with random noise added by the ABS) |
|---|---|---|
| 2 | St Albans – South | 3111 |
| 5 | St Albans – North | 2386 |
| 10 | Kings Park (Vic.) | 1639 |
| 11 | Deer Park – Derrimut | 1575 |
| 19 | West Footscray – Tottenham | 824 |
I believe the count is based on people born in Vietnam, so I'm not sure whether Viets born in Australia are included. The numbers above roughly match what I've observed.
What I found interesting about the data in general is the issue of confidentiality. To stop the data from being used to pinpoint individuals, the ABS adds random noise to it and even advises against relying on stats with small numbers. How small is small? I have no idea. Also interesting is that the governing Act is fairly old:
Under the Census and Statistics Act (1905) it is an offence to release any information collected under the Act that is likely to enable identification of any particular individual or organisation. Introduced random error is used to ensure that no data are released which could risk the identification of individuals in the statistics.
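To get a feel for why small numbers are the worry, here's a toy Python sketch. It is NOT the ABS's actual method (the post doesn't describe it); it just assumes each count gets a small random integer added, with a hypothetical maximum of 2, and compares the worst-case error as a fraction of the true count.

```python
import random

# Illustrative only: a simple stand-in for "introduced random error".
# This is NOT the ABS's real perturbation scheme; max_noise is hypothetical.


def perturb(count: int, max_noise: int, rng: random.Random) -> int:
    """Add a random integer in [-max_noise, max_noise], never going below zero."""
    return max(0, count + rng.randint(-max_noise, max_noise))


def worst_relative_error(count: int, max_noise: int) -> float:
    """Largest possible error as a fraction of a positive true count."""
    return max_noise / count


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so runs are repeatable
    for true_count in (3, 30, 3111):
        noisy = perturb(true_count, 2, rng)
        err = worst_relative_error(true_count, 2)
        print(true_count, noisy, f"worst-case error {err:.1%}")
```

The same bit of noise that barely touches a count in the thousands can swing a count of 3 by a large fraction, which is presumably why small cells come with a warning.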
I'll probably spend more time playing with the data, trying to come up with more racially targeted stats, because they're cool and interesting, and this is Australia 🙂