The method behind the madness
Posted by Louise H on 8 December 2012
…The madness being a visualisation of Open University module results between 2007 and 2011.
For the last 18 months, I have been studying for a degree with the Open University. I’ve successfully completed the equivalent of the first year of full-time study – composed of 120 Level One points, split over modules in both Business and IT. The result of this is that I now have letters after my name – Cert Computing and IT, and Cert Bus Stud. (I especially like the latter). I am now studying two second Level modules in programming. And like many students, I am easily distracted by displacement activities.
Last week, someone posted a breakdown of 2011 results to one of the OU Facebook groups. This document contained percentage breakdowns of results by module codes, and straightaway some numbers caught my eye. For instance, why did only 29.9% of students complete module B190?
Curious, I loaded this pdf document into an excel spreadsheet to muck about with it some more. I quickly realised I needed some more context around this information. What was module B190? What level was it? How many credits did it provide?
Luckily, the OU provide publicly accessible datasets at http://data.open.ac.uk/datasets/, including one for Course Descriptions. I needed to teach myself SPARQL to get at this data – drawing on ancient knowledge of XML and metadata schemas.
I managed to get a list of descriptions and other supporting dimensional data for all of the modules currently offered, which I then popped into a worksheet in the Excel document. (As it happens, B190 is a 5 credit level 1 course called Introduction to Bookkeeping and Accounting. Most level 1 courses are worth 30 points – so this would appear to be some sort of introductory taster course – perhaps most people realise that they don’t really want to be accountants after all?)
I discovered that the OU published these results annually in their Sesame magazine, so I went in search of some historical data to see how 2011’s results compared to previous years. After a lot of scrabbling around I found data for 2007 (direct link to xls spreadsheet), 2008 (direct link to pdf) and 2009. Data for 2010 does not seem to have ever been released. I wonder who we need to contact at OU to get it?
I pulled each of these into the same format & into the Excel spreadsheet. Of course, quite a number of the older spreadsheets contained discontinued courses, so I needed to find supporting reference info from somewhere. Luckily, 2009’s data included names, credits and levels – so I lifted a lot of missing reference data from here. A quick google supplied the remainder of the older module names – however I didn’t spend a huge amount of time on this, so other dimensional data is missing for some historical data. At this point I also derived actual numbers from the supplied percentages, as these are easier to aggregate and manipulate. The supplied percentages were originally rounded to one decimal place, so the actuals may not be completely accurate – particularly for courses with student numbers in the thousands – but they’re not far off.
Having chosen to use Tableau for producing the visualisations, I needed to structure the excel spreadsheets so that the data was unpivoted. I looked at using the Data Wrangler tool, but ended up simply sticking it into a pivot table in order to restructure it – with a simple copy and paste back to a new worksheet. I also put in a three-level hierarchy to structure results at 3 different levels.
It was then time to visualise! The results can be seen over on the other blog…