Data visualisation is a powerful tool that helps us understand vast datasets, communicate the results of an analysis, and reveal the hidden story behind our data. It exploits how we perceive information visually to improve our comprehension.
Data visualisation can be defined as "the representation and presentation of data that exploits our visual perception abilities to amplify cognition."
To make the aim of visualisation easier to digest, I use a supermarket analogy. Data are the items on the supermarket shelves, which - without a shopping list and a recipe - can be overwhelmingly random (have you ever walked into a supermarket and felt, well... lost?). If raw data are the items on the shelves, data visualisation is the Sunday roast. All of a sudden, the supermarket and all its moving parts fall into place. You can see how the cook selected, discarded, purchased, prepared, and presented the meal, much like a data analyst might select, discard, purchase, design, and present data.
Let's break down the moving parts of visualisation:
What are the specific advantages of visualising data? Let's explore a few classic examples. In 1973, F. J. Anscombe published a journal article in The American Statistician containing his famous quartet, which warns us of the perils of not visualising data.
All four datasets in the quartet share virtually the same line of best fit and correlation; plotted as scatter plots, however, the data tell very different stories.
Anscombe's Dataset:
Producing identical summary statistics:
Resulting in very different visual representations:
As you can see, visualising a dataset - even one whose summary statistics match those of other datasets - can uncover the truth behind it. Raw data rarely paints the whole picture.
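To check the quartet's claim for yourself, here is a minimal sketch using only the Python standard library. The data values are those commonly reproduced from Anscombe's article; the names `QUARTET` and `summarise` are my own, chosen for illustration.

```python
from statistics import mean

# Anscombe's quartet; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return the means, Pearson correlation, and least-squares line for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)              # spread of x around its mean
    syy = sum((b - my) ** 2 for b in y)              # spread of y around its mean
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    s = summarise(x, y)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f}  r={s['r']:.2f}  "
          f"y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

Rounded to two decimal places, every row prints the same numbers: a mean of 7.50, a correlation of 0.82, and the line y = 3.00 + 0.50x. Only plotting each dataset, for example as a scatter plot, reveals how different the four really are.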
Another compelling example is Dr John Snow's map of London plotting the location of cholera cases in the city.
At the time, officials believed that diseases such as cholera and the Black Death were spread through miasma - or "bad air". Miasma theory held that epidemics were caused by air tainted by rotting organic matter. However, Dr Snow's map showed cases clustering around a shared water pump used by residents, which was later found to be contaminated, undermining the theory of miasma (although the idea lived on).
With only a few examples, we can see how visualisation fosters better comprehension of raw data. Arthur Conan Doyle wrote, in the voice of Sherlock Holmes, "It is a capital mistake to theorise before you have all the evidence. It biases the judgement." Perhaps he should have gone further and added a lack of visualisation to the list of capital mistakes.
What other data visualisation examples have surprised you in your career?