3 lessons COVID-19 has taught us about Big Data and analytics


Big data and computer models are the future, but just how far can we trust them? The coronavirus pandemic has taught us some useful lessons on this front.


Because COVID-19 is so different from previous pandemics, understanding the data has been far from simple. We have data on when it first appeared in different countries, levels of infection, and mortality rates. However, computer models have differed wildly in their predictions of the risks of the virus and how far and how fast it can spread. There are three main lessons we can learn from this:


  1. There’s no such thing as “The Science”


Politicians are fond of telling us to trust “the science”. But this pandemic has shown that there’s no monolithic “science” that can be followed as if it were the only truth. In the UK, lockdown policies were based on the Imperial College model, which predicted horrific death rates; meanwhile, the Oxford group model, headed by Professor Sunetra Gupta, made much more hopeful predictions, and many others landed somewhere between the two. There simply wasn’t enough data to make an accurate prediction, so rather than following “The Science”, the government actually had to balance all these different inputs to make its decisions.


  2. We can’t even trust raw data


If you compare, say, the Worldometer website with Johns Hopkins University’s site on COVID-19, you’ll see that not only is the data different, but even the definitions of the data vary. This means we can’t rely on them to make any useful comparisons between countries. Notably, the numbers of people quoted as having the virus are actually the numbers of people who’ve tested positive, which isn’t the same thing at all – what about all those who are sick, but haven’t been tested? These figures tell us more about how assiduously countries are testing for the virus than they do about how widespread it actually is.


  3. Data averages can be misleading


The UK is relaxing its lockdown regulations based on the ‘R value’ – the reproduction number of the virus, i.e. the average number of people each infected person goes on to infect. There’s just one problem with that – calculating the average R value for the whole of the UK doesn’t give us a useful picture. There isn’t one large outbreak here – instead, there are many small ones in different areas, each with its own R value based on the individual demographics of that area.
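The problem with a national average R can be made concrete with a toy calculation. All the figures below are hypothetical, invented purely for illustration: a case-weighted average R can sit comfortably below 1 even while one local outbreak is growing rapidly.

```python
# Hypothetical regional data, purely illustrative – not real UK figures.
regions = {
    "Region A": {"R": 0.7, "active_cases": 9000},
    "Region B": {"R": 0.8, "active_cases": 8000},
    "Care-home cluster": {"R": 2.5, "active_cases": 500},
}

total_cases = sum(r["active_cases"] for r in regions.values())

# Case-weighted national average R.
avg_R = sum(r["R"] * r["active_cases"] for r in regions.values()) / total_cases

print(f"National average R: {avg_R:.2f}")  # below 1 – looks under control
for name, r in regions.items():
    trend = "growing" if r["R"] > 1 else "shrinking"
    print(f"{name}: R = {r['R']} ({trend})")
```

Here the headline average comes out at roughly 0.80, suggesting the epidemic is shrinking overall, while the small care-home cluster, with R = 2.5, is doubling rapidly – exactly the kind of signal a single national number hides.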


Care homes are a striking example: we know there’s a massive outbreak in care homes, but we can’t assume the same R value across all care homes; in fact, it can be very different. What’s more, the R value can’t even be measured accurately; it can only be modelled, which means we’re basing assumptions on assumptions. 


What all this means is that, while everyone wants to know whether there’ll be a second wave, when it will be safe to travel abroad, and when we can all finally get a haircut, we can’t rely on data to predict any of this with any kind of accuracy. The pandemic has taught us that no matter how sophisticated our analytics get, we can never remove all the uncertainty from life; even in the era of big data, sometimes you just have to wait and see.
