Take a sample

3 June, 2020

Hold a packet of Skittles: how many red ones does it contain? You don’t know without looking. But do you want to count them all? Unless you are very pedantic, probably not. What you need to do is take a sample.

Put your hand in the bag and grab a few. Count the red Skittles. You now know the number in your hand, but how does this relate to the packet? You could take a rough guess, but without counting every Skittle you don’t know how many a packet contains. Nor do you know whether the proportion of red Skittles in your hand is consistent throughout the packet. This is an exercise I use in workshops to illustrate how samples are useful but also have pitfalls to be aware of.
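The exercise can be sketched in a few lines of Python. This is only an illustration: the packet size (60 sweets), the number of red ones (12) and the handful size (10) are made-up figures, and `handful_proportions` is a hypothetical helper, not part of any library.

```python
import random

def handful_proportions(packet, handful_size, draws, seed=0):
    """Return the share of red sweets in each of several random handfuls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(draws):
        # Grab a handful without replacement, as a hand in a bag would.
        handful = rng.sample(packet, handful_size)
        shares.append(handful.count("red") / handful_size)
    return shares

# Hypothetical packet: 60 sweets, 12 of them red (true share 0.2).
packet = ["red"] * 12 + ["other"] * 48
shares = handful_proportions(packet, handful_size=10, draws=5)

print("true share of red:", packet.count("red") / len(packet))
print("share of red in each handful:", shares)
```

Each handful gives its own answer, and none is guaranteed to match the packet. Only when the handful is the whole packet does the sample share equal the true one.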

Whilst the number of a particular colour of sweet is a trivial question, knowing how many cases of a disease are present in a wider population is quite literally a matter of life and death. As we see every day, governments, the media and others quote figures for the number of Covid-19 cases. Yet despite the certainty with which these figures are referenced, they shine an important light on the role of samples and the accuracy they provide.

Known Knowns & Unknown Knowns

We can rarely know everything about a big group or population. Most of the time our insights are achieved by viewing a sample - a smaller selection - of that bigger group. And this process is fraught with difficulties when trying to achieve accuracy.

The number of people with Covid-19 is merely the known cases - the known knowns - and this is just a sample of the population of cases. The sample size is related to the number of tests completed and the degree to which those tests capture every case in the wider population - the unknown knowns. As we test more people it is likely that the number of positives will rise, though this is far from guaranteed. No two samples are the same, and differences - or biases - are introduced frequently, particularly with human subjects.

What we do know about Covid-19 is that its presentation and severity vary between individuals. We know about the asymptomatic cases, where no symptoms are visible and which are likely picked up only by chance. But how many cases have been or are still being missed remains a guess. The ones we see are just a sample, and finding them is wholly dependent on the sampling frame and strategy we employ.

Absolutely contextual

Yet Covid-19 is becoming a daily example of how data is interpreted for headlines. The manner in which the number of positive cases is communicated does little to highlight the intricacies of how the data has come about. Cases are said to be still rising or, at best, remaining steadily high. Yet this assumes there has been perfect information about the number of cases the day, week or month before. No two samples are the same in composition, particularly not in such a dynamic context as a pandemic.

But what do 5,000 people, 8,000 people or more mean? Presentation is focussed on absolute numbers which provide little or no context in which to understand them.

The infection rate we see might better be shown as a proportion of tests. At least this would provide something comparable across time.
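As a rough sketch of what that looks like, the snippet below divides positives by tests for two days. The case counts echo the figures quoted above; the test counts are invented purely for illustration, and `positivity_rate` is a hypothetical helper name.

```python
def positivity_rate(positives, tests):
    """Share of completed tests that came back positive."""
    if tests <= 0:
        raise ValueError("tests must be a positive number")
    return positives / tests

# Hypothetical figures: the test counts are assumptions, not real data.
days = [
    {"label": "day 1", "positives": 5000, "tests": 50000},
    {"label": "day 2", "positives": 8000, "tests": 100000},
]

for day in days:
    rate = positivity_rate(day["positives"], day["tests"])
    print(f'{day["label"]}: {day["positives"]} positives, {rate:.1%} of tests')
```

With these made-up numbers the headline count rises from 5,000 to 8,000 while the share of tests that are positive falls from 10% to 8%. The proportion is still only as good as who was tested, but it is at least comparable from one day to the next.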

To provide anything more meaningful is fraught with pitfalls. The problem scientists have with Covid-19 is that we do not, and probably never will, know the full picture of all the cases, much like our sweets in a bag. Our assumptions are based on guesstimates from the snippets of information we know and comparisons with similar, though clearly not identical, situations from the past.

Cautious Insights

Whether it is red Skittles or Covid-19 cases, our understanding of the real total is only as good as the sample we take. Knowing how that sample relates to a wider group or population is the context we need for the results to have any meaning. Of course, the larger the sample, the more accurate the insight, though it will never be perfect, and whole population surveys are expensive, impractical or both.
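To put a rough number on "larger is better", the sketch below uses the standard normal-approximation margin of error for a proportion, 1.96 × √(p(1 − p)/n). It assumes a simple random sample, which real testing rarely is, and the observed share of 0.2 is a made-up figure.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n items.

    Uses the normal approximation and assumes a simple random sample.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical observed share of 0.2 at three sample sizes.
for n in (10, 100, 1000):
    print(f"n = {n}: plus or minus {margin_of_error(0.2, n):.3f}")
```

Under these assumptions the uncertainty shrinks from roughly 25 percentage points at 10 items to about 8 at 100 and 2.5 at 1,000: a hundredfold increase in sample size buys only a tenfold gain in precision, and none of it corrects for a biased sample.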

Recognising that samples are useful yet far from foolproof empowers you to cautiously gain insights into the number of cases in a wider population - red sweets in a packet, your customers who like black coffee or the number of people with an illness in a city - whilst understanding that your picture is partial.
