Bots, cheaters and disengaged respondents are the bane of survey analysis, so we conducted a special survey to test out some tricks we’ve picked up. We’re happy to share our expertise in using market research survey design to help you stamp out poor-quality responses and clean up your data.
So, you’ve hit your target survey responses, but is it safe to assume that each gives an honest and accurate picture of a real person’s views? Realistically, there will always be some bad apples contaminating the data, and they typically fall into three categories…
The baffling bots: ‘Click farm’ type fraud is a huge problem for online research because the data that is produced is effectively 100% fake. These bots cheat the system to make money on a large scale. They register as multiple panellists – using spoof IP addresses and profiling information – and their answers are completely automated.
The cheeky cheaters: These are usually real people who only do the survey for the incentive, and therefore speed through as quickly as possible to claim the prize. They may have no relevant experience or knowledge of the topic being investigated, giving rise to responses that are not representative of the target population.
The disappointingly disengaged: These respondents could be exactly who you want, and even begin diligently, but lose interest and grow fatigued as they move through the survey. These disengaged respondents are less likely to read survey questions in detail and therefore unlikely to provide carefully considered and accurate answers.
Poor-quality responses: the problem
The extent to which survey sinners are a problem for online research is difficult to measure. A global survey respondent panel estimates a 5% rate of poor-quality responses from the data they provide, but this is likely to be an underestimate, given that the identification of bad apples relies on researchers flagging these to the provider.
Whatever the proportion, poor-quality responses call the validity of the data into question: false and inaccurate answers can skew statistical estimates and significantly damage the research.
Poor-quality responses: spotting the sinners
Luckily, we know a few tricks that help to prevent false responses and weed out survey sinners from the data:
1. Double questions: One method of catching cheaters or bots is to include two questions for which contradictory answers could be provided. For example, you could ask whether they are a parent and then how many children they have, including a ‘none’ option. This works particularly well if the questions are at different points in the survey.
2. Contradictory grid statements: Along similar lines, a grid containing agreement statements that directly contradict one another is a useful method, since agreeing with both statements would be inaccurate or ‘impossible’. This is especially helpful for distinguishing respondents who have flatlined (i.e. given the same response for every statement) from those who just genuinely agree with the statements!
3. Open questions: Open questions are an easy place to spot poor-quality responses – writing a coherent answer can be just too much effort for cheaters, bots and the disengaged. Scan people’s answers for anything nonsensical, irrelevant or unusually repetitive (i.e. mass automated), and screen numerical open questions for outliers.
4. Attention-checking question: One simple way to catch poor-quality responses is to throw in a question for the sole purpose of tripping up survey sinners. Check if respondents are still paying attention by asking them to select a particular answer – ideally using a similar topic to the rest so it’s not too obvious. Someone didn’t select answer B when you told them to? Get rid of them.
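If your survey platform lets you export raw responses, the four checks above are easy to automate. Here is a minimal sketch in plain Python – the field names, answer options and low-effort text examples are all illustrative assumptions, not anything prescribed by a particular platform:

```python
# Sketch of the four quality checks applied to one exported respondent record.
# Field names ("is_parent", "num_children", etc.) are illustrative assumptions.

def quality_flags(resp):
    flags = []

    # 1. Double questions: "not a parent" contradicts "has children".
    if resp["is_parent"] == "no" and resp["num_children"] > 0:
        flags.append("double_question")

    # 2. Contradictory grid statements: agreeing with a statement and its
    #    direct opposite is 'impossible'; flatliners trip this check too.
    agree = {"agree", "strongly agree"}
    if resp["grid_statement"] in agree and resp["grid_opposite"] in agree:
        flags.append("contradictory_grid")

    # 3. Open questions: very short or stock answers suggest low effort.
    text = resp["open_answer"].strip().lower()
    if len(text) < 5 or text in {"good", "nothing", "asdf"}:
        flags.append("open_question")

    # 4. Attention check: the respondent was told to select answer "B".
    if resp["attention_check"] != "B":
        flags.append("attention_check")

    return flags

# A fabricated record that fails every check:
bot = {"is_parent": "no", "num_children": 2,
       "grid_statement": "agree", "grid_opposite": "agree",
       "open_answer": "asdf", "attention_check": "D"}
print(quality_flags(bot))
```

In practice you would tune the low-effort text heuristics to your own open questions – a keyword blocklist this short is only a starting point.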
Putting these tricks to the test
We created a survey exploring attitudes towards climate change and peppered it with the four identification tricks above to test whether they were effective in catching poor-quality responses. Using a leading respondent panel, we received 89 responses to the survey.
After carefully combing through and cleaning the data, we found that 20.2% of respondents were flagged as providing poor-quality responses – four times the amount estimated by the panel. This table shows the number of poor-quality responses identified by each method:
| Method | Poor-quality responses identified |
| --- | --- |
| Contradictory grid statements | 10 |
| Open questions | – |
| Attention-checking question | – |
| Double question | 0 |
Contradictory statements were found to be the most effective method, revealing 10 instances of poor-quality responses, whilst the double question found none in this case. This amounted to 18 individual respondents flagged in total, with several caught out on more than one test.
Interrogating the sinners
Now, we could just get rid of all these respondents and try to replace them, but sometimes the last thing you want to do is throw out responses – especially if it’s been a slog to get them! It’s often worth taking the extra time to scrutinise your data.
Each method requires differentiating the definitely unreliable from the possibly unreliable. We recommend a flagging system whereby those who fall into the grey area are only removed if they fail multiple criteria.
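That flagging system can be sketched in a few lines. In this illustration, the choice of which checks count as ‘definitely unreliable’ and the two-flag threshold for the grey area are assumptions you would tune to your own project:

```python
# Illustrative flagging system: hard failures are removed outright, while
# grey-area respondents are dropped only if they trip multiple checks.

HARD_FAILS = {"attention_check"}   # assumed to mark a respondent definitely unreliable
GREY_THRESHOLD = 2                 # grey-area flags needed before removal

def keep_respondent(flags):
    if HARD_FAILS & set(flags):
        return False                       # definitely unreliable: remove
    return len(flags) < GREY_THRESHOLD     # possibly unreliable: needs 2+ flags

# Hypothetical per-respondent flags produced by the earlier checks:
panel = {
    "r01": [],
    "r02": ["open_question"],                       # one grey flag: kept
    "r03": ["open_question", "double_question"],    # two grey flags: removed
    "r04": ["attention_check"],                     # hard fail: removed
}
kept = [rid for rid, flags in panel.items() if keep_respondent(flags)]
print(kept)  # → ['r01', 'r02']
```

The benefit of scripting the decision is consistency: the same removal rule is applied to every respondent, rather than relying on case-by-case judgement calls.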
Other routes to higher-quality data
In addition to the methods we tested, there are other ways you can increase the reliability of your data and prevent drop-outs. These can range from designing your survey to keep respondents engaged, to restricting access to limit the impact of bots:
- Match question wording to scale labels, so even those scanning questions can understand.
- Use consistent language, particularly across rating scales.
- Keep the survey as short and concise as possible.
- Make it conversational – build rapport!
- Make it interactive and visual – the survey should be enjoyable.
- If you are using a potentially untrustworthy sample source, try not to make it too obvious who you want to speak to in the introduction and screening questions.
- Don’t allow multiple attempts at the survey – but do allow people to save and finish their response later if the survey exceeds 15 minutes.
- Remove your ‘back’ button so respondents can’t edit their answers to the ‘right’ one.
With improving technology, there are also more rigorous checks you can undertake. This includes facial recognition, identification checks (e.g. reviewing passports) and recording geolocations. At the moment, these are more often used for qualitative research, but as bots become more sophisticated it’s likely quantitative checks will need to be tighter.
All of these approaches will likely take a bit of tailoring to fit your specific project, and the decision to remove respondents can be subjective, but hopefully these tips will help you make your survey data squeaky clean.