This blog post is the second in a short three-part series on using statistics in the pro-life debate. This week we will continue looking at some common statistical fallacies in the abortion debate and at how you can avoid them, whether in a formal debate, a friendly discussion, an argument on the internet or some other kind of conversation. Next week's post will give you some tips about what to do instead.
Last week we discussed some of the problems with using small samples and extreme cases in the abortion debate. This week we are going to consider biased samples, false causality and push polls.
Using biased samples
This is a fairly simple fallacy to understand: if you cite a statistic about abortion, you need to make sure that the people sampled reflect the population as a whole. For example, when polling people on abortion, it is important to check that the sample's political leanings, gender, ethnicity, religious beliefs (or lack of them), age and so on match those of the wider population.
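To make the point concrete, here is a small Python sketch using entirely hypothetical numbers: two made-up groups, A and B, that are equally common in the population but make up 70% and 30% of the sample, and that differ in how often they answer "yes". It shows how a skewed sample shifts the headline figure, and how re-weighting each group to its population share recovers the population value. Re-weighting like this can only correct for characteristics you have actually measured, so treat it as an illustration rather than a complete fix.

```python
# Hypothetical numbers only: each group's share of the population,
# its share of our (skewed) sample, and its "yes" rate.
population_share = {"group A": 0.5, "group B": 0.5}
sample_share = {"group A": 0.7, "group B": 0.3}
yes_rate = {"group A": 0.60, "group B": 0.30}


def overall_yes(shares):
    """Overall 'yes' rate when each group makes up the given share."""
    return sum(shares[g] * yes_rate[g] for g in shares)


raw = overall_yes(sample_share)             # what the skewed sample reports
reweighted = overall_yes(population_share)  # what a representative sample would report
print(f"sample says {raw:.0%}, population-matched figure is {reweighted:.0%}")
```

Here the skewed sample reports 51% saying "yes", while the population-matched figure is 45%, even though every individual answer is accurate.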
One example of sampling bias is the polling for the 2015 UK general election. The polls under-sampled Conservative voters, and this was the main reason they proved so badly wrong. It is not unusual for a poll of around 400 people to be 4 or 5 percentage points out through random chance alone, but errors much larger than this are often due to sampling bias.
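For a rough sense of how large purely random error can be, the sketch below uses the textbook 95% margin of error for a single proportion, z * sqrt(p(1 - p) / n), taking the worst case p = 0.5 and z = 1.96. It assumes a simple random sample, which real polls never quite achieve, so treat the result as a lower bound on how far off a poll can be.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a
    proportion p estimated from a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (400, 1000, 2000):
    print(f"n = {n}: about +/- {margin_of_error(n):.1f} percentage points")
```

With 400 respondents this gives roughly ±4.9 percentage points, falling to about ±3.1 with 1,000 respondents and about ±2.2 with 2,000.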
Almost any poll apart from a census will have some small measure of bias, but beware of polls or studies with high degrees of demographic or other sampling biases. The above statements on biased samples are probably very obvious, but it can be very easy to make these mistakes if you aren’t careful!
One practical way this can happen is if you only read the studies on a specific abortion topic that support a pro-life view, without checking the wider literature to see whether the results hold up in other similar studies. In that case you may end up with a biased sample of studies, when what you really want is the collection of results from all the relevant studies (provided that they have no significant flaws).
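The danger of reading only the studies that agree with you can be simulated. The Python sketch below invents 40 hypothetical studies of an effect whose true size is zero, each reporting that true value plus random noise, and then compares the average of all of them with the average of only the "favourable" ones (those reporting a positive effect). A plain average is used for simplicity; a real meta-analysis would weight each study by its precision.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

TRUE_EFFECT = 0.0  # hypothetical: the real effect is zero
N_STUDIES = 40     # hypothetical number of published studies
NOISE_SD = 1.0     # hypothetical study-to-study sampling noise

# Each simulated "study" reports the true effect plus random noise.
estimates = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N_STUDIES)]

# Pool every study, versus keeping only the ones that "support" our view.
all_mean = statistics.mean(estimates)
favourable = [e for e in estimates if e > 0]
selected_mean = statistics.mean(favourable)

print(f"all {N_STUDIES} studies: mean effect = {all_mean:+.2f}")
print(f"{len(favourable)} favourable studies: mean effect = {selected_mean:+.2f}")
```

The full set of studies averages out close to the true value of zero, while the cherry-picked subset suggests a clear positive effect that does not exist.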
False causality
False causality is one of the most common statistical fallacies, so it is worth discussing in some detail. The first thing you need to understand is correlation. Two quantities are positively correlated if one tends to increase as the other increases, and negatively correlated if one tends to decrease as the other increases. The most common measure, Pearson's correlation coefficient, describes how closely this relationship follows a straight line.1
These data sets all illustrate the concept of correlation, which is closely related to linear regression. Image from here.
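Pearson's correlation coefficient r runs from -1 (a perfect decreasing straight line) through 0 to +1 (a perfect increasing straight line). Below is a minimal, dependency-free Python sketch of the usual formula. The last example also illustrates footnote 1: on a range symmetric about zero, y = x² is completely determined by x, yet its correlation with x is zero, because the relationship is not a straight line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = list(range(-5, 6))  # -5, -4, ..., 5
print(round(pearson_r(xs, [2 * x + 1 for x in xs]), 3))  # exact increasing line
print(round(pearson_r(xs, [-3 * x for x in xs]), 3))     # exact decreasing line
print(round(pearson_r(xs, [x * x for x in xs]), 3))      # strong but non-linear link
```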
An obvious example of correlation in the abortion debate is the link between poverty and abortion rates. It is well known (see here for just one of many examples) that there is a reasonably strong positive correlation between the two. However, a very common mistake is to claim that, because two quantities are correlated, one of them must cause the other! This is not always true, since in many cases both quantities may instead be driven by an underlying quantity known as a confounding variable (or confounder), or by several confounders. There may not even be any causal link at all!2
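A simulation makes the idea of a confounder concrete. In the Python sketch below, the variables x, y and z are purely hypothetical (they are not abortion or poverty data): z drives both x and y, while x has no effect on y at all. The raw correlation between x and y is still substantial, but once the linear effect of z is removed from each (a partial correlation), almost nothing is left.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, zs):
    """Residuals of a least-squares straight-line fit of ys on zs,
    i.e. what is left of ys once the linear effect of zs is removed."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for z, y in zip(zs, ys)]


random.seed(0)
N = 5000
z = [random.gauss(0, 1) for _ in range(N)]  # hypothetical confounder
x = [zi + random.gauss(0, 1) for zi in z]   # x depends only on z
y = [zi + random.gauss(0, 1) for zi in z]   # y depends only on z, not on x

r_raw = pearson_r(x, y)
r_partial = pearson_r(residuals(x, z), residuals(y, z))
print(f"corr(x, y)                 = {r_raw:.2f}")
print(f"corr(x, y) with z removed  = {r_partial:.2f}")
```

The catch with real data is that you can only adjust for confounders you know about and have actually measured.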
A good example of this is the relationship between the legality of abortion and the maternal death rate from abortion in the US. It is very commonly claimed that making abortion illegal makes it very unsafe. While it is true that reported maternal deaths from abortion in the US fell after Roe v. Wade (1973), it does not follow that legal abortion caused this decrease. Why? Because the data going back to 1940 show that abortion-related deaths had already been declining for over three decades before 1973, most likely due to increased access to antibiotics.
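The trap in this example is a pre-existing trend. The Python sketch below uses made-up numbers, not the real US figures: a series that falls by about 7% a year from 1940 to 1990, with small random fluctuations and no change whatsoever in 1973. A naive before-and-after comparison around 1973 shows a dramatic drop, whereas comparing the post-1973 values with where the pre-1973 trend was already heading (a simple interrupted time series check) shows essentially no departure.

```python
import math
import random

random.seed(42)

# Hypothetical series: deaths fall by about 7% a year from 1940 to 1990,
# with small random fluctuations and NO change of any kind in 1973.
years = list(range(1940, 1991))
deaths = [1000 * 0.93 ** (yr - 1940) * math.exp(random.gauss(0, 0.05)) for yr in years]

CUTOFF = 1973
before = [(yr, d) for yr, d in zip(years, deaths) if yr < CUTOFF]
after = [(yr, d) for yr, d in zip(years, deaths) if yr >= CUTOFF]

# Naive comparison: average level before vs. after the cutoff.
before_mean = sum(d for _, d in before) / len(before)
after_mean = sum(d for _, d in after) / len(after)

# Better: fit the pre-cutoff trend (a straight line on the log scale) and
# ask whether the later data depart from where that trend was heading.
ts = [yr for yr, _ in before]
ls = [math.log(d) for _, d in before]
mt, ml = sum(ts) / len(ts), sum(ls) / len(ls)
slope = sum((t - mt) * (l - ml) for t, l in zip(ts, ls)) / sum((t - mt) ** 2 for t in ts)
intercept = ml - slope * mt

ratios = [d / math.exp(intercept + slope * yr) for yr, d in after]
mean_ratio = sum(ratios) / len(ratios)

print(f"mean before {CUTOFF}: {before_mean:.0f}, mean after: {after_mean:.0f}")
print(f"observed / trend-predicted after {CUTOFF}: {mean_ratio:.2f}")
```

An observed-to-predicted ratio close to 1 means the later values are just what the earlier trend would have produced anyway, so the cutoff year explains nothing by itself.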
When claiming that abortion causes x or is caused by y, you therefore need to make sure that you consider the possibility of false causality first.
Using polls with loaded data
The final fallacy to discuss is the use of polls with data deliberately designed to mislead. Hopefully nobody reading this wants to do this on purpose, although if you do, have you ever considered running for political office?
Joking aside, what we need to discuss is known as push polling. A push poll is one that asks loaded questions, typically with the intention of persuading people to vote or think in a certain way. The definition varies slightly depending on who you ask: some users of the term insist that it refers only to attempts to trick people into thinking that they are being polled, without actually collecting and publishing the results. One example from US politics was a push poll used by George W. Bush's campaign against John McCain, in which voters were asked the following:
“John McCain calls the campaign finance system corrupt, but as chairman of the Senate Commerce Committee, he raises money and travels on the private jets of corporations with legislative proposals before his committee. In view of this, are you much more likely to vote for him, somewhat more likely to vote for him, somewhat more likely to vote against him or much more likely to vote against him?”
A related but subtler issue can still have major implications: changing the phrasing of the options in a poll only slightly can alter the results significantly. For example, consider the following three versions of an online poll on voting reform in Canada.3
a. Do you agree that Canada should update its voting method for federal elections to proportional representation?
b. Should Canada eliminate first-past-the-post elections and replace them with proportional representation?
c. Should Canada change the method it elects members of parliament from first-past-the-post to proportional representation?
The percentages voting yes in these polls were 58.3%, 47.1% and 45.8% respectively, even though all three were asking essentially the same question! So when citing polls or other data in the abortion debate, check the wording of the question and make sure that it is as neutral as possible.
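One way to check that differences like these are not just random noise is a two-proportion z-test. The sketch below assumes, hypothetically, 1,000 respondents per poll, since the actual sample sizes are not given here, and it also assumes simple random samples, which online polls are not. Under those assumptions, the gap between versions a and c is far too large to be chance, whereas the gap between b and c is not.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two independent proportions.
    Returns (z, p_value). Assumes simple random samples."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


N = 1000  # hypothetical respondents per poll; the real sample sizes are unknown here
results = {"a": 0.583, "b": 0.471, "c": 0.458}  # "yes" shares quoted above

comparisons = {}
for first, second in (("a", "b"), ("a", "c"), ("b", "c")):
    z, p = two_proportion_z_test(results[first], N, results[second], N)
    comparisons[(first, second)] = (z, p)
    print(f"{first} vs {second}: z = {z:.2f}, p = {p:.3g}")
```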
Hopefully the above will have helped you to understand some common statistical errors to avoid. Here is a quick recapitulation of the most important points to take away, from the least serious to the most serious fallacies. Remember, these are not just things to avoid yourself in the pro-life debate, but fallacies you may be able to find in pro-choicers’ use of statistics.
5) Extreme cases can be very misleading if used carelessly.
4) Small samples must be treated with caution and the greater the p-value, the more sceptical you should be.
3) Biased, unrepresentative samples should be treated with caution.
2) Don’t confuse correlation with causation.
1) Polling results can be easily influenced by the wording of a question.
Next week we will look at how to use statistics in the abortion debate effectively.
If there are any questions about anything we’ve discussed or about pro-life issues generally, please leave a comment below and we’ll try to respond quickly.
Dane Rogers is a third-year DPhil student in the Department of Statistics based at Merton College, currently working on Chinese restaurant processes and Lévy processes.
1 It is necessary to specify that the relationship is linear, because quantities can be related in many other ways. For example, there might be a cubic polynomial, exponential or logarithmic relationship, among many others.
2 For examples of bizarre correlations, see here.
3 Note that online polls are usually very unreliable and prone to sampling bias. Because these polls are being compared with one another, this doesn't matter for the purposes of this argument, since we are only interested in the relative differences between them.