Bayesian reasoning

Science journalist Tom Siegfried has written an excellent column on the use and abuse of statistics. The essay is somewhat wonkish – inevitable, given the topic – but very readable. It should be easy to understand for anyone who has had a basic course in statistics. Mr Siegfried complains that “statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.”

Siegfried covers a lot of ground, but what I enjoyed most was his discussion of Bayesian reasoning, a method of analysis devised by the English clergyman Thomas Bayes and published posthumously in 1763. Standard tests of statistical significance, reported as P=.05 (sometimes written P=0.95), are almost always misinterpreted – even by scientists in peer-reviewed papers – as a 5% probability of error (95% probability of a correct inference). Researchers rarely explain what a P value of 0.05 really means, possibly because they themselves do not know. The correct meaning is “there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct)”.
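To make that definition concrete, here is a minimal Python sketch of a tail probability under a no-difference hypothesis. The coin-tossing setup, the function name `tail_probability` and the figure of 59 heads in 100 tosses are illustrative assumptions of mine, not taken from Siegfried's column.

```python
from math import comb


def tail_probability(heads, flips):
    """P(at least `heads` heads in `flips` tosses of a fair coin).

    The fair coin plays the role of the no-difference hypothesis.
    """
    # Count the outcomes at least as extreme as the one observed.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # Every sequence of tosses is equally likely under a fair coin.
    return favourable / 2 ** flips


# Hypothetical data: 59 heads in 100 tosses.
p_value = tail_probability(59, 100)
print(f"P(59 or more heads | fair coin) = {p_value:.3f}")
```

The number printed is the probability of the data given the no-difference hypothesis. It is not the probability that the hypothesis is true given the data, and confusing the two is exactly the misreading described above.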

For a simplified example, consider the use of drug tests to detect cheaters in sports. Suppose the test for steroid use among baseball players is 95 percent accurate — that is, it correctly identifies actual steroid users 95 percent of the time, and misidentifies non-users as users 5 percent of the time.

Suppose an anonymous player tests positive. What is the probability that he really is using steroids? Since the test really is accurate 95 percent of the time, the naïve answer would be that probability of guilt is 95 percent. But a Bayesian knows that such a conclusion cannot be drawn from the test alone. You would need to know some additional facts not included in this evidence. In this case, you need to know how many baseball players use steroids to begin with — that would be what a Bayesian would call the prior probability.

Now suppose, based on previous testing, that experts have established that about 5 percent of professional baseball players use steroids. Now suppose you test 400 players. How many would test positive?

• Out of the 400 players, 20 are users (5 percent) and 380 are not users.

• Of the 20 users, 19 (95 percent) would be identified correctly as users.

• Of the 380 nonusers, 19 (5 percent) would incorrectly be indicated as users.

So if you tested 400 players, 38 would test positive. Of those, 19 would be guilty users and 19 would be innocent nonusers. So if any single player’s test is positive, the chances that he really is a user are 50 percent, since an equal number of users and nonusers test positive.

Tom Siegfried, “Odds are, it’s wrong”, Science News, vol. 177 #7, 27 March 2010.
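The head count in the quoted example can be reproduced in a few lines of Python. This is only a sketch: the function name `count_positives` and its parameter names are my own, and the numbers are the ones assumed in the example (400 players, 5 percent users, a test that flags 95 percent of users and 5 percent of nonusers).

```python
def count_positives(players, prior, hit_rate, false_alarm_rate):
    """Split a tested population into true and false positives.

    All names are illustrative; the figures come from the quoted example.
    """
    users = players * prior
    nonusers = players - users
    true_positives = users * hit_rate               # users correctly flagged
    false_positives = nonusers * false_alarm_rate   # nonusers wrongly flagged
    return true_positives, false_positives


true_pos, false_pos = count_positives(400, 0.05, 0.95, 0.05)
share_guilty = true_pos / (true_pos + false_pos)
print(f"{true_pos:.0f} users and {false_pos:.0f} nonusers test positive")
print(f"Share of positives who are users: {share_guilty:.0%}")
```

Changing the `prior` argument is all it takes to see how strongly the answer depends on how common steroid use is to begin with.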

The naive answer (95 percent probability of guilt) is almost surely wrong. This is not due to Bayes. Rather, it is a misinterpretation of standard tests of significance that were developed by mathematician Ronald A. Fisher in the 1920s. What Bayesian reasoning adds – not without controversy – is a “prior probability”, an informed guess about the expected probability of something in advance of the study. In this example, the informed guess is that 5% of professional baseball players use steroids. Suppose, instead, that our informed guess is that 25% of players use steroids. Then 100 of the 400 players are users, 95 of whom test positive, while 15 of the 300 nonusers also test positive, so the chance that a player who tests positive really uses steroids is 95/110, or about 86 percent. If the prior is an even split (we ‘know’ that half the players use steroids and half do not), then the naive answer of 95 percent would be correct. But if we ‘know’ that more than half the players are on steroids, then the chance that a player who tests positive is really a user exceeds 95 percent. Here are calculations for the case where 75% of players use steroids:

• Out of the 400 players tested, 300 are users and 100 are not users.

• Of the 300 users, 285 (95 percent) would be identified correctly as users.

• Of the 100 nonusers, 5 (5 percent) would incorrectly be indicated as users.

So the probability that a player who tests positive really is a user is 285/290, or about 98 percent, which is greater than the naive answer. This is intuitively plausible, since in this scenario the vast majority of players are on steroids.
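The same arithmetic works for any prior. Below is a small Python sketch of Bayes' rule for this drug-test setup. The function name `posterior_given_positive` and its default arguments are my own labels, and the 95 percent and 5 percent rates are the ones assumed throughout this post.

```python
def posterior_given_positive(prior, hit_rate=0.95, false_alarm_rate=0.05):
    """Bayes' rule: probability of being a user given a positive test."""
    true_positive = prior * hit_rate                   # user and flagged
    false_positive = (1 - prior) * false_alarm_rate    # nonuser and flagged
    return true_positive / (true_positive + false_positive)


# The four priors discussed in the text.
for prior in (0.05, 0.25, 0.50, 0.75):
    posterior = posterior_given_positive(prior)
    print(f"prior {prior:.0%} -> posterior {posterior:.0%}")
```

Run over the four priors discussed here, it gives back the 50, 86, 95 and 98 percent figures, which shows that the test result alone never fixes the answer.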

Tom Siegfried is editor of the bi-weekly magazine Science News. His home page, with links to past columns, is here and his recent columns are posted here.
