Suppose that \(1/10\) of \(1\%\) (i.e., \(0.1\%\)) of a population of people have the Bucolic Plague. Tests are made to determine who does, and who does not, have the disease. The test is \(99\%\) accurate. You get your results: positive. What is the probability that you actually have the disease?

Let's say that there are \(1,000,000\text{ people}\) in the region where the plague exists and that \(0.1\%\) of these (\(1000\text{ people}\)) have the disease, while the remaining \(99.9\%\) (\(999,000\text{ people}\)) do not.

The test to determine who has the disease is \(99\%\) accurate, which here means that \(1\%\) of the time the test will indicate that a person has the disease even though he/she does not (a false positive), and \(1\%\) of the time it will indicate that a person does not have the disease even though he/she does have it (a false negative).

If the \(999,000\text{ disease-free people}\) are tested, the results will show that \(1\%\) of them, or \(9990\text{ people}\), have the disease when, in fact, they don't. And of the \(1000\text{ people}\) who do have the disease, the tests will flag only \(990\) of them (\(99\%\text{ of }1000\)) as positive. So we have a total of (\(9990+990\)), or \(10,980\), \(\text{positive results}\).

But of the \(10,980\text{ positive results}\), only \(990\) belong to people who really do have the disease. So the chance that you actually have the disease, even though you test positive, is \(990/10,980\), or about \(9\%\).
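The counting argument above can be sketched in a few lines of Python (the population size, prevalence, and accuracy are the example's assumed figures):

```python
# Bucolic Plague example: count who tests positive and who among them is sick.
population = 1_000_000
prevalence = 0.001      # 0.1% of people have the disease
accuracy = 0.99         # the test is 99% accurate

sick = round(population * prevalence)                 # 1000 people
healthy = population - sick                           # 999,000 people

true_positives = round(sick * accuracy)               # 990 sick people flagged
false_positives = round(healthy * (1 - accuracy))     # 9990 healthy people flagged

total_positives = true_positives + false_positives    # 10,980 positives in all
p_sick_given_positive = true_positives / total_positives
print(f"P(sick | positive) = {p_sick_given_positive:.1%}")  # about 9%
```

Running it confirms that fewer than one in ten positive results comes from someone who is actually infected.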

What this example illustrates is that while a \(99\%\) accurate test appears to be virtually perfect, it can lead to unexpected results when large numbers and skewed distributions are involved. In the above example, if only \(1\) of the \(1,000,000\text{ people}\) has the disease, the test will erroneously indicate that about \(10,000\text{ people}\) have it (\(1\%\) of the \(999,999\) who are disease-free). So if you test positive, there is only about a \(1\text{ in }10,000\) chance (\(0.01\%\)) that you actually are infected.
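The extreme one-in-a-million case can be checked the same way; here the expected counts are left as fractions rather than rounded, since only one person is infected (again using the example's assumed \(99\%\) accuracy):

```python
# Extreme case: one infected person among a million, same 99% accurate test.
population = 1_000_000
sick = 1
healthy = population - sick

expected_true_positives = sick * 0.99          # 0.99 of a person, on average
expected_false_positives = healthy * 0.01      # about 10,000 healthy people

p = expected_true_positives / (expected_true_positives + expected_false_positives)
print(f"P(sick | positive) = {p:.4%}")         # roughly 0.01%, about 1 in 10,000
```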

Suppose that in the original example you tested negative. What is the probability that you really don't have the disease?

Of the \(1000\text{ people}\) who have the disease, \(1\%\) of these (\(10\text{ people}\)) will erroneously test negative. And of the \(999,000\text{ people}\) who don't have the disease, \(99\%\) of these (\(989,010\)) will properly test negative. So a total of (\(10+989,010\)), or \(989,020\text{ people}\), will test negative. Since only \(10\) of those who test negative are actually infected, the chance of being infected when the test says you are not is \(10/989,020\), or about \(0.001\%\).
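The negative-test case follows the same counting pattern (same assumed figures as before):

```python
# Count the negative results and see how many of them are actually infected.
sick, healthy = 1000, 999_000
accuracy = 0.99

false_negatives = round(sick * (1 - accuracy))    # 10 sick people missed
true_negatives = round(healthy * accuracy)        # 989,010 healthy people cleared

total_negatives = false_negatives + true_negatives  # 989,020 negatives in all
p_sick_given_negative = false_negatives / total_negatives
print(f"P(sick | negative) = {p_sick_given_negative:.4%}")  # about 0.001%
```

A negative result is therefore very trustworthy: the false-negative pool of 10 is swamped by the enormous number of true negatives.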

One way to improve the accuracy is to retest those who tested positive. Now, only \(1\%\) of the \(1\%\) who had been misdiagnosed the first time will be misdiagnosed the second time, since the chance of two false results in a row is \(0.01\times 0.01=0.0001\). This has the effect of making the testing \(99.99\%\) correct.

In the original example, of the \(10,980\text{ people}\) who tested positive, \(990\) had the disease, and \(9990\) did not. On retesting, \(99\%\) of the \(990\) who have the disease (about \(980\text{ people}\)) will again test positive, and \(1\%\) of the \(9990\) who erroneously tested positive the first time (about \(100\text{ people}\)) will again test positive, even though they do not have the disease.

Therefore, a total of (\(980+100\)), or \(1080\text{ people}\), will test positive after retesting, of which \(980\) actually have the disease. So the chance that you have the disease after testing positive a second time is \(980/1080\), or about \(91\%\).
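The retesting step can be sketched by applying the same \(99\%\) accurate test only to the first-round positives (counts taken from the example above):

```python
# Second round of testing, applied only to the 10,980 first-round positives.
sick_positives, healthy_positives = 990, 9990   # from the first round
accuracy = 0.99

still_positive_sick = round(sick_positives * accuracy)            # about 980
still_positive_healthy = round(healthy_positives * (1 - accuracy))  # about 100

total_retest_positives = still_positive_sick + still_positive_healthy  # about 1080
p = still_positive_sick / total_retest_positives
print(f"P(sick | two positives) = {p:.0%}")     # about 91%
```

Restricting the second test to first-round positives is what drives the jump from \(9\%\) to about \(91\%\): the pool being tested is now far less skewed toward the healthy.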