Imaging Nerd

Sensitivity, Specificity & Predictive Values

Key Points
  • Sensitivity and specificity are about the test: how well it catches disease (SnOUT) and how well it clears the healthy (SpIN). They don't change when disease gets more or less common.
  • Predictive values are about the patient in front of you: given this result, how likely is it that they truly have (or don't have) the disease?
  • Predictive values swing wildly with how common the disease is. A great test can still throw mostly false alarms in a low-risk crowd.
  • A sensitive test is a wide net (good for ruling out). A specific test is a strict bouncer (good for ruling in).

Every imaging finding you call is really a bet, and these four numbers are how we keep score. The frustrating part is that "how good is this test?" and "what does this result mean for my patient?" are two different questions with two different answers. Mix them up and you'll either reassure someone you shouldn't, or terrify a perfectly healthy person. Let's untangle it.

The 2x2 table: the whole thing in one grid

Picture a security checkpoint. The truth is whether a traveler is actually carrying something they shouldn't. The test is whether the scanner beeps. Cross those two and you get four boxes — the famous 2x2 table.

                 Disease present        Disease absent
Test positive    True positive (TP)     False positive (FP)
Test negative    False negative (FN)    True negative (TN)

Everything below is just this grid sliced different ways. That's the secret: there's only one table, and the four "stats" are four different fractions pulled out of it.

Sensitivity and specificity: judging the test itself

Sensitivity asks: of everyone who truly has the disease, what fraction did the test catch? It reads down the diseased column: TP / (TP + FN). A highly sensitive test rarely lets a sick person slip through, so a negative result is reassuring. The mnemonic is SnOUT — a Sensitive test, when Negative, rules OUT.

Specificity asks the mirror question: of everyone who is truly healthy, what fraction did the test correctly clear? TN / (TN + FP). A highly specific test rarely cries wolf, so a positive result is convincing. That's SpIN — a Specific test, when Positive, rules IN.
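If it helps to see the two column fractions as arithmetic, here is a minimal Python sketch. The counts are made up purely for illustration (they are not from any study), and the function names are just labels chosen for this example.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Of everyone who truly has the disease, the fraction the test caught."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Of everyone who is truly healthy, the fraction the test cleared."""
    return tn / (tn + fp)


# Hypothetical 2x2 counts: 100 diseased patients, 200 healthy ones.
tp, fp, fn, tn = 90, 30, 10, 170

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / 100 = 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # 170 / 200 = 0.85
```

Note that each function only ever touches one column of the grid: sensitivity never sees the healthy patients, and specificity never sees the sick ones.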

Key Point

Sensitivity is a wide net dragged through the water — it catches almost everything, including some seaweed. Specificity is a fussy bouncer who only lets in people with a real invitation. Most tests trade one for the other.

The crucial, almost magical property: sensitivity and specificity are baked into the test. They don't care whether you're in a high-disease cancer clinic or a low-disease screening line. They describe the scanner, not the crowd walking through it.

Figure · diagram
A 2x2 contingency table with TP, FP, FN, TN in the four cells, with arrows showing sensitivity computed down the 'disease present' column and specificity down the 'disease absent' column, and PPV/NPV computed across the 'test positive' and 'test negative' rows.

Predictive values: judging the result in front of you

Here's where it gets practical and where people get burned. When you sign a report saying "positive for pulmonary embolism," the clinician doesn't care about sensitivity in the abstract — they care: given this positive scan, how likely is it real? That's the positive predictive value (PPV): TP / (TP + FP), read across the test-positive row.

The negative predictive value (NPV) is the flip side: given a negative result, how likely is the patient truly disease-free? TN / (TN + FN).

Notice these read across the rows, while sensitivity and specificity read down the columns. Same table, ninety-degree turn.
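The row fractions can be sketched the same way. Again, the counts below are invented for illustration only, and the function names are just labels for this example.

```python
def ppv(tp: int, fp: int) -> float:
    """Of everyone who tested positive, the fraction who truly have the disease."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """Of everyone who tested negative, the fraction who are truly disease-free."""
    return tn / (tn + fn)


# Hypothetical 2x2 counts: 120 positive results, 180 negative results.
tp, fp, fn, tn = 90, 30, 10, 170

print(f"PPV = {ppv(tp, fp):.2f}")  # 90 / 120 = 0.75
print(f"NPV = {npv(tn, fn):.2f}")  # 170 / 180, about 0.94
```

Here each function touches one row of the grid, which mixes diseased and healthy patients. That mixing is why these numbers move when the balance of sick and healthy people changes.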

Prevalence: the plot twist nobody warns you about

Predictive values are not properties of the test. They depend heavily on prevalence — how common the disease is in the population you're testing. This is the single most counterintuitive idea here, so let me make it concrete.

Take a genuinely excellent test, say 99% sensitive and 99% specific, and turn it loose on a rare disease, present in just 1 in 1,000 people. Test 100,000 people: about 100 have the disease and 99 of them test positive. But of the 99,900 healthy people, 1% test falsely positive, which is 999 false alarms. So of the 1,098 people flagged positive, only 99 actually have the disease: a PPV of about 9%, fewer than 1 in 10. The test didn't get worse; the math of rarity did.
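The arithmetic in that example can be checked with a short Python sketch. It uses exact fractions to avoid rounding noise; the variable and function names are just choices made for this illustration.

```python
from fractions import Fraction


def ppv_from_rates(sens, spec, prev):
    """PPV via Bayes' rule: true positives over all positives, per person tested."""
    true_pos_rate = sens * prev
    false_pos_rate = (1 - spec) * (1 - prev)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


sens = Fraction(99, 100)   # 99% sensitive
spec = Fraction(99, 100)   # 99% specific
prev = Fraction(1, 1000)   # disease present in 1 in 1,000
n = 100_000                # people tested

diseased = n * prev                # 100 people with the disease
true_pos = diseased * sens         # 99 of them test positive
healthy = n - diseased             # 99,900 healthy people
false_pos = healthy * (1 - spec)   # 999 false alarms
ppv = true_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos}")
print(f"false positives: {false_pos}")
print(f"PPV = {float(ppv):.1%}")   # 99 / 1,098, about 9.0%
```

The counting route and the `ppv_from_rates` formula give the same answer, since the cohort size cancels out of the ratio.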

Heads Up

This is why a "positive" screening study in a low-risk patient is so often a false alarm, while the exact same finding in a high-risk patient is far more likely to be real. The image looks identical. The patient's pre-test risk is doing half the interpreting for you.

Why this matters at the workstation

This isn't a stats-class abstraction — it changes how you hedge a report.

  • Screening a healthy population (low prevalence) inflates false positives. Expect to chase a lot of benign things. This is the heart of breast cancer screening debates and of the principles of screening generally.
  • Imaging a sick, pre-selected patient (high prevalence; they're in the ED for a reason) makes your positives much more trustworthy, but your negatives less reassuring: as prevalence rises, PPV climbs and NPV falls.

Pitfall

The classic trap: reading a study's sensitivity, specificity, and predictive values, all measured in a population stuffed with sick patients, and assuming the same predictive values hold in your low-risk screening clinic. They won't. The test stats travel; the predictive values stay home and depend on who you're scanning. Watch for this when you read the literature (study design & bias).
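To see how the same test behaves in different rooms, here is a rough Python sketch that holds sensitivity and specificity fixed and varies only prevalence. The 90%/90% test and the three prevalence figures are assumptions picked for illustration, not measured values for any real modality or setting.

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sens * prev              # diseased and test positive
    fn = (1 - sens) * prev        # diseased but test negative
    tn = spec * (1 - prev)        # healthy and test negative
    fp = (1 - spec) * (1 - prev)  # healthy but test positive
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical settings; the prevalence values are illustrative only.
settings = {
    "screening clinic": 0.01,
    "outpatient referral": 0.10,
    "ED, high suspicion": 0.50,
}

results = {}
for name, prev in settings.items():
    results[name] = predictive_values(0.90, 0.90, prev)
    ppv, npv = results[name]
    print(f"{name:<20} prevalence {prev:>4.0%}  PPV {ppv:.1%}  NPV {npv:.1%}")
```

Running it shows PPV rising and NPV falling as prevalence goes up, even though the test itself never changed.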

Where you set the line

One more knob: most tests force a cutoff somewhere on a continuous scale — a density threshold, a size criterion, an enhancement value. Slide that line toward "call everything positive" and sensitivity climbs while specificity falls; slide it the other way and they trade places. You can't max both at once. That trade-off is exactly what an ROC curve draws out.
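A toy Python sketch makes the trade-off visible. The measurements below are invented numbers on an arbitrary scale, and the rule "call it positive if the value is at or above the cutoff" is an assumption made for this example.

```python
# Hypothetical measurements on an arbitrary continuous scale.
diseased = [42, 55, 61, 68, 74, 80, 88, 93]
healthy = [12, 20, 27, 33, 39, 45, 52, 63]


def sens_spec_at(cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when values >= cutoff are called positive."""
    tp = sum(value >= cutoff for value in diseased)
    tn = sum(value < cutoff for value in healthy)
    return tp / len(diseased), tn / len(healthy)


# Slide the cutoff from lenient (low) to strict (high).
for cutoff in (30, 50, 70):
    sens, spec = sens_spec_at(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

With these made-up values, the lenient cutoff catches every diseased case but clears few healthy ones, and the strict cutoff does the reverse. Plotting sensitivity against (1 - specificity) for every possible cutoff is what produces the ROC curve.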

If you remember one thing: sensitivity and specificity grade the test; predictive values grade the result for this patient — and prevalence is the dial that connects them. Get those two questions straight and the rest is just rotating the same little 2x2 table.