BLOG: June 2010 - December 2013
II - Mammography
13. Mammography risks: False positive
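A false positive mammogram is a positive test result not confirmed during subsequent evaluation. Specifically, a mammogram is considered positive if classified as 0, 3, 4 or 5 according to the BI-RADS (Breast Imaging Reporting and Data System, American College of Radiology) code (1-negative; 2-benign finding, also negative; 3-probably benign finding; 4-suspicious abnormality; 5-highly suggestive of malignancy; 0-incomplete, additional imaging evaluation needed).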
As mentioned, false positives of the mammography screening test for breast cancer (BC) are considered more of a risk than false negatives. They involve recall for additional testing, possible biopsy and unnecessary treatment, including radiotherapy, chemotherapy and mastectomy or lumpectomy, and they are always accompanied by distress that can last for months.
Unlike false negatives, which mainly affect the woman, false positives put an additional burden on the health care system; that may be part of the reason why they are given more attention. The fact that about half of the screening-related medical malpractice payments cover claims based on false-negative mammography tests may also explain why false negatives, as a possible harm of mammography, remain in relative obscurity.
In addition, since the false-negative rate of mammography is very significant, it does not exactly promote the test, and it is only to be expected that it would be downplayed in the "nothing-but-good-about-mammography" era.
That said, false positives affect far more women than false negatives: for every false negative there are roughly a hundred false positives.
While the nominal rate is greater for false negatives than for false positives (~25% vs. ~10%), the false-positive rate applies to virtually all screened women (all who are BC-free), while the false-negative rate applies only to the very small group of women with breast cancer.
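A quick way to check the "hundred to one" figure is to count both kinds of error per 1,000 screening mammograms. The sketch below is a rough illustration, not a published calculation: the 0.4% BC prevalence at screening is an assumption, picked because with a ~25% false-negative rate it gives the commonly cited 1 false negative per 1,000 screened women; the ~25% and ~10% rates are the nominal figures above, and the function name is hypothetical.

```python
# Sketch: expected false negatives vs. false positives per screening round.
# ASSUMPTION: BC prevalence at screening of 0.4% (illustrative, not from a study).

def errors_per_thousand(prevalence, fn_rate, fp_rate, n=1000):
    """Return (false negatives, false positives) expected among n screened women."""
    with_bc = n * prevalence              # women who actually have BC
    bc_free = n - with_bc                 # women who are BC-free
    false_negatives = fn_rate * with_bc   # FN rate applies only to women with BC
    false_positives = fp_rate * bc_free   # FP rate applies to all BC-free women
    return false_negatives, false_positives

fn, fp = errors_per_thousand(prevalence=0.004, fn_rate=0.25, fp_rate=0.10)
print(f"false negatives: {fn:.1f}, false positives: {fp:.1f}, ratio: {fp / fn:.0f}")
# prints: false negatives: 1.0, false positives: 99.6, ratio: 100
```

The roughly hundredfold difference comes almost entirely from the denominators: the false-negative rate is multiplied by the few women with BC, the false-positive rate by nearly everyone else.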
False-positive rate and specificity
As shown in Table 1, the rate of false positives is given as a ratio of false positives vs. all positive tests. However, it is common in the screening mammography context to give the false-positive rate (p) as a ratio of false positives (FP) vs. all BC-free women tested, i.e. vs. the false positives + true negatives (TN) total. The formula is:

$$p = \frac{FP}{FP + TN}$$

as a ratio number, or %FP = 100FP/(FP+TN) as a percentage. Expressed as a ratio, the false-positive rate p is directly related to another indicator of test efficacy, specificity s' = TN/(FP+TN), since p = 1 - s':
FALSE-POSITIVE RATE = 1 - SPECIFICITY
(or p=100-s' for p and s' in %), and
SPECIFICITY = 1 - FALSE-POSITIVE RATE = 1-p
(the specificity ratio is denoted by s' so as not to be confused with the sensitivity ratio s)
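The definitions above translate directly into a few lines of code. This is a sketch with hypothetical helper names and made-up counts for a single screening round of 10,000 women; it shows that p and s' always add up to 1, and how different the Table 1 style ratio (false positives vs. all positive tests) is from p.

```python
# Sketch: the two common false-positive "rates" and specificity from raw counts.
# The counts used below are hypothetical, chosen only for illustration.

def fp_rate(fp, tn):
    """False positives vs. all BC-free women tested: p = FP / (FP + TN)."""
    return fp / (fp + tn)

def specificity(fp, tn):
    """True negatives vs. all BC-free women tested: s' = TN / (FP + TN)."""
    return tn / (fp + tn)

def fp_share_of_positives(fp, tp):
    """False positives vs. all positive tests (the Table 1 style ratio)."""
    return fp / (fp + tp)

# Hypothetical round of 10,000 women: 40 with BC (30 detected, 10 missed),
# 9,960 BC-free (600 false positives, 9,360 true negatives).
tp, fp, tn = 30, 600, 9360
p, s_prime = fp_rate(fp, tn), specificity(fp, tn)
print(f"p = {p:.3f}, s' = {s_prime:.3f}, p + s' = {p + s_prime:.3f}")
print(f"false positives among all positives: {fp_share_of_positives(fp, tp):.3f}")
```

With these illustrative counts a 94% specificity still means that about 95% of all positive tests are false positives, because BC-free women vastly outnumber women with BC.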
So, while the false-positive rate directly determines specificity (i.e. the probability that a disease-free status is correctly confirmed by the test), the false-negative rate directly determines sensitivity (the probability that an existing disease is detected by the test).
Specificity is a test efficiency indicator frequently used for X-ray mammography screening. Unlike sensitivity, which is limited to the BC-affected portion of the screened population, specificity evaluates the test's efficiency for those who are disease-free, i.e. it shows how accurate the test is in confirming that a person does not have the disease s/he is being tested for.
In screening mammography, specificity is the ratio between the number of BC-free women who tested negative and the total number of BC-free women. Or, defining "true negative" as the test result that correctly indicates the absence of disease,
SPECIFICITY = TRUE NEGATIVES VS. (TRUE NEGATIVES + FALSE POSITIVES)
It represents the probability that a BC-free woman gets the correct test result of not having BC.
Specificity figures for X-ray mammography vary somewhat with the source. In the Canadian trial, specificity was 94%, implying that about 15 in 16 screened BC-free women had their disease-free status properly confirmed (the remaining 6% of disease-free women had a false-positive test; the number of these false positives divided by the number of all positive tests gives the rate of false positives in the Table 1 sense).
The latest USPSTF specificity figures for screening mammography are conflicting. The 2009 Recommendation statement cites 94-97% (implying a 3-6% false-positive rate), based on its 2002 "evidence review" (Humphrey et al. 2002). At the same time, the 2009 Systematic Evidence Review Update (Nelson et al.) directly implies 90-94% (Table 3). Even that is still too optimistic, since it is common, and well known, for mammography specificity to be in the 85-95% range.
The rate of false positives in the U.S. seems to be increasing: the call-back rate per mammogram for women aged 50-69y was 6.5% in a 1998 study (Elmore et al.), and 12.6% in 2003 (Green and Taplin). Also, false-positive rates nearly doubled in community settings from 1985 to 1993 (Elmore et al. 2002). Part of the reason is the high rate of malpractice lawsuits, the largest portion of them being for missed BC, which pressures U.S. radiologists to lower the bar for recalls. At present, the U.S. recall rate is roughly double that in most European countries, while its BC-detection rates remain similar (besides the much higher lawsuit rates in the U.S., the reasons for this discrepancy include more experienced radiologists in Europe, the practice of double reading, and standards set and monitored by the screening programs).
In general, false-positive rates are somewhat lower than average for women over 70y, and higher for the 40-49y age group. The USPSTF's most recent report puts the rate of false-positive mammograms at nearly 10% for the 40-49y and nearly 9% for the 50-59y age group (BCSC data; as mentioned, the same agency implies a considerably lower false-positive rate, 3-6%, by stating an overall mammography specificity of 0.94-0.97, based on the large randomized controlled trials).
Taking 10% as the probable overall average, one in ten mammograms would be a false positive. With annual screening over a 10-year period, that would add up to a 100% cumulative false-positive rate; in other words, after ten years of annual screening there would be, on average, one false positive for every screened woman. The actual rate tends to decline as the screening period extends, but current data indicate that still more than half of women screened over a 10-year period would have a false positive.
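The arithmetic behind the "one false positive per woman" statement, and why it does not mean every woman gets one, can be sketched as follows. It assumes a constant 10% false-positive rate per mammogram and independence between screening rounds; both are simplifications, and the helper names are hypothetical.

```python
# Sketch: cumulative false positives over ten annual screens.
# ASSUMPTIONS: a constant 10% false-positive rate per mammogram, and
# independence between screening rounds. Neither holds exactly in practice.

def expected_false_positives(rate, rounds):
    """Average number of false positives per woman after `rounds` screens."""
    return rate * rounds

def prob_at_least_one(rate, rounds):
    """Probability of one or more false positives in `rounds` independent screens."""
    return 1 - (1 - rate) ** rounds

print(expected_false_positives(0.10, 10))     # average per screened woman
print(f"{prob_at_least_one(0.10, 10):.1%}")   # share with at least one false positive
```

Under these assumptions the average is one false positive per woman, but only about 65% of women have at least one, since some have two or more. Real per-round rates decline after the first screen, which is consistent with lower observed cumulative figures such as the 49% in Elmore et al. quoted below.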
The significance of false positives stems from both their relatively high frequency and their consequences. The immediate consequence is psychological distress, which can last for months and negatively affect a woman's everyday life and wellbeing. In addition, most women with a positive screening mammogram will undergo additional testing: another mammogram, thermography and/or MRI.
A significant portion of women with a false-positive result will also undergo an unnecessary biopsy. According to the USPSTF 2009 report, the biopsy rate for these women ranges from about 1 in 10 for age 40-49y to 1 in 6 for 80-89y (Screening for Breast Cancer: Systematic Evidence Review Update for the U. S. Preventive Services Task Force, Nelson et al. 2009). The average in the 40-69y age group, for which there are more data than for the 70+y group, is about one biopsy for every eight false positives.
These figures, however, may be too optimistic. While this USPSTF report suggests that the underlying data (Breast Cancer Surveillance Consortium, a collaborative network of five mammography registries and affiliated sites across the United States) "may be" more applicable to the U.S. than data originating from other countries, it also adds that "Rates of additional imaging and rates of biopsies may be underestimated due to incomplete capture of these exams by the BCSC."
And, indeed, they appear to be underestimated. In a large community-based study in California, the estimated cumulative biopsy rate after 10 mammograms was nearly 19%, vs. an estimated cumulative false-positive rate of 49% (Elmore et al. 1998, 2,400 women aged 40-69y). That implies a biopsy-to-false-positive ratio of nearly 40%, or about 2 in 5, more than three times the ratio estimated by the USPSTF.
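The comparison above reduces to two divisions. The sketch uses the rounded figures quoted in the text (nearly 19% cumulative biopsies, 49% cumulative false positives, and the USPSTF's roughly one biopsy per eight false positives); the variable names are illustrative.

```python
# Sketch: biopsies per false positive, Elmore et al. 1998 vs. the USPSTF average.
elmore_biopsy_cum = 0.19   # cumulative biopsy rate after 10 mammograms ("nearly 19%")
elmore_fp_cum = 0.49       # cumulative false-positive rate after 10 mammograms
uspstf_ratio = 1 / 8       # about one biopsy per eight false positives, ages 40-69y

elmore_ratio = elmore_biopsy_cum / elmore_fp_cum
print(f"Elmore: {elmore_ratio:.2f} biopsies per false positive")
print(f"USPSTF: {uspstf_ratio:.3f} biopsies per false positive")
print(f"Elmore vs. USPSTF: {elmore_ratio / uspstf_ratio:.1f} times higher")
```

Using 0.19 rounds the biopsy figure slightly up; the ratio stays above three even with a somewhat lower value.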
And the ratio in Elmore et al. should probably be even higher, considering that the study reports only a single fine-needle biopsy for every five open or surgical biopsies. In the study period, 1984-1995, fine-needle biopsy was commonly used (it began being replaced by core biopsy in most centers in the mid-1990s), and was probably used roughly as often as the other two combined, if not more.
Also, as the authors note, the overall percentage of false positives in the study was 6.5%, while the national average is "nearly twice as high".
A smaller study that followed the fate of 352 women who received a false-positive mammogram in the first round of the Stockholm mammography screening trial found that they averaged more than 3 visits to the physician; 1 in 2 had an additional mammogram, there was more than one fine needle aspiration biopsy per woman on average (397 biopsies), and 1 in 4 had a surgical biopsy before being declared BC-free.
Six months after the mammogram, only 64% of these women knew that their positive mammogram had been an error. In the above-mentioned Elmore et al., only 128 of 188 biopsies (68%) were performed within 1 year of the false-positive test. Since the USPSTF report counts only biopsies performed within 60 days of screening, this seems to explain, at least in part, the agency's lower biopsy rate estimate.
The 150 women with a false positive from the second round of screening in the Stockholm trial averaged very similar rates of visits, additional mammograms, fine needle aspiration and surgical biopsies to those from the first round (Neglected aspects of false positive findings of mammography in breast cancer screening: analysis of false positive cases from the Stockholm trial, Lidbrink et al. 1995).
Another large U.S. study of mammography registries in six states (California, Colorado, New Hampshire, New Mexico, Vermont and Washington), with 1985-97 data on 389,533 women aged 30-69y, also does not agree with the USPSTF biopsy rate estimate. The study reports about 1 biopsy for every 8 false positives (for the 40-69y age group), but with fine needle biopsies excluded (Kerlikowske et al. 2000). Assuming about as many fine needle biopsies, the ratio increases to about 1 biopsy for every 4 false positives: somewhat lower than in Elmore et al., but still about double the USPSTF estimate.
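The fine-needle adjustment for Kerlikowske et al. amounts to scaling the reported ratio. The sketch below assumes, as the text does, that fine-needle biopsies were about as numerous as all other biopsies combined; the function and parameter names are hypothetical.

```python
# Sketch: adjusting a biopsy-to-false-positive ratio for excluded fine-needle biopsies.
# ASSUMPTION (from the text): fine-needle biopsies were about as numerous as
# all other biopsies combined, so the reported count roughly doubles.

def adjusted_ratio(reported_ratio, fna_per_other_biopsy):
    """Scale a ratio that excludes fine-needle biopsies so that it includes them."""
    return reported_ratio * (1 + fna_per_other_biopsy)

kerlikowske = adjusted_ratio(1 / 8, fna_per_other_biopsy=1.0)
print(f"about 1 biopsy per {1 / kerlikowske:.0f} false positives")  # USPSTF: 1 per 8
```

Changing `fna_per_other_biopsy` shows how sensitive the comparison is to the assumed share of fine-needle biopsies.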
Disagreements between the overall evidence and publicly projected figures on the benefits and harms of screening mammography are rather common. Though the USPSTF works through what is, at least on paper, a panel of independent health care experts, it is evident that its objectivity is compromised by the omnipresent influence of the pro-screening establishment. It is generally biased toward promoting unrealistically high efficacy of mammography screening, but even more so when it comes to downplaying the harms of screening, particularly overtreatment resulting from the test's inaccuracy.
Another test efficacy indicator associated with the test's false-positive rate is the positive predictive value (PPV), the ratio, or percentage, of confirmed breast cancer diagnoses (true positives, TP, i.e. women with a positive test who actually have BC) vs. all positive mammograms (true positives + false positives, TP+FP), i.e.

$$PPV = \frac{TP}{TP + FP}$$

If c is the breast cancer incidence rate, PPV is related to both sensitivity (S) and specificity (S') as

$$PPV = \frac{cS}{cS + (1-c)(1-S')}$$

where all are given as ratio numbers. Since c is small compared with 1, and cS is small compared with 1-S', a good approximation is PPV ≈ cS/(1-S').
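To see how close the approximation PPV ≈ cS/(1-S') is to the exact relation PPV = cS/(cS + (1-c)(1-S')), the sketch below evaluates both. The exact form follows from counting true positives (cS) and false positives ((1-c)(1-S')) as shares of all screened women. The sensitivity of 0.75 and specificity of 0.90 are illustrative assumptions, and the two incidence values are the approximate 40-49y and 70+y figures given further down.

```python
# Sketch: positive predictive value from incidence c, sensitivity S, specificity S'.
# Exact: PPV = cS / (cS + (1 - c)(1 - S'));  approximation: PPV ~ cS / (1 - S').
# Input values below are illustrative assumptions, not published estimates.

def ppv_exact(c, sens, spec):
    true_pos = c * sens                # true positives as a share of screened women
    false_pos = (1 - c) * (1 - spec)   # false positives as a share of screened women
    return true_pos / (true_pos + false_pos)

def ppv_approx(c, sens, spec):
    return c * sens / (1 - spec)

for c in (0.0025, 0.008):   # roughly the 40-49y and 70+y incidence figures
    exact, approx = ppv_exact(c, 0.75, 0.90), ppv_approx(c, 0.75, 0.90)
    print(f"c = {c:.2%}: exact PPV {exact:.3f}, approximation {approx:.3f}")
```

The approximation slightly overstates the PPV (by roughly 2% to 5% in relative terms here) because it drops the true positives and the factor (1-c) from the denominator.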
This implies that, for a given BC rate, the PPV is directly proportional to test sensitivity and inversely proportional to (1-S'), i.e. the false-positive rate. However, the two are correlated: as the ROC curves illustrate, a roughly one-third increase in sensitivity from the low to the high end of the range is accompanied by a threefold increase in the false-positive rate, so the PPV generally decreases as sensitivity increases.
Breast cancer incidence rate also has more influence on the PPV than sensitivity does, since it changes threefold, from about 0.25% for the 40-49y group to about 0.8% for women 70y and over. As the ROC curves show, both sensitivity and incidence rate increase with age, but this is in part offset by the simultaneous increase in the false-positive rate (i.e. decreasing specificity).
So, in real life, with all three variables changing according to the age group, the PPV increases significantly from the low to the high end of the age range.
USPSTF's estimate of the PPV range in randomized controlled trials is quite loose: 2-22%, or from 1 in 50 to 2 in 9 for mammograms recommended for further evaluation. For those recommended for biopsy, the USPSTF indicates PPV range of 12-78%.
Since the sensitivity of screening varied only from 39% (HIP trial) to 89% (Stockholm trial), and about half as much with the HIP excluded, there must have been significant differences in the methodology and criteria applied from one study to another for the PPV range to vary up to 11-fold. For instance, some trials may have counted both "suspicious" and "inconclusive" mammograms as positive, while others only counted the former; some may have excluded false positives resulting from "technical" errors, some not; and so on.
The same ROC curves indicate that the corresponding PPV for all women, for biennial screening and a 0.6% BC incidence rate, would range from about 7% at the low end of the sensitivity range (~60%) to about 3% at the high end (~80% sensitivity). Regardless of the incidence rate, the PPV varies not much more than twofold.
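The direction of this effect can be reproduced with the exact PPV relation. Because the ROC curves themselves are not reproduced here, the sketch assumes false-positive rates of 5% at ~60% sensitivity and 15% at ~80% sensitivity, i.e. the one-third sensitivity gain and threefold false-positive increase described above; those two values are assumptions, not readings from the curves.

```python
# Sketch: PPV at the two ends of the sensitivity range, 0.6% incidence.
# ASSUMPTION: false-positive rates of 5% (at ~60% sensitivity) and 15%
# (at ~80% sensitivity) stand in for the ROC curves, which are not shown here.

def ppv(c, sens, fpr):
    """Exact PPV from incidence c, sensitivity and false-positive rate."""
    return c * sens / (c * sens + (1 - c) * fpr)

low_end = ppv(0.006, 0.60, 0.05)
high_end = ppv(0.006, 0.80, 0.15)
print(f"~60% sensitivity: PPV {low_end:.1%}")
print(f"~80% sensitivity: PPV {high_end:.1%}")
```

With these assumed inputs the PPV falls from about 6.8% to about 3.1%, in line with the 7% and 3% quoted: the one-third gain in true positives is swamped by the threefold rise in false positives.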
As with screening sensitivity, these trial PPV figures are somewhat inflated by neglecting overdiagnosis, i.e. the fact that a relatively significant portion of the "true positives" (positive tests confirmed by a BC diagnosis) were not actually malignant disease.
Analogous to the positive predictive value, the test accuracy indicator showing what portion of all negative tests are true negatives is called the negative predictive value.
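For completeness, the negative predictive value is NPV = TN/(TN+FN), which can be written in terms of incidence c, sensitivity S and specificity S' the same way as the PPV. The inputs below (0.4% incidence, 75% sensitivity, 90% specificity) are illustrative assumptions, and the function name is hypothetical.

```python
# Sketch: negative predictive value, NPV = TN / (TN + FN), from c, S and S'.
# The inputs are illustrative assumptions, not published estimates.

def npv(c, sens, spec):
    true_neg = (1 - c) * spec    # BC-free women correctly testing negative
    false_neg = c * (1 - sens)   # women with BC missed by the test
    return true_neg / (true_neg + false_neg)

print(f"NPV: {npv(0.004, 0.75, 0.90):.2%}")
```

The result is close to 100% mainly because BC is rare among screened women, which is the same reason a false-negative rate expressed against all screened women looks negligible.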
The relationship between various accuracy indicators of screening mammography and its false-negative, false-positive and BC detection rates is illustrated below using USPSTF's estimates, based on the Breast Cancer Surveillance Consortium (BCSC) data.
Test sensitivity is given by the ratio of true positives vs. true (diagnosed) BC (multiplied by 100 for %), or as 1-FNR (FNR = false-negative rate, given as false negatives vs. diagnosed BC).
The category "diagnosed BC" creates ambiguity, since it is known that due to the non-specific sensitivity of X-ray mammography and limitations of diagnostic techniques
up to 1 in 3 of diagnosed BC are pseudo-disease,
or overdiagnosis, which wouldn't become symptomatic in woman's lifetime. Hence the sensitivity figure can be defined either as true-positives vs. total of true BC, or as true-positives vs. total of diagnosed BC. The former is correct, the latter is not, but it is accepted as relevant.
Published mammography sensitivity figures routinely ignore overdiagnosis; accounting for it lowers the sensitivity figure. The table gives an example of how much a quite conservative 25% overdiagnosis rate, defined as the percent of pseudo vs. actual BC, would lower test sensitivity (violet fields).
Specificity is given as a ratio of true-negative tests vs. total of BC-free women, or as 1-FPR (FPR=false-positive rate, given as false positives vs. all women tested). It is not appreciably affected by overdiagnosis, due to the number of BC cases being negligibly small compared to that of false positives.
It should be noted that the more appropriate ratio for the false-positive rate would be false positives vs. actual BC. It is only because this ratio reveals how poorly screening mammography alone differentiates between cancer and benign findings, how poor the post-screening diagnostic techniques still are at it, and how high the risk of getting a false positive is, that it is not accepted practice.
Positive predictive value, the ratio of diagnosed (or real) BC vs. all positive tests, and the rate of false negatives are also given with and without 25% overdiagnosis.
From the 40-49y to 80-89y group, respectively, it implies approximately 60-80% sensitivity range (corrected for 25% overdiagnosis), 90-94% specificity, 2-10% positive predictive value, and 34%-18% false negative rate.
In other words, 6 to 8 out of ten women with BC get a correct test result, 90 to 94 out of 100 cancer-free women get a correct test result, 1 in 50 to 1 in 10 women with a positive test result actually have breast cancer, and 1 in 3 to 1 in 5 women with breast cancer get a false-negative result.
As mentioned, the above USPSTF figures are not necessarily accurate, even as a coarse statistical average. For instance, studies have shown that test sensitivity for women with very dense breasts drops into the 30-40% range, much lower than the USPSTF's low end. Also, the authors state that the figures for additional imaging and biopsies "may be" underestimated, and there is no attempt to factor in any estimate of overdiagnosis (diagnosing as breast cancer tissue formations that would never have become cancerous) and the resulting overtreatment.
Since the existence of these pseudo-cancers in the "true positives" category directly adds to the number of "screen-detected" cancers, it also implies lower actual sensitivity and positive predictive value (it also slightly increases the rate of false positives, but it has no appreciable effect on test's specificity).
According to research, roughly 2/3 of overdiagnosed breast cancers are diagnosed as invasive, with the rest being carcinomas in situ (CIS). The approximate rate of overdiagnosis for invasive BC alone is 1 in 4 diagnosed cases (33% relative to actual BC), and 1 in 3 (50% relative to actual BC) with CIS cases added (Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends, Jørgensen and Gøtzsche, 2010).
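The two ways of stating overdiagnosis used here (share of all diagnosed BC vs. ratio to actual BC) are related by r = f/(1-f), where f is the share of diagnosed cases that are pseudo-cancers. The sketch below, with a hypothetical helper name, checks the figures above and shows that the 25% rate (vs. actual BC) used in the table corresponds to 1 in 5 diagnosed cases.

```python
# Sketch: converting "1 in N of diagnosed BC" into overdiagnosis relative to actual BC.
from fractions import Fraction

def vs_actual(share_of_diagnosed):
    """Pseudo-cancers vs. actual BC, given their share of all diagnosed BC."""
    return share_of_diagnosed / (1 - share_of_diagnosed)

cases = [
    ("invasive only", Fraction(1, 4)),
    ("with CIS", Fraction(1, 3)),
    ("table's 25% rate", Fraction(1, 5)),
]
for label, share in cases:
    print(label, vs_actual(share))
```

The same numbers also confirm the 2/3 invasive share quoted above: (1/3)/(1/2) = 2/3.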
Factoring in this magnitude of overdiagnosis would significantly diminish screening efficacy, as measured by its sensitivity, positive predictive value and false-negative rate among screened women with BC. For illustration, these indicators in the above table are recalculated for the conservative overdiagnosis rate of 25% (it is almost certainly higher - up to twofold - in the main screening group, 50-65y).
Note that presenting false negatives in the usual manner, relative to the total screened population, is highly misleading. Commonly cited as 1 in 1,000, the rate appears totally negligible. However, in its relevant context, as the number of false negatives vs. the number of all tested women who do have breast cancer, it becomes highly significant. Calculated as specified in the table (bottom), it ranges from 26% to 14% without accounting for overdiagnosis, and from 34% to 18% with a 25% overdiagnosis rate, from the 40-49y to the 80-89y age group, respectively.
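One way to apply a 25% overdiagnosis correction is sketched below. It assumes overdiagnosis is expressed as pseudo-cancers vs. actual BC and that all pseudo-cancers sit among the screen-detected "true positives", so they inflate the count of diagnosed BC without adding to the missed cancers. The function name is hypothetical, and the table's exact inputs are not reproduced here.

```python
# Sketch: correcting the false-negative rate and sensitivity for overdiagnosis.
# ASSUMPTIONS: overdiagnosis is expressed as pseudo-cancers vs. actual BC (25% here),
# and every pseudo-cancer sits among the screen-detected "true positives".

def corrected(fnr_diagnosed, overdiagnosis):
    """Return (FNR, sensitivity) relative to actual BC, per unit of diagnosed BC."""
    actual = 1 / (1 + overdiagnosis)   # actual BC as a share of diagnosed BC
    pseudo = 1 - actual                # pseudo-cancers as a share of diagnosed BC
    false_neg = fnr_diagnosed          # missed cancers are all real cancers
    real_true_pos = (1 - fnr_diagnosed) - pseudo
    return false_neg / actual, real_true_pos / actual

for fnr in (0.26, 0.14):   # uncorrected rates, 40-49y and 80-89y
    fnr_c, sens_c = corrected(fnr, 0.25)
    print(f"uncorrected FNR {fnr:.0%} -> corrected FNR {fnr_c:.1%}, sensitivity {sens_c:.1%}")
```

Under these assumptions 26% and 14% become about 32.5% and 17.5%, close to the 34% and 18% quoted; the remaining gap presumably reflects unrounded inputs or slightly different definitions in the table. The corrected false-negative rate and sensitivity still add up to 1.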
The latter figures, which factor in this overdiagnosis rate, are a more realistic, although probably still conservative, estimate of the false-negative rate. Among screened women with the disease in the 40-69y age range, the false-negative rate is approximately between 1 in 3 in the early forties and 1 in 5 in the late sixties.
Despite being very approximate, these figures are in fairly good agreement with the direct USPSTF estimate figures given in the MAMMOGRAPHY RISKS table.
The next page addresses the long-neglected risk of X-ray mammographic screening, overdiagnosis: the possibility that an apparently malignant growth diagnosed as breast cancer is not actually threatening a woman's health.