Abstract
Background: Tumor size is crucial for the clinical management and prognosis of breast malignancies. Materials and Methods: The gold-standard size of 12 tumor phantoms was established at the Department of Production Engineering, The Royal Institute of Technology, Stockholm. Subsequently, seven experienced mammographers measured the largest diameter of the 12 phantoms on mammograms with a conventional ruler in two independent trials. Results: In the first trial, 30% (25/84) of the 84 values given by the seven mammographers deviated from the gold-standard size by >1 mm; in the second trial, 37% (31/84) did. Size was overestimated (>1 mm) in 9.5% (8/84) of measurements in the first trial and in 15.5% (13/84) in the second. Conversely, size was underestimated (>1 mm) in 20% (17/84) of measurements in the first trial and in 21% (18/84) in the second. Neither the age of the participants nor their years of experience improved the results. Discussion: These findings raise doubts about the ability of this method to discriminate among the size-based subgroups of T1 breast tumors on mammograms. According to the TNM staging system, T1 tumors (≤2.0 cm in greatest dimension) are subdivided into T1mic: microinvasion (≤0.1 cm), T1a (>0.1 cm but not more than 0.5 cm), T1b (>0.5 cm but not more than 1.0 cm) and T1c (>1.0 cm but not more than 2.0 cm in greatest dimension). Since the TNM staging system for breast tumors is important in therapeutic decision making, it is crucial to develop a more reliable method for tumor size assessment.
Breast cancer is the most frequent cancer among women in Sweden, affecting 7,000 women yearly and accounting for 30% of all female malignancies (1). The preliminary diagnosis is based on anamnesis, inspection, palpation, mammography, ultrasound and magnetic resonance imaging (MRI), and is confirmed by invasive methods such as aspiration cytology, core biopsy or surgical biopsy (2-5).
The worldwide-accepted TNM staging system (6, 7) takes into account tumor size, lymph node involvement and distant metastasis. The size of the primary breast tumor is crucial in planning therapeutic strategies for tumor cure. Tumors measuring no more than 2 cm across (with or without lymph node metastasis) are classified as T1 tumors. T2 tumors are those measuring more than 2 cm but no more than 5 cm across. Breast tumors larger than 5 cm are classified as T3 (6, 7).
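The size thresholds above amount to a simple lookup on the greatest dimension. The following is a minimal Python sketch, assuming sizes are given in millimeters; the function name `t_category_by_size` is hypothetical, and only the size criteria quoted in the text are modeled (categories defined by criteria other than size are outside its scope).

```python
def t_category_by_size(greatest_dimension_mm: float) -> str:
    """Map a tumor's greatest dimension (mm) to a size-based T category.

    Only the thresholds quoted in the text are used:
    T1 <= 20 mm, T2 > 20 mm and <= 50 mm, T3 > 50 mm.
    """
    if greatest_dimension_mm <= 0:
        raise ValueError("size must be positive")
    if greatest_dimension_mm <= 20:
        return "T1"
    if greatest_dimension_mm <= 50:
        return "T2"
    return "T3"


# A 1 mm difference around the 20 mm cut-off changes the category.
print(t_category_by_size(19.5), t_category_by_size(20.5))
```

Note that each boundary is inclusive on the lower category ("no more than 2 cm"), which is why the comparisons use `<=`.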
Several methods have been applied to measure tumor size, including physical palpation, mammography, ultrasound, MRI and positron emission mammography (PEM) (2-5, 8-18). In one survey, tumor size was recorded by a pathologist on histological sections, by a surgeon on resected material, by a radiologist on x-ray mammography or by a clinician following clinical palpation (2). However, these diagnostic methods have been shown to differ in their accuracy of tumor size assessment. In some subgroups of patients, over- or underestimation can even exceed 1 cm (12, 14).
In previous studies on tumor size, 12 tumor phantoms were carefully measured at the Department of Production Engineering, The Royal Institute of Technology, Stockholm, Sweden (19). Once the gold standard was established, 18 senior pathologists and 4 senior surgeons were asked to measure the 12 tumor phantoms. The results showed substantial inter- and intra-observer variation in size assessment in two independent trials (19). In a second test, seven senior colonoscopists were asked to measure the 12 phantoms in tandem colonoscopic examinations performed in a colon phantom (20). Again, there was substantial inter- and intra-observer variation in the size assessment of tumor phantoms in both colonoscopies. In a third test, three senior pathologists from three different countries measured the largest actual size of 148 endoscopically removed colorectal polyps on photocopies (21). The results once more showed substantial inter- and intra-observer variation in size assessment. Even measurements on digitized computed tomography (CT) images failed to reproduce the gold-standard size of the phantoms (22).
In the present work, seven experienced mammographers were asked to assess the size of tumor phantoms on mammograms.
Materials and Methods
Tumor phantom devices. Twelve artificial tumor phantoms of different sizes were made of papier-mâché.
Measuring tumor phantoms at The Royal Institute of Technology. The 12 tumor phantoms were measured at The Royal Institute of Technology by low-force contact metrology at a temperature of 20°C±1°C. Held between the fingertips, each artificial tumor was rotated in the gap between the two parallel metal surfaces of a micrometer screw. The gap was narrowed until the largest diameter of the tumor phantom caused slight friction when the phantom was turned in the gap. The 12 devices were measured in random order. The micrometer screw (Mitutoyo Digimatic MDC-25MJT, Kawasaki, Kanagawa, Japan) has a certified uncertainty of 0.0016 mm. Only the largest tumor phantom was measured with a caliper, because its size exceeded the measurement range of the micrometer screw; the Luna caliper (Luna AB, Alingsås, Sweden) has an uncertainty of 0.1 mm. The procedure was repeated every second day, and after five measurements the mean and standard deviation were calculated for each phantom. The mean size obtained from these measurements was regarded as the gold standard.
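The gold standard for each phantom is thus the mean of five repeated readings, with the standard deviation as a measure of repeatability. A minimal Python sketch of that summary follows; the function name `gold_standard` and the readings are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def gold_standard(readings_mm: list[float]) -> tuple[float, float]:
    """Summarize repeated readings of one phantom as (mean, SD) in mm.

    Assumes the sample standard deviation (n - 1 denominator); the
    original text does not specify the estimator.
    """
    if len(readings_mm) < 2:
        raise ValueError("at least two readings are needed for an SD")
    return mean(readings_mm), stdev(readings_mm)


# Hypothetical micrometer readings (mm) for one phantom, five series.
readings = [12.01, 12.03, 11.99, 12.02, 12.00]
size_mm, sd_mm = gold_standard(readings)
print(f"gold standard: {size_mm:.3f} mm (SD {sd_mm:.3f} mm)")
```

With readings this tight, the SD stays well under the ≤0.05 mm reported for the micrometer screw measurements.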
Measuring tumor phantoms on mammograms. The tumor phantoms were placed haphazardly on an x-ray plate, and each device was coded from #1 to #12 with lead granules. In trial 1, the seven mammographers measured the largest diameter of each tumor phantom directly on the mammogram with a conventional millimeter ruler, in order from phantom #1 to phantom #12. In trial 2, two weeks later, the seven mammographers again measured the largest diameter of the phantoms, this time starting from phantom #6 down to phantom #1, followed by phantom #12 down to phantom #7. Measurements deviating by >1 mm from the gold-standard size were regarded as errors in size assessment.
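The error criterion can be expressed as a three-way label per reading: overestimated, underestimated, or within 1 mm of the gold standard. Each trial yields 7 × 12 = 84 readings, so, for example, 25 errors correspond to 25/84 ≈ 29.8%. The sketch below is a hypothetical Python illustration (the names `classify_reading` and `error_summary` are not from the study); a deviation of exactly 1 mm counts as within tolerance, following the strict ">1 mm" wording.

```python
TOLERANCE_MM = 1.0  # readings off by more than this count as errors (per the text)


def classify_reading(measured_mm: float, gold_mm: float,
                     tol_mm: float = TOLERANCE_MM) -> str:
    """Label one ruler reading as 'over', 'under' or 'within' tolerance."""
    deviation = measured_mm - gold_mm
    if deviation > tol_mm:
        return "over"
    if deviation < -tol_mm:
        return "under"
    return "within"


def error_summary(pairs, tol_mm: float = TOLERANCE_MM):
    """Count labels over (measured_mm, gold_mm) pairs; return (counts, error %)."""
    counts = {"over": 0, "under": 0, "within": 0}
    for measured_mm, gold_mm in pairs:
        counts[classify_reading(measured_mm, gold_mm, tol_mm)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no readings supplied")
    error_pct = 100.0 * (counts["over"] + counts["under"]) / total
    return counts, error_pct


# Hypothetical (ruler reading, gold standard) pairs in mm.
counts, pct = error_summary([(14.5, 12.0), (10.9, 12.0), (12.8, 12.0), (12.0, 12.0)])
print(counts, f"{pct:.1f}% errors")
```

Keeping over- and underestimates separate mirrors the way the results report them, while the error percentage combines both.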
Statistical analysis. Each measurement was compared with the value provided by The Royal Institute of Technology (the gold-standard size) and expressed as a percentage of that value. For each mammographer/tumor phantom pair, the mean of the percentages obtained in the first and second trials was calculated. Pearson's correlation coefficient (r) was used to investigate possible linear associations between measurement performance and the mammographers' age and years of experience. Statistical significance was defined as p<0.05.
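The text does not give the percentage formula explicitly; a natural reading, assumed here, is each measurement divided by the gold-standard size and multiplied by 100, so that 100% means exact agreement. The Python sketch below follows that assumption; all function names are hypothetical, and Pearson's r is computed directly from its definition rather than with any particular statistics package.

```python
from math import sqrt


def percent_of_gold(measured_mm: float, gold_mm: float) -> float:
    """Measurement as a percentage of the gold-standard size (assumed formula)."""
    return 100.0 * measured_mm / gold_mm


def mean_over_trials(trial1_mm: float, trial2_mm: float, gold_mm: float) -> float:
    """Mean percentage for one mammographer/phantom pair across both trials."""
    return (percent_of_gold(trial1_mm, gold_mm) + percent_of_gold(trial2_mm, gold_mm)) / 2


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

For the p<0.05 criterion, a significance test on r is also needed; in practice `scipy.stats.pearsonr` returns both r and a two-sided p-value.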
Results
Of the seven participants measuring the tumor phantoms, three were women and four were men. Their ages ranged from 42 to 69 years (Table I).
Measurements at The Royal Institute of Technology. The results are presented in Table I. The standard deviation for measurements of the largest diameter of the 12 devices was ≤0.05 mm with the micrometer screw and ≤0.3 mm with the caliper. Differences among the five repeated measurements were not significant.
Measurements on mammograms. Trial 1: Table I shows that 29.8% (n=25) of the 84 values given by the seven mammographers failed to reproduce the gold-standard size within 1 mm.
Trial 2: Table II shows that 36.9% (n=31) of the 84 values given by the seven mammographers failed to reproduce the gold-standard size within 1 mm.
Individual performance in size assessment: The performance of individual participants is summarized in Table III. Compared with trial 1, two mammographers improved their performance in trial 2 and one performed similarly in both trials, whereas the remaining four performed worse in trial 2 than in trial 1.
Age and gender of the mammographer and performance in size assessment: As shown in Table III, neither the age nor the gender of the mammographer influenced the accuracy of size assessment of the tumor phantoms.
Years of experience in diagnostic mammography and performance in size assessment: Table III shows no difference in reproducing the gold-standard size between mammographers with >20 years of experience (range 21-32 years) and those with ≤16 years of experience (range 2-16 years).
Discussion
In this survey, the mammographers' measurements deviated from the gold-standard size of the tumor phantoms by >1 mm in 30% of cases in trial 1 and in 37% in trial 2. Thus, the experience gained with the method in trial 1 did not improve performance in trial 2, 14 days later.
It may be argued that the errors in assessing the size of the tumor phantoms were related to the use of a conventional ruler. However, in previous work (23), we measured the thickness of the collagenous band in collagenous colitis by microscopy using three different methods: a) histological estimation, b) a calibrated micrometric ocular scale, and c) semi-automatic micrometric measurement with a Soft Imaging System (Cell B, Olympus, Tokyo, Japan). The results also showed substantial intra- and inter-observer variation in size evaluation in two independent trials (23). Thus, even with more precise methods of size assessment, such as calibrated micrometric ocular scales or semi-automatic micrometric measurements, it was difficult to obtain accurate values when pathologists assessed the same histological sections 14 days apart.
Some mammographers under- or overestimated the size of the tumor phantoms more than others. It should be noted that mammographer F had 25 years of experience in reading mammograms (Table III). Notably, neither the age of the mammographers nor their years of experience in reading mammograms reduced the errors in reproducing the gold-standard size.
The present study showed substantial intra- and inter-observer variation in estimating the size of tumor phantoms on mammograms. The most plausible explanations for these negative findings may be lack of concentration, mental fatigue due to work overload, or both.
These results raise doubts about the ability to discriminate size among the T1 breast tumor subgroups of the TNM classification (6, 7) when the traditional method of size assessment on mammograms is applied. According to the TNM staging system for breast tumors, T1 tumors are those measuring 2.0 cm or less in their greatest dimension (3). The TNM staging system recommends subdividing T1 breast tumors into T1mic: microinvasion 0.1 cm or less in greatest dimension; T1a: tumors more than 0.1 cm but not more than 0.5 cm in greatest dimension; T1b: tumors more than 0.5 cm but not more than 1.0 cm in greatest dimension; and T1c: tumors more than 1.0 cm but not more than 2.0 cm in greatest dimension. Since this staging system is crucial in therapeutic decision making, several questions arise: Is measuring T1 breast tumors on analog mammograms with a millimeter ruler reliable enough to discriminate the small size differences between the T1 subgroups? Should this method of assessing breast tumors on analog mammograms be abandoned? If so, which alternative methods should be applied for measuring T1 breast tumor subgroups?
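To illustrate why a 1 mm error matters for these subgroups, the T1 cut-offs can be written as a lookup and applied to readings 1 mm below and above a true size. This is a hypothetical Python sketch (`t1_subgroup` and `subgroup_range` are illustrative names, sizes in mm); T1mic refers to microinvasion, so treating it as a pure size threshold is a simplification.

```python
def t1_subgroup(greatest_dimension_mm: float) -> str:
    """Map a greatest dimension (mm) to a T1 subgroup using the TNM cut-offs."""
    if greatest_dimension_mm <= 0:
        raise ValueError("size must be positive")
    if greatest_dimension_mm <= 1:
        return "T1mic"  # microinvasion, simplified here to a size threshold
    if greatest_dimension_mm <= 5:
        return "T1a"
    if greatest_dimension_mm <= 10:
        return "T1b"
    if greatest_dimension_mm <= 20:
        return "T1c"
    raise ValueError("not a T1 tumor (> 20 mm)")


def subgroup_range(true_mm: float, error_mm: float = 1.0) -> tuple[str, str]:
    """Subgroups assigned to readings error_mm below and above the true size."""
    return t1_subgroup(true_mm - error_mm), t1_subgroup(true_mm + error_mm)


# A hypothetical 5.4 mm tumor is T1b, but a reading 1 mm too small gives T1a.
print(t1_subgroup(5.4), subgroup_range(5.4))
```

Because the T1a and T1b bands are only 4 and 5 mm wide, a tumor near either edge can change subgroup with an error of 1 mm or less, which is the concern raised above.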
One possible alternative for assessing the size of breast tumors on analog mammograms would be for future guidelines on the TNM classification of breast tumors to include 1:1 translucent templates marking the maximum size allotted to T1a, T1b and T1c tumors. Such templates could be placed over suspected tumors on the mammogram to enable standardized measurement of T1a, T1b and T1c breast tumors worldwide.
- Received December 18, 2012.
- Revision received February 5, 2013.
- Accepted February 5, 2013.
- Copyright© 2013 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved