Abstract
Background/Aim: Noninvasive fecal occult blood tests (FOBTs) are recommended by current guidelines for colorectal cancer (CRC) screening. Our aim was to assess the diagnostic performance of traditional guaiac-based FOBTs (gFOBT) and new-generation immunochemical FOBTs (iFOBT) in CRC screening by carrying out a systematic review and meta-analysis. Patients and Methods: PubMed, Embase, Cochrane Library, and Web of Science were searched for eligible articles published before February 17, 2020. Three independent investigators conducted study assessment and data extraction. Diagnosis-related indicators for the use of FOBTs to detect CRC (the endpoint) in a screening setting were summarized and further stratified by type of FOBT (gFOBT vs. iFOBT). STATA software was used to conduct the meta-analysis. Pooled sensitivities and specificities were calculated using a random-effects model. Hierarchical summary receiver operating characteristic curves were plotted, and the areas under the curves (AUCs) were calculated. Results: The electronic search identified 573 records after duplicates were removed, of which 75 full-text articles were assessed for eligibility. In total, 31 studies were eligible for the meta-analysis. In the ROC comparison test, the performance of gFOBT and iFOBT tests differed significantly, with AUC=0.77 (95% confidence interval=0.75-0.79) and AUC=0.87 (95% confidence interval=0.85-0.88), respectively (p=0.0017). In formal meta-regression, test brand was not a significant study-level covariate explaining the observed between-study heterogeneity. Conclusion: New-generation iFOBTs had a significantly higher diagnostic performance than gFOBTs, supporting the use of fecal immunochemical tests only in all newly implemented CRC screening programs.
- Fecal occult blood
- guaiac-based test
- fecal immunochemical test
- gFOBT
- iFOBT
- FIT
- colorectal cancer screening
- meta-analysis
- sensitivity
- specificity
- false negative
- false positive
- ROC
- HSROC
Colorectal cancer (CRC) is the third most common cancer worldwide, with over 1.85 million new cases and over 880,000 deaths in 2018 (1). Population-based screening offers an opportunity for primary prevention and early detection of CRC, with a favorable impact on mortality (2, 3). A wide variety of screening tests are available for CRC, the most widely used being fecal occult blood tests (FOBTs). FOBT screening was shown to reduce CRC mortality in five large randomized trials (4-8). Several international and national guidelines currently recommend that average-risk women and men undergo organized screening for advanced adenoma and CRC (9).
Two types of test for detecting FOB are commercially available: the guaiac-based test (gFOBT) and the fecal immunochemical test (FIT or iFOBT). Guaiac-based tests use the pseudo-peroxidase activity of hemoglobin (Hb; free or intact), which catalyzes the oxidation of guaiac by hydrogen peroxide. Because any peroxidase present in stool can drive this reaction, gFOBTs are not specific for human Hb. They are also subject to interference from foodstuffs with peroxidase activity, from certain chemicals, and even from medications (9, 10). iFOBTs work on a completely different principle: they detect the globin moiety of intact human Hb or its degradation products (9, 10). Guaiac-based tests have been available for decades, and their clinical performance has been studied more extensively than that of FITs (9, 10), which were developed in Finland in the late 1980s (11).
Debate continues on the advantages and shortcomings of these two test types, and direct head-to-head comparative studies are surprisingly scarce (12, 13). We therefore considered it appropriate to carry out a comprehensive systematic review and meta-analysis of all eligible studies to compare the diagnostic performance of gFOBTs and iFOBTs in diagnosing CRC in a screening setting. The reported endpoints for CRC precursor lesions (adenomas, polyps) were heterogeneous, so only invasive colorectal carcinoma could be used as the endpoint in this meta-analysis. This follows the practice adopted by a previous meta-analysis of iFOBTs (14).
Patients and Methods
We performed a systematic review and a meta-analysis following the recommendations of the PRISMA statement (15).
Data sources and search process. In order to identify potential studies reporting data on the diagnostic performance of FOBTs for detecting CRC, three independent investigators searched MEDLINE via PubMed, Ovid Embase, the Cochrane Library, and the Web of Science to collect studies published before February 2020 using the following combined search terms: [colorectal (or) colon (or) rectum] (and) [cancer (or) carcinoma (or) malignancy] (and) [faecal immunochemical test (or) fecal immunochemical testing (or) fecal immunochemical test (or) faecal immunochemical testing (or) faecal occult blood test (or) FOBT] (and) [detection (or) screening (or) detecting (or) diagnosis]. We also searched the reference lists of the studies included and relevant published reviews.
Titles and abstracts were first screened to exclude studies not relevant to the study topic. Conference abstracts without full texts and studies not written in English were also excluded. Potentially eligible articles identified in this initial screening underwent full-text review against the following inclusion criteria: i) reporting of FOBT results together with colonoscopy-biopsy results, with the latter serving as the gold-standard reference test to confirm the CRC endpoint; ii) diagnostic information detailed enough to derive the numbers of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) results, or direct reporting of these numbers. We excluded all studies that did not meet these inclusion criteria, as well as those in which essential information was missing or could not be calculated from the reported data.
Data extraction and quality assessment. Three investigators independently assessed the included studies throughout the process. Disagreements were resolved by discussion until consensus was reached. The following information was extracted: year of publication, country, study setting, population characteristics, diagnostic outcomes, FOBT characteristics (type, test brand, and cut-off value), sensitivity, and specificity.
In this review, we focused only on the diagnostic accuracy of FOBT in a single round of testing. For studies with multiple rounds of FOBT, only the first-round result was extracted. Sensitivity was defined as the proportion of FOBT-positive participants among those diagnosed with the outcome of interest (invasive CRC). Specificity was defined as the proportion of FOBT-negative participants among those without biopsy-confirmed CRC. For quantitative FOBTs with more than one reported cut-off value, the cut-off value recommended by the manufacturer was used. Because the non-cancer endpoints reported in different studies were heterogeneous, only invasive cancer was accepted as the study endpoint in this meta-analysis. Potential risks of bias and the applicability of the included studies were assessed with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool (16), and the detailed protocol is shown in Figure 1.
Statistical analysis. All analyses were performed with STATA/SE version 16.1 (StataCorp, College Station, TX, USA). All statistical tests were two-sided, and p-values less than 0.05 were considered statistically significant. From 2×2 tables, we calculated sensitivity and specificity with 95% confidence intervals (95% CI) for each study and created separate forest plots for the gFOBT and iFOBT tests. Summary estimates of sensitivity, specificity, positive and negative likelihood ratios, and the diagnostic odds ratio (DOR) were calculated using a bivariate random-effects model. We then fitted hierarchical summary receiver operating characteristic (HSROC) curves for both gFOBT and iFOBT tests, with CRC as the endpoint.
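The per-study quantities derived from each 2×2 table can be made concrete with a minimal Python sketch covering sensitivity, specificity, likelihood ratios, and DOR. Several choices here are assumptions, because the article does not state which confidence-interval methods were used for the per-study estimates: the Wilson score interval for proportions, the log-scale (Woolf) interval for the DOR, and a 0.5 correction applied only when a cell is zero. The counts in the usage line are hypothetical. The summary estimates themselves come from the bivariate model, not from this code.

```python
import math
from dataclasses import dataclass

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_ci(successes: int, total: int, z: float = Z95) -> tuple:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


@dataclass
class Accuracy:
    sensitivity: float
    sens_ci: tuple
    specificity: float
    spec_ci: tuple
    lr_pos: float
    lr_neg: float
    dor: float
    dor_ci: tuple


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Per-study accuracy measures with biopsy-confirmed CRC as the reference."""
    sens = tp / (tp + fn)  # FOBT-positive among participants with CRC
    spec = tn / (tn + fp)  # FOBT-negative among participants without CRC
    # Ratio measures break down with a zero cell, so add 0.5 to every cell
    # only in that case (assumed convention).
    c = 0.5 if 0 in (tp, fp, fn, tn) else 0.0
    tp_c, fp_c, fn_c, tn_c = tp + c, fp + c, fn + c, tn + c
    sens_c = tp_c / (tp_c + fn_c)
    spec_c = tn_c / (tn_c + fp_c)
    lr_pos = sens_c / (1 - spec_c)
    lr_neg = (1 - sens_c) / spec_c
    dor = (tp_c * tn_c) / (fp_c * fn_c)  # equals lr_pos / lr_neg
    se_log_dor = math.sqrt(1 / tp_c + 1 / fp_c + 1 / fn_c + 1 / tn_c)
    dor_ci = (dor * math.exp(-Z95 * se_log_dor), dor * math.exp(Z95 * se_log_dor))
    return Accuracy(sens, wilson_ci(tp, tp + fn), spec, wilson_ci(tn, tn + fp),
                    lr_pos, lr_neg, dor, dor_ci)


# Hypothetical study: 25 CRC cases (20 detected), 2975 non-cases (2675 negative).
result = accuracy_from_2x2(tp=20, fp=300, fn=5, tn=2675)
print(f"sens={result.sensitivity:.2f} spec={result.specificity:.3f} DOR={result.dor:.1f}")
# prints: sens=0.80 spec=0.899 DOR=35.7
```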
Using STATA's predict command, we also obtained posterior predictions (empirical Bayes estimates) of the sensitivity and specificity in each study. Empirical Bayes estimates give the best estimates of the true sensitivity and specificity in each study, and the study-specific point estimates usually shrink toward the summary point of the HSROC curve. We explored statistical heterogeneity between studies by visually examining the forest plots and the HSROC curves. Conventional funnel plots are not recommended for investigating potential publication bias in meta-analyses of diagnostic test accuracy studies, so they were not used. Instead, in assessing potential publication bias, we used Cook's distance (17) to check for particularly influential studies, together with a scatter plot of the standardized (level 2) residuals to check for distinct outliers. In addition, we performed meta-regression using restricted maximum likelihood (REML) estimation with different weighting schemes. The aim was to assess whether the test brand (in both test categories) was a significant study-level covariate, i.e., a source of heterogeneity between studies. The test brand (coded as a numerical variable) was added to the model as a covariate using the Moses–Shapiro–Littenberg method (Meta-DiSc software, version 1.4, freely available from the Clinical Biostatistics Unit, Ramón y Cajal Hospital, Madrid, Spain). The antilogarithm of each resulting parameter estimate was interpreted as the relative DOR of the corresponding covariate. The relative DOR indicates the change in the diagnostic performance of the test per unit increase in the covariate.
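The Moses–Shapiro–Littenberg meta-regression can be sketched as an ordinary weighted regression with three terms. The outcome is the log DOR (D), and the predictors are a threshold proxy (S) and the covariate. The relative DOR is the exponential of the covariate coefficient. This is an illustrative re-implementation, not Meta-DiSc's own code. The following are assumptions:

- the 0.5 added to every cell of every study;
- the three weighting options, which mirror the inverse-variance, study-size and unweighted options mentioned in the Discussion;
- the normal-approximation confidence interval.

The function name `msl_relative_dor` and the study counts are hypothetical.

```python
import numpy as np

Z95 = 1.959963984540054


def msl_relative_dor(tp, fp, fn, tn, covariate, weighting="unweighted"):
    """Moses-Shapiro-Littenberg regression D = a + b*S + c*covariate.

    Returns exp(c), the relative DOR per unit increase in the covariate,
    with a normal-approximation 95% CI (assumption).
    """
    # 0.5 added to every cell of every study (assumed zero-cell handling).
    tp, fp, fn, tn = (np.asarray(x, dtype=float) + 0.5 for x in (tp, fp, fn, tn))
    logit_tpr = np.log(tp / fn)  # logit(sensitivity)
    logit_fpr = np.log(fp / tn)  # logit(1 - specificity)
    d = logit_tpr - logit_fpr    # log diagnostic odds ratio
    s = logit_tpr + logit_fpr    # proxy for the positivity threshold
    x = np.column_stack([np.ones_like(d), s, np.asarray(covariate, dtype=float)])

    if weighting == "unweighted":
        w = np.ones_like(d)
    elif weighting == "inverse_variance":
        w = 1.0 / (1 / tp + 1 / fp + 1 / fn + 1 / tn)  # 1 / Var(log DOR)
    elif weighting == "study_size":
        w = tp + fp + fn + tn
    else:
        raise ValueError(f"unknown weighting: {weighting}")

    # Weighted least squares via row scaling by sqrt(w).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(x * sw[:, None], d * sw, rcond=None)
    resid = (d - x @ beta) * sw
    sigma2 = resid @ resid / (len(d) - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ (x * w[:, None]))
    c, se_c = beta[2], np.sqrt(cov[2, 2])
    return float(np.exp(c)), (float(np.exp(c - Z95 * se_c)), float(np.exp(c + Z95 * se_c)))


# Hypothetical data: six studies, brand coded 1-3 as a single numeric covariate.
rdor, ci = msl_relative_dor(
    tp=[20, 35, 12, 40, 18, 25], fp=[300, 150, 400, 250, 180, 320],
    fn=[5, 10, 2, 6, 4, 3], tn=[2675, 2800, 2500, 3000, 2200, 2900],
    covariate=[1, 1, 2, 2, 3, 3], weighting="inverse_variance")
print(f"relative DOR={rdor:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

Because the brand enters as a single numeric column, exp(c) is a change per unit of the code, which matches the "per unit increase" reading above. An unordered categorical covariate would more usually be entered as indicator columns.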
Results
Literature search results. The electronic search identified 573 records after removal of duplicates, of which 75 full-text articles were assessed for eligibility. Of these 75 articles, 24 articles (26 studies) reporting gFOBT analyses (Table I) and 24 articles (24 studies) reporting iFOBT analyses (Table II) met the inclusion criteria. In total, 31 individual studies were included in the meta-analysis. Figure 1 shows the flow chart of the selection process and lists the reasons for exclusion.
Study characteristics. The 24 articles (26 studies) reporting gFOBT analyses included a total of 99,854 individuals (12, 13, 18-20, 23-25, 27, 29-39, 41, 43, 44, 46), of whom 886 were diagnosed with CRC (Table I). The 24 articles (24 studies) reporting iFOBT analyses included a total of 87,073 individuals (12, 13, 20-30, 32, 33, 36-38, 40-45), of whom 777 had CRC. Fourteen gFOBT studies were conducted in Europe, four in the United States and three in China. Eleven iFOBT studies were performed in Europe, five in China and two in the United States. Sixteen gFOBT studies assessed diagnostic performance with Hemoccult SENSA (12, 13, 18-20, 23-25, 27, 29, 34, 36, 37, 41, 43, 44), five with Hemoccult II (19, 30-33), one with HemeSelect (19), one with Hemofec (35) and one with Hemascreen (39); in two studies, the test brand was not reported (38, 46). Nine iFOBT studies assessed diagnostic performance with OC-Sensor/OC-Micro (24, 26, 32, 34, 36, 37, 40, 41, 43), three with Magstream (21, 22, 30), two with FlexSure OBT (20, 27), two with OC-Light (28, 42), two with ColonView-FIT (12, 13), and one each with Prevent ID (23), INSURE (25), InstantView (29), Hemo Techt NS-Plus (44) and OC FIT-CHEK (45); in one study, the test brand was not reported (38).
Diagnostic performance of gFOBT. The diagnostic performance of gFOBT in all included studies is summarized in forest plots showing the pooled sensitivity (Figure 2) and pooled specificity (Figure 3) for the CRC endpoint. The pooled overall sensitivity and specificity of gFOBT tests for detecting CRC were 0.68 (95% CI=0.57-0.79) and 0.88 (95% CI=0.84-0.91), respectively. Sensitivity exceeded 0.68 in 13 gFOBT studies (11 articles: 12, 13, 19, 20, 24, 31, 36-38, 43, 46), and specificity exceeded 0.88 in 14 (13, 19, 27, 29, 31-37, 41, 44). The best six gFOBT studies showed 100% sensitivity (20, 24, 31, 36-38). Four of these used Hemoccult SENSA (20, 24, 36, 37) and one used Hemoccult II (31), while one did not report the test brand (38). The best eight gFOBT studies showed 96-99% specificity (12, 19, 29, 31-33, 36, 44), of which four used Hemoccult SENSA (12, 29, 36, 44) and four used Hemoccult II (19, 31-33).
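The pooled estimates above were obtained with the bivariate random-effects model described in the Methods. The following sketch offers a simpler illustration of what random-effects pooling does: it pools one accuracy measure at a time on the logit scale using the DerSimonian–Laird estimator. This is a swapped-in, univariate technique, not the authors' method. It ignores the correlation between sensitivity and specificity that the bivariate model accounts for, so its results will generally differ from the reported values. The 0.5 correction for 0% or 100% studies and the example counts are assumptions.

```python
import math

Z95 = 1.959963984540054


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def dersimonian_laird_logit(events, totals):
    """Univariate random-effects pooling of proportions on the logit scale."""
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:  # keep logits finite (assumed 0.5 correction)
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))  # logit(e / n)
        v.append(1 / e + 1 / (n - e))    # approximate within-study variance
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    if k > 1:
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    else:
        tau2 = 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), (expit(y_re - Z95 * se), expit(y_re + Z95 * se)), tau2


# Hypothetical sensitivity data: CRC cases detected / CRC cases per study.
pooled, ci, tau2 = dersimonian_laird_logit([18, 30, 9, 25], [25, 34, 15, 30])
print(f"pooled sensitivity={pooled:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), tau2={tau2:.3f}")
```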
Diagnostic performance of iFOBT. The diagnostic performance of iFOBT for the CRC endpoint is summarized in forest plots showing the pooled sensitivity (Figure 4) and pooled specificity (Figure 5). The pooled overall sensitivity and specificity of iFOBT tests were 0.86 (95% CI=0.78-0.93) and 0.85 (95% CI=0.81-0.88), respectively. Sensitivity exceeded 0.86 in 12 iFOBT studies (12, 13, 20, 21, 28, 30, 32, 36-38, 40, 44), and specificity exceeded 0.85 in 15 (21, 22, 24, 26-29, 32, 34, 36, 37, 40, 42, 43, 45).
The best five iFOBT studies showed 100% sensitivity, all using different test brands: ColonView (12), Magstream (21), OC-Micro (36), OC-Sensor (37) and Hemo Techt NS-Plus (44).
The best seven iFOBT studies showed 93-97% specificity; two studies were performed with OC-Light (28, 42), two with OC-Sensor (37, 43), and one each with Magstream (22), InstantView (29) and OC FIT-CHEK (45).
HSROC analyses and empirical Bayes estimates. The HSROC curves were drawn with the metandiplot command in STATA to compare the pooled overall diagnostic performance of gFOBT and iFOBT (Figures 6 and 7). In the HSROC analyses, iFOBT had a significantly greater AUC (0.87; 95% CI=0.85-0.88) than gFOBT (0.77; 95% CI=0.75-0.79) (p=0.0017; roccomp test).
We also obtained empirical Bayes estimates to compare the overall diagnostic performance of gFOBT and iFOBT (Figures 8 and 9). Empirical Bayes estimates give the best estimate of the true sensitivity and specificity of each study. As expected, these estimates "shrank" towards the summary points relative to the study-specific estimates, which are shown without empirical Bayes adjustment in Figures 6 and 7. This shrinkage was generally greater for sensitivity than for specificity. It reflects both the smaller variance of sensitivity on the logit scale and the fact that most studies include far fewer individuals with CRC than without CRC, which yields more precise estimates of specificity than of sensitivity.
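The shrinkage pattern described above can be illustrated with a univariate normal-normal model, a simplification of the empirical Bayes predictions from the bivariate model. In this model, each observed logit is pulled toward the summary value by the factor v/(v+τ²), where v is the within-study variance and τ² the between-study variance. All counts, summary points and τ² values in this sketch are hypothetical. A study with far fewer CRC cases than non-cases has a much larger within-study variance for logit sensitivity, so its sensitivity shrinks far more than its specificity.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def empirical_bayes(y: float, v: float, mu: float, tau2: float):
    """Posterior mean of a study's true logit when y ~ N(theta, v), theta ~ N(mu, tau2)."""
    b = v / (v + tau2)  # shrinkage factor: 0 = no shrinkage, 1 = fully to mu
    return b * mu + (1 - b) * y, b


# Hypothetical study: 20 CRC cases (16 detected), 2000 non-cases (1800 negative).
tp, fn, tn, fp = 16, 4, 1800, 200
y_sens, v_sens = logit(tp / (tp + fn)), 1 / tp + 1 / fn
y_spec, v_spec = logit(tn / (tn + fp)), 1 / tn + 1 / fp

# Hypothetical summary points and between-study variances on the logit scale,
# with a smaller between-study variance for sensitivity, as described above.
post_sens, b_sens = empirical_bayes(y_sens, v_sens, mu=logit(0.70), tau2=0.5)
post_spec, b_spec = empirical_bayes(y_spec, v_spec, mu=logit(0.87), tau2=1.0)
print(f"shrinkage sens={b_sens:.3f} spec={b_spec:.4f}")  # shrinkage sens=0.385 spec=0.0055
```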
Publication bias and quality assessment. Cook's distance measures the influence of a study on the model parameters and can be used to identify particularly influential studies. To check for outliers, standardized predicted random effects can be interpreted as standardized study-level residuals. In Figures 10 and 11, the residuals corresponding to test specificity have been plotted on a reversed axis, consistent with the convention used in the HSROC plots.
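As a rough illustration of how Cook's distance and standardized residuals flag studies, the sketch below applies them to a univariate random-effects mean of logit-transformed values, treating τ² as fixed. It is a simplified analogue, not the bivariate-model diagnostics produced in STATA. Here, Cook's distance is the squared shift in the pooled mean when a study is left out, scaled by the variance of the pooled mean, and the residual is a plain standardized residual. The function name and data are hypothetical.

```python
import math


def influence_diagnostics(y, v, tau2):
    """Leave-one-out influence for a random-effects mean with tau2 held fixed.

    Returns the pooled mean, a Cook's-distance-type measure
    (mu - mu_without_i)^2 / Var(mu), and simple standardized residuals.
    """
    w = [1 / (vi + tau2) for vi in v]
    var_mu = 1 / sum(w)
    mu = var_mu * sum(wi * yi for wi, yi in zip(w, y))
    cooks, std_resid = [], []
    for i in range(len(y)):
        w_rest = [wj for j, wj in enumerate(w) if j != i]
        y_rest = [yj for j, yj in enumerate(y) if j != i]
        mu_rest = sum(wj * yj for wj, yj in zip(w_rest, y_rest)) / sum(w_rest)
        cooks.append((mu - mu_rest) ** 2 / var_mu)
        std_resid.append((y[i] - mu) / math.sqrt(v[i] + tau2))
    return mu, cooks, std_resid


# Hypothetical logit sensitivities; the last study is a deliberate outlier.
y = [1.0, 1.1, 0.9, 1.05, 3.0]
v = [0.1] * 5
mu, cooks, resid = influence_diagnostics(y, v, tau2=0.05)
most_influential = max(range(len(y)), key=lambda i: cooks[i])
print(most_influential, round(resid[most_influential], 2))  # prints: 4 4.11
```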
Figures 10 and 11 are best read in combination: Cook's distance shows which studies were influential, while the standardized residuals give some insight into why. Figure 10 shows that the iFOBT study by Guittet et al. (ID 11, Magstream) (30) was particularly influential, followed by those of Vasilyev et al. (ID 23, ColonView) (12), Dancourt et al. (ID 10, InstantView) (29) and Hoepffner et al. (ID 4, Prevent ID) (23). The studies by Vasilyev et al. (ID 23, ColonView) (12) and Chen et al. (ID 16, test brand not available) (38) had high standardized residuals for specificity, leading to influence on both the mean and the variance of logit-transformed sensitivity. The iFOBT study by Dancourt et al. (29) had a large (negative) standardized residual for specificity and also appeared influential as judged by its Cook's distance.
Figure 11 and Table I show that the gFOBT study by Chen et al. (ID 19, test brand not available) (38) was particularly influential, followed by those of Wong et al. (ID 5, Hemoccult SENSA) (20) and Hol et al. (ID 12, Hemoccult II) (31).
Meta-regression provided no evidence that test brand was an important source of heterogeneity among the iFOBT studies. Test brand was not a significant study-level covariate (relative DOR=1.10; 95% CI=0.94-1.29; p=0.235). The same was true among the gFOBT studies (relative DOR=0.92; 95% CI=0.59-1.44; p=0.716).
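The relative DOR is the antilogarithm of a meta-regression coefficient, so its confidence interval and p-value are linked on the log scale. The sketch below back-calculates the standard error and an approximate two-sided p-value from each reported relative DOR and 95% CI. It assumes a symmetric normal (Wald) interval on the log scale. The software may instead have used a t-based interval, so this is only an approximate consistency check, not a reanalysis.

```python
import math

Z95 = 1.959963984540054


def wald_from_ratio_ci(ratio: float, lower: float, upper: float, z: float = Z95):
    """Recover log(ratio), its SE, and a two-sided p-value from a 95% CI."""
    log_ratio = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = log_ratio / se
    p = math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_ratio, se, p


for label, rdor, lo, hi in [("iFOBT", 1.10, 0.94, 1.29), ("gFOBT", 0.92, 0.59, 1.44)]:
    _, se, p = wald_from_ratio_ci(rdor, lo, hi)
    print(f"{label}: SE(log RDOR)={se:.3f}, approximate p={p:.3f}")
```

With the reported values, this gives p≈0.238 for iFOBT and p≈0.714 for gFOBT, close to the published 0.235 and 0.716. Small differences are expected because the published interval bounds are rounded to two decimals.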
Discussion
Although there is general agreement that iFOBTs perform better than gFOBTs, several iFOBT brands are on the market. Limited data on the performance of individual brands make it difficult to decide which test to choose, e.g., for a planned CRC screening program or for routine FOB testing (10, 12-14). iFOBTs have several advantages over gFOBTs in CRC screening, including no need for dietary restrictions, no stool sample instability, and fewer stool samples needed (10). In addition, a decision analysis found no difference in life-years gained between annual iFOBT testing and colonoscopy every 10 years (47).
To provide additional evidence-based information to support the difficult choice between gFOBTs and iFOBTs (10), we conducted a systematic review and formal meta-analysis (with meta-regression). It covered 26 gFOBT and 24 iFOBT studies, all evaluating test performance in CRC screening with invasive CRC as the endpoint. As in the meta-analysis of iFOBTs by Lee et al. in 2014 (14), other endpoints (i.e., adenomas and advanced adenomas) were not used in this meta-analysis. The practice of classifying these cancer precursor lesions varies widely, which would have compromised the unbiased use of these endpoints, however important they are for CRC screening (10, 12-14).
To our knowledge, this is the first systematic review and meta-analysis to evaluate the methodological quality of the included studies, which is essential to confirm the strength of the pooled summary results. We assessed the quality of the original studies with the QUADAS-2 quality assessment tool (16). Data from diagnostic performance studies also require more complex statistical approaches than are needed, e.g., for meta-analyses of studies reporting simple proportions only. To properly account for the correlation between sensitivity and specificity, and to obtain unbiased summary estimates of both, we used the multilevel statistical methods available in STATA software (48).
In the present meta-analysis, the pooled sensitivity and specificity of gFOBT were 68% and 88%, respectively, compared with 86% and 85% for iFOBTs (Figures 2, 3, 4 and 5). This is the first formal demonstration of the superiority of iFOBTs over gFOBTs based on a rigorous meta-analysis of original studies whose quality was controlled with the QUADAS-2 assessment tool (16). The present results agree with those reported by Lee et al. in their meta-analysis of iFOBTs, although not all studies included in the present analysis were available when their report was published in 2014 (14). Of note, gFOBTs were not included in their analysis. The present study includes them for the first time, enabling a direct comparison of the two techniques using pooled summary performance indicators.
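The following sketch recomputes the likelihood ratios and DOR implied by the pooled point estimates quoted above. These are back-of-the-envelope values, not the model-based summary likelihood ratios and DOR described in the Methods, which are estimated jointly in the bivariate model and may differ.

```python
def implied_ratios(sens: float, spec: float):
    """Likelihood ratios and DOR implied by a single (sensitivity, specificity) pair."""
    lr_pos = sens / (1 - spec)  # factor by which a positive result multiplies the pre-test odds of CRC
    lr_neg = (1 - sens) / spec  # factor by which a negative result multiplies them
    return lr_pos, lr_neg, lr_pos / lr_neg


# Pooled point estimates reported in this section.
for name, sens, spec in [("gFOBT", 0.68, 0.88), ("iFOBT", 0.86, 0.85)]:
    lr_pos, lr_neg, dor = implied_ratios(sens, spec)
    print(f"{name}: LR+={lr_pos:.2f} LR-={lr_neg:.2f} DOR={dor:.1f}")
# gFOBT: LR+=5.67 LR-=0.36 DOR=15.6
# iFOBT: LR+=5.73 LR-=0.16 DOR=34.8
```

On these point estimates, the implied LR+ values are nearly identical (about 5.7 for both). The difference in DOR therefore comes almost entirely from the lower LR- of iFOBTs, i.e., from their higher sensitivity.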
It is of major (commercial) interest to assess whether particular test brands in the two categories (gFOBT and iFOBT) were influential in this meta-analysis, i.e., had a significant influence on the pooled summary estimates in the forest plots. This can be done in different ways. The simplest is visual inspection of the forest plots depicting the pooled estimates of sensitivity and specificity, separately for iFOBT and gFOBT tests (Figures 2, 3, 4 and 5). This readily identifies the studies with the highest sensitivity and specificity. In both test categories, several studies reported 100% sensitivity, but this high sensitivity was achieved at the expense of lower specificity. Importantly, no single study (or test brand), among either the gFOBTs or the iFOBTs, was 100% specific for the CRC endpoint (Figures 2, 3, 4 and 5). As pointed out before (10, 12, 13), this is exactly what would be expected: the fecal occult blood detected by these tests is not specific to invasive CRC and can also derive from various other neoplastic or non-neoplastic sources.
A more formal approach to identifying particularly influential studies is based on Cook's distance, together with the standardized residuals to check for distinct outliers (Figures 10 and 11). Among the gFOBT studies, the most influential was the study by Chen et al. (Table I, ID 19, test brand not available) (38), followed by those of Wong et al. (Table I, ID 5, Hemoccult SENSA) (20) and Hol et al. (Table I, ID 12, Hemoccult II) (31).
Among the iFOBT studies, the study of Guittet et al. (Table II, ID 11, Magstream) (30) was particularly influential, followed by those of Vasilyev et al. (Table II, ID 23, ColonView) (12), Dancourt et al. (Table II, ID 10, InstantView) (29) and Hoepffner et al. (Table II, ID 4, Prevent ID) (23). These studies can also easily be identified in the forest plots by indicators that deviate from the mainstream. Identifying a study as influential, however, does not indicate that this was due to the test brand used. Such a conclusion would be justified only if test brand were shown to be a significant study-level covariate in a meta-regression that includes test brand as a covariate.
To address this question, we performed REML meta-regression separately for gFOBTs and iFOBTs, testing different weighting options (inverse variance, study size, unweighted) to determine whether the test brand was a significant study-level covariate. As shown in the Results section, meta-regression provided no evidence that test brand was an important source of heterogeneity among either the iFOBT or the gFOBT studies. Thus, the test brand used in the studies does not appear to be a significant determinant of the between-study heterogeneity observed in the forest plots (Figures 2, 3, 4 and 5). In practice, this suggests that none of the test brands included in the original studies, in either category, is clearly superior to the others. This is not unexpected, given that the different test brands within the gFOBT and iFOBT categories are based on similar technological principles, with no fundamental differences in their clinical performance.
When considering the strengths and weaknesses of the present approach, it must be acknowledged that meta-analyses are always subject to the detection and verification biases of the original studies. CRC (the study endpoint) may be missed at a rate of 0.2-5% even when colonoscopy is used (49, 50). Language bias is also possible, since we omitted non-English studies. However, previous reports suggest that such language restrictions have little effect on the pooled summary estimates in meta-analyses (51). iFOBT performance also appears to vary seasonally, with lower positivity rates in hot weather due to the degradation of hemoglobin (52). The sample collection device (probe) and its buffer may also be a source of bias. At present, however, there is no accepted international quality control standard for the use of iFOBTs.
Our meta-analysis has several strengths. Firstly, the meta-analysis included the systematic use of the QUADAS-2 quality assessment tool (16) and followed the recommendations of the PRISMA statement (15). Secondly, our study was based on a comprehensive systematic search of all major global databases, thus minimizing the likelihood of missing any eligible studies.
In conclusion, our systematic review and meta-analysis suggests that iFOBTs have superior diagnostic capabilities to gFOBTs as a screening tool for CRC. A remaining question is which iFOBT to choose: quantitative or qualitative, and which test brand? In quantitative iFOBTs (Magstream, OC-Sensor/Micro, Ridascreen and OC-Hemodia), the end user can adjust the cut-off fecal hemoglobin concentration for a positive test. In qualitative iFOBTs (ColonView-FIT, InstantView, Prevent ID, OC-Light, FlexSure OBT and HemeSelect), the cut-off concentration is pre-set, and the test is read as positive or negative, either visually or automatically. The current data from formal meta-regression do not definitively confirm the superiority of any test brand. Nor do they show that test brand is an important source of heterogeneity between studies. Of the quantitative iFOBTs, the studies using Magstream have shown excellent diagnostic performance. Of the qualitative iFOBT brands, ColonView-FIT, InstantView and Prevent ID seem to be the three top choices for CRC screening because of their confirmed excellent test characteristics.
Acknowledgements
The study was funded by the Päivikki ja Sakari Sohlberg Foundation.
Footnotes
Authors' Contributions
All Authors have met all of the following four criteria: i) Substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work. ii) Drafting of the work or revising it critically for important intellectual content. iii) Final approval of the version to be published. iv) Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
This article is freely accessible online.
Conflicts of Interest
The Authors report no conflicts of interest or financial ties to disclose. The Authors alone are responsible for the content and writing of this article.
- Received May 8, 2020.
- Revision received May 23, 2020.
- Accepted May 28, 2020.
- Copyright© 2020, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved