Abstract
Background/Aim: The serological biomarker panel GastroPanel® (GP), developed by Biohit Oyj (Helsinki, Finland), has gained increasing global acceptance in the diagnosis of atrophic gastritis (AG). This is a systematic review and meta-analysis of the studies on diagnostic accuracy of GP (GPA). Materials and Methods: Core electronic databases were searched until the end of December 2021, following the principles of the PRISMA-P and using the QUADAS-2 quality assessment tool. STATA software with relevant packages (metandi, midas, mylabels) was used for meta-analysis, with AG of the corpus (AGC) as the endpoint. Summary estimates of Se and Sp, LR+ and LR– were calculated using a random effect bivariate model (Forest plots), and summary receiver operating characteristic (SROC) curves by the hierarchical SROC (HSROC) model. Results: Altogether, 49 studies were found eligible, comprising 22,597 patients examined by the GP test. Significant heterogeneity across the studies was confirmed in Forest plots, HSROC and bivariate boxplot. The pooled Se of GP in diagnosis of AGC was 0.70 (95%CI=0.64-0.76) and pooled Sp was 0.93 (95%CI=0.90-0.95), with AUC=0.900 (95%CI=0.170-1.000) in HSROC. In Fagan’s nomogram, a positive GP test predicts AGC at the population level with a likelihood of 72%. Meta-regression and subgroup meta-analysis disclosed publication year (before vs. after 2008) as the only significant source of heterogeneity, geographic origin of the study being of borderline significance. Conclusion: These meta-analytical results confirm the accuracy of the GastroPanel® test in the diagnosis of AGC, advocating its applicability i) in screening for gastric cancer risk conditions (AG, Helicobacter pylori), ii) in non-invasive diagnosis of dyspeptic patients, and iii) in follow-up of AG patients.
- GastroPanel®
- meta-analysis
- diagnostic accuracy
- atrophic gastritis
- corpus
- random effect
- bivariate model
- hierarchical summary ROC
- Forest plot
- meta regression
- subgroup meta-analysis
- heterogeneity
- publication bias
- effect size
- review
Among the multitude of implicated risk factors of gastric cancer (GC), i) Helicobacter pylori (Hp) infection and ii) atrophic gastritis (AG) are the two most significant ones worldwide (1–5). The intestinal type of GC develops in the atrophic gastric mucosa as a stepwise process known as Correa cascade (3), from Hp-infection, through mild, moderate, and severe AG (5). It is estimated that 5-10% of all Hp-infected patients eventually develop moderate to severe AG, and the risk of GC increases in parallel with the severity of AG; up to 90-fold in patients with AG both in the corpus and in the antrum (3–6).
The Correa cascade takes decades to develop into GC, which leaves ample time to diagnose cancer precursor lesions (AG, intestinal metaplasia, dysplasia), provided that a suitable screening test is applied (5, 7). International guidelines recommend endoscopic surveillance and biopsies for all patients with diagnosed AG (6, 8, 9), and this approach has long been the global reference standard (3, 10, 11). However, endoscopy is expensive as a screening test and is perceived as uncomfortable by patients, resulting in unsatisfactory compliance among the screened subjects, who would prefer non-invasive diagnostic tests (12–14). As a result of the development stimulated since the 1980s by the biomarker studies of Miki et al. (15) and Samloff et al. (16), recent international consensus reports strongly advocate the use of non-invasive serological tests in screening, diagnosis, and surveillance of AG (6, 8, 9). This is also substantiated by a meta-analysis of the prolific literature on serological biomarkers used as stand-alone tests for diagnosis of AG (17).
The major breakthrough in the field of these non-invasive serological tests for AG was a 4-biomarker panel (GastroPanel®) introduced in the early 2000s by a Finnish biotechnology company (Biohit Oyj, Helsinki, Finland) (18). This ELISA test, combining serum pepsinogen I (PGI), pepsinogen II (PGII), gastrin-17 (G-17), and Hp IgG antibody, is intended for non-invasive diagnosis of dyspeptic patients and for screening and surveillance of Hp-infection and AG (19). The biomarker panel 1) provides accurate estimates of gastric acid and G-17 output by the corpus and antrum, respectively, 2) detects Hp-infection, and 3) estimates the grade and topography of AG (18, 19).
Since its introduction, GastroPanel® test has been extensively tested in different settings worldwide (19). The literature accumulated until 2016/2017 was covered by two independent meta-analyses, calculating pooled sensitivity (Se) of 72-75% and pooled specificity (Sp) of 95% for GastroPanel® accuracy (GPA) in the diagnosis of AG (20, 21). Since the appearance of these two meta-analyses, the number of studies has more than doubled, indicating a rapidly increasing global interest in GastroPanel® test (19, 22).
Because of the rapidly expanding literature, which now also covers new geographic regions (19, 22, 23), it was felt timely to design a new meta-analysis that covers all studies published until the end of December 2021. As compared to our previous meta-analysis (20), we made two major changes. First, instead of the classical meta-analytical technique of analysing the pooled Se and Sp separately (20), we calculated the summary estimates using a random effect bivariate model and fitted the hierarchical summary receiver operating characteristic (HSROC) curves (21, 24, 25). Second, because of i) the complexities in the biomarker profile of atrophic antrum gastritis (AGA) (18, 19, 22, 23), ii) the relative rarity of AGA as compared with AGC, and iii) the greater clinical significance of the latter (2–7), the present meta-analysis only covers the studies where AGC was used as the diagnostic endpoint. Studies reporting AGA will be subjected to a similar meta-analysis later.
Materials and Methods
This systematic review and meta-analysis strictly follows the principles of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Protocols (PRISMA-P) (26). The study was based on collected and synthesized data from published literature, thus exempting an approval by any institutional review board.
Strategy of the literature search. Electronic databases (PubMed, the Cochrane library, Ovid Embase and Scopus) were searched until the end of December 2021. The following keywords from the Medical Subject Headings (MeSH) were selected for searching: “gastropanel”; “panel test”; “test panel”; “pepsinogens”; “pepsinogen I”; “PGI”; “pepsinogen II”; “PGII”; “pepsinogen I/pepsinogen II”; “PGI/PGII”; “gastrin-17”; “atrophic gastritis”; “gastric atrophy”; “intestinal metaplasia”; “gastric cancer precursor”; and “gastric precancer lesion”. Reference Manager Professional Edition (version 12.0.3, Thomson Reuters, MI, USA) was used to build up the reference database including all derived publications (n=8,458).
The time frame of GastroPanel® test development is well known, the first validation study being published in 2002 (27), so there was no need to extend the literature search to years before 2002. However, we extended the search into the reference list updated on the manufacturer’s (Biohit Oyj) website (18), including relevant abstracts of the key congresses in the field: Digestive Diseases Week, United European Gastroenterology Week, Asia Pacific Digestive Week and International Workshop on Helicobacter and Microbiota.
Initial selection was made on the basis of titles and abstracts. At this stage, also the reference lists of the pertinent reviews were checked to find eventual studies not detected by the systematic literature search. No language restrictions were followed in this selection, and if needed, translations of potentially eligible studies were acquired. Next, all potentially eligible studies (n=748) were subjected to detailed full text assessment to confirm whether the necessary data needed to calculate the GastroPanel® performance indicators were either directly reported or could be extracted from the included results.
Selection criteria. To be eligible, the report had to contain the exact numbers of gastric biopsies analyzed, as well as the numbers of AGC and/or AGA, examined by the complete GastroPanel® test (20, 21). The study was not included, if any of the biomarkers was used as a stand-alone test. If not directly specified, the report had to be complete enough to enable the assessors to calculate these numbers from the reported results.
To be more precise, we included only the studies that met the following PICO (Patient, Intervention, Control, Outcome) criteria: 1) Patients with biopsy-confirmed AGC (and/or AGA); 2) Intervention: GastroPanel® test with clearly defined cut-off values for PGI, PGII, PGI/PGII, G-17 and Hp IgG ELISA; 3) Comparison: biopsy as the reference test, classified according to the Updated Sydney System (USS) (10, 11); 4) Outcome: diagnostic performance indices of GastroPanel® for AGC, including Se, Sp, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR–), accuracy, or diagnostic odds ratio (DOR), which enable calculating the TP (subjects with positive GastroPanel® test who had biopsy-confirmed AGC), TN (subjects with negative test who did not have AGC in the biopsies), FP (subjects with positive test but no AGC in biopsies), and FN (subjects with negative GastroPanel® test but who had biopsy-confirmed AGC) values. As to the study design, all types were acceptable (20, 21).
Only the studies that met all these inclusion criteria were included (n=49). The exclusion criteria were as follows: 1) narrative review; 2) letter, comment, editorial or reply to questions; 3) study protocol; 4) publication with incomplete data; and 5) systematic review/meta-analysis or consensus report.
Data extraction and primary analysis of the modifiers. A data-record form was used to collect the primary outcomes (TP, FP, FN, and TN) and modifiers in each eligible study. GPA was the primary outcome of this study. The GastroPanel® test result was positive when the test diagnosed the presence of AG either in the corpus (AGC; PGI down, PGI/PGII down, G-17 up) or antrum (AGA; G-17 down, Hp+). The present meta-analysis only reports GPA in the diagnosis of AGC.
To calculate the values, we used the 2×2 tables from the original articles that reported the diagnostic performance indices (Se, Sp, PPV, NPV, LR+, LR–, accuracy, or DOR). We constructed 2×2 tables that contained the number of cases that were TP, TN, FP, and FN. If these 2×2 tables were not directly available in the original report, these were constructed using the numbers found in different sections of the reports. In such a case, we calculated the values for TP, FP, FN, and TN using the following formulas: Se=TP/(TP + FN); Sp=TN/(FP + TN); PPV=TP/(TP + FP); NPV=TN/(FN + TN); LR+=Se/(1-Sp); LR–=(1-Se)/Sp; accuracy=(TP + TN)/(TP + FP + FN + TN); and DOR=(TP x TN)/(FP x FN).
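The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the data-extraction procedure actually used for the review: the class name `TwoByTwo`, the helper `counts_from_se_sp` and the study with 50 AGC cases and 200 non-AGC subjects are all hypothetical, and tables containing a zero cell would need a continuity correction before LR+, LR– or DOR can be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. biopsy reference)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def indices(self) -> dict:
        """Diagnostic performance indices, using the formulas given in the text."""
        tp, fp, fn, tn = self.tp, self.fp, self.fn, self.tn
        se = tp / (tp + fn)
        sp = tn / (fp + tn)
        return {
            "Se": se,
            "Sp": sp,
            "PPV": tp / (tp + fp),
            "NPV": tn / (fn + tn),
            "LR+": se / (1 - sp),
            "LR-": (1 - se) / sp,
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
            "DOR": (tp * tn) / (fp * fn),
        }


def counts_from_se_sp(se: float, sp: float, n_diseased: int, n_healthy: int) -> TwoByTwo:
    """Back-calculate the 2x2 counts from reported Se, Sp and group sizes."""
    tp = round(se * n_diseased)
    tn = round(sp * n_healthy)
    return TwoByTwo(tp=tp, fp=n_healthy - tn, fn=n_diseased - tp, tn=tn)


# Hypothetical study: 50 biopsy-confirmed AGC cases and 200 subjects without AGC.
table = counts_from_se_sp(se=0.70, sp=0.93, n_diseased=50, n_healthy=200)
idx = table.indices()
```

Back-calculation of this kind is only as good as the rounding in the reported Se and Sp, so reconstructed counts should be checked against any totals the original report provides.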
Whenever available, the following items were extracted from each study: year of publication, country of origin, GC risk in the geographic region of the study (low, medium, high), size of the study cohort, cut-off values used for PGs and G-17, the reference standard (gastroscopy with histology), as well as the topographic site of AG (AGA, AGC). In case of multiple articles from a single study (e.g., congress abstract and full report), each study was recorded only once.
Quality control. In the present study, methodological quality control was performed by using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool as detailed by Whiting et al. in 2011 (28). In QUADAS-2, the quality assessment consists of two parts: 1) risk of bias, and 2) applicability concerns (28). The former has four distinct quality items: i) patient selection, ii) index test, iii) reference standard, and iv) flow and timing, i.e., flow of the patients in the study and timing of the index tests and reference standard. The applicability concerns include: i) patient selection, ii) index test, and iii) reference standard (28). Each of the risk of bias items is classified as high, low, or unclear risk of bias, and the three areas of applicability concerns are graded as high, low, or unclear concerns of applicability (28).
Statistical analysis. This meta-analysis was performed using Stata Statistical Software, version 17.0 (College Station, TX, USA), with all relevant packages (metandi, midas, mylabels) installed. The data file included the extracted or calculated TP, FP, FN, and TN values from each study. Forest plots for pooled Se and Sp were created using the bivariate random effects model (metaprop) (24, 25). The summary receiver operating characteristic (SROC) curve was generated using the HSROC model (29) and complemented by empirical Bayes (EB) estimates that closely agree with those of a full Bayesian analysis.
HSROC model also allows controlling for heterogeneity across the studies, as determined by the i) correlation coefficient between logit transformed Se and Sp [Corr(logits)] in the HSROC analysis using bivariate model (24) and ii) asymmetry parameter β. A positive correlation coefficient (>0) and β with p<0.05 indicate heterogeneity between studies (29, 30). On the other hand, β=0 corresponds to a symmetric ROC curve in which the DOR does not vary along the HSROC curve. We also explored heterogeneity across studies through visual examination of the Forest plots and HSROC curve as well as using a bivariate boxplot (by midas).
The sources of heterogeneity were explored using univariate and multivariate meta-regression, with the following study-level covariates: year of publication, country of study, cohort size, geographic GC risk, as well as all items of potential bias or applicability concern recorded in QUADAS-2. We also performed subgroup meta-analyses for these same covariates.
To disclose the influential studies, a likelihood ratio scatter matrix was applied to rate the effect sizes of the individual studies (31). The likelihood ratio matrix defines quadrants of informativeness based on established evidence-based thresholds (LR+/LR–) as follows: 1) left upper quadrant (LR+ >10, LR– <0.1): exclusion & confirmation; 2) right upper quadrant (LR+ >10, LR– >0.1): confirmation only; 3) left lower quadrant (LR+ <10, LR– <0.1): exclusion only, and 4) right lower quadrant (LR+ <10, LR– >0.1): no exclusion or confirmation (31). Using the effect size rating, the studies with their LR+/LR– pairings falling within these 4 quadrants are said to have 1) substantial, 2) moderate, 3) moderate, and 4) minimal effect size, respectively (31).
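The quadrant rule described above can be written as a small classifier. This is a sketch only: the function name `lr_quadrant` is hypothetical, and because the text does not say how values lying exactly on a threshold (LR+ = 10 or LR– = 0.1) are handled, the code assumes they do not meet it.

```python
def lr_quadrant(lr_pos: float, lr_neg: float) -> tuple[str, str]:
    """Return (quadrant, effect-size rating) for one study's LR+/LR- pair."""
    # Assumption: values exactly on a threshold are treated as not meeting it.
    confirms = lr_pos > 10    # LR+ > 10: a positive result confirms the condition
    excludes = lr_neg < 0.1   # LR- < 0.1: a negative result excludes the condition
    if confirms and excludes:
        return ("left upper", "substantial")
    if confirms:
        return ("right upper", "moderate")
    if excludes:
        return ("left lower", "moderate")
    return ("right lower", "minimal")
```

For example, a study with LR+ = 12 and LR– = 0.05 falls in the left upper quadrant, whereas one with LR+ = 9.5 and LR– = 0.30 falls in the right lower quadrant.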
Because the standard funnel plot and tests are not recommended to investigate the publication bias in meta-analysis of diagnostic test accuracy studies (32), we evaluated publication bias using Deeks’ funnel plot asymmetry test (33). In this approach, the publication bias is assessed by a scatter plot of the inverse of the square root of the effective sample size [1/root (ESS)] versus the diagnostic log odds ratio (lnDOR). This should show a symmetrical funnel shape when no publication bias is present, with p>0.05 (33).
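The quantities plotted in Deeks’ funnel plot can be sketched as follows. The code assumes the usual definition of the effective sample size, ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of subjects with and without the target condition, and a regression of lnDOR on 1/root(ESS) weighted by ESS. The function names, the 0.5 continuity correction for zero cells and the example counts are illustrative assumptions, and the p-value of the slope (the actual test statistic) is deliberately left to a statistics package.

```python
import math


def deeks_point(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (1/sqrt(ESS), lnDOR, ESS) for one study."""
    # Assumption: add 0.5 to every cell when any cell is zero, so lnDOR is defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    n_dis, n_non = tp + fn, fp + tn
    ess = 4 * n_dis * n_non / (n_dis + n_non)
    ln_dor = math.log((tp * tn) / (fp * fn))
    return 1 / math.sqrt(ess), ln_dor, ess


def weighted_slope(points) -> float:
    """ESS-weighted least-squares slope of lnDOR on 1/sqrt(ESS)."""
    sw = sum(w for _, _, w in points)
    mx = sum(w * x for x, _, w in points) / sw
    my = sum(w * y for _, y, w in points) / sw
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in points)
    sxx = sum(w * (x - mx) ** 2 for x, _, w in points)
    return sxy / sxx


# Hypothetical study with 50 diseased and 50 non-diseased subjects.
x, ln_dor, ess = deeks_point(tp=40, fp=5, fn=10, tn=45)
```

A slope close to zero corresponds to a symmetrical funnel; a slope clearly different from zero is what the asymmetry test flags.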
Finally, to assess the clinical or patient-relevant utility of the GastroPanel®, the likelihood ratios were used to calculate post-test probability based on Bayes’ theorem as follows: pretest probability=prevalence of target condition; pretest odds=pretest probability/(1-pretest probability); post-test odds=LR × pretest odds; and post-test probability=post-test odds/(1 + post-test odds), i.e., post-test probability=(LR × pretest probability)/[(1-pretest probability) + (LR × pretest probability)]. Assuming that the study samples are representative of the entire population, an estimate of the pretest probability of the target condition is calculated from the global prevalence of this disorder across the studies. This concept is depicted visually with Fagan’s nomograms (34). We constructed Fagan’s nomogram to give the post-test predictions for AGC at a population level, based on the indicators calculated for the AGC endpoint: i) the pre-test probability; ii) LR+, and iii) LR–.
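The post-test probability calculation behind Fagan’s nomogram can be sketched in odds form, which is algebraically the same as LR × p/[(1 − p) + LR × p] for a pretest probability p. The function name is hypothetical, and the inputs (20% pretest probability, LR+ = 10, LR– = 0.32) are chosen to mirror the values quoted for the nomogram in this paper.

```python
def post_test_probability(pretest: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = LR x pretest odds."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = lr * pretest_odds
    return posttest_odds / (1 + posttest_odds)


p_pos = post_test_probability(0.20, 10.0)   # probability of AGC after a positive test
p_neg = post_test_probability(0.20, 0.32)   # probability of AGC after a negative test
```

With these inputs the calculation gives roughly 0.71 after a positive result and 0.07 after a negative one, close to the values read from the nomogram.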
Results
Identification of eligible studies. Figure 1 depicts the PRISMA flow diagram summarising the process of identifying the studies eligible for this meta-analysis. Using the listed search terms, a large number (n=8,458) of publications were identified in the electronic databases, covering the literature published between 1936 (year when pepsinogen was isolated) and 2021. The vast majority (n=7,710) of these were not relevant for additional inspection, either because they were published prior to the launch of GastroPanel® in 2002 or on the basis of their titles and/or abstracts. Of the remaining 748 studies that were subjected to further scrutiny, 529 were excluded in the first reading. This left a total of 219 studies for the final inspection.
The PRISMA flow diagram of the literature search.
Of these 219 studies with full texts assessed, 135 articles were excluded from the study because of incomplete data, 10 were not eligible because of being reviews, and an additional 25 were excluded because the index test (GastroPanel®) was not used (170 exclusions in total). In the end, 49 studies were found eligible for this meta-analysis, all reporting the information necessary to analyse the GPA (GastroPanel® accuracy).
Methodological quality of the eligible studies. Quality assessment of the included studies by the QUADAS-2 tool is summarized in Table I. There was no identified risk of bias in the use of the index test and reference test. More concerns for the risk of bias (high risk or unclear risk) (38.6%) were found in patient selection, which was often inadequately described or clearly included selected groups of high-risk patients. As to the applicability concerns, some studies (8.1%) raised unclear concerns in the proper use of the index test (because of some odd results), and some others (18.3%) about the application of the described patient selection. Most of the applicability concerns, however, were associated with the proper use of the reference test (i.e., interpretation of the biopsy results), being identified in 6.1% with high risk and in 26.5% with unclear risk.
Quality assessment of the eligible studies by the QUADAS-2 tool
Key characteristics of the included studies. As a result of the meticulous literature search (Figure 1), a total of 49 studies were found eligible for this meta-analysis (Table II) (27, 35–82). These 49 studies comprise a total of 22,597 patients examined by the GastroPanel® test. As to their geographic origin, 32 studies were conducted in Europe, Russia being included (27, 35–45, 47–49, 51–55, 57, 58, 60, 62, 65, 66, 69, 71, 72, 75, 78, 80). Altogether, 12 studies were derived from Asia (50, 59, 61, 63, 64, 67, 68, 70, 73, 74, 77, 82), 3 from Latin America (46, 76, 79), one from Africa (56), and one from the USA (81). AGC endpoint was used in all 49 studies (27, 35–82), and in addition, GPA in diagnosis of AGA was presented by 20 studies (27, 37–40, 45, 46, 48, 50, 54, 57, 61–63, 65, 70, 75, 78, 80, 82). In the vast majority of studies, the 25 or 30 μg/l cut-off for PGI was used, following the recommendations of the manufacturer. Similarly, G-17 (marker of AGA) was analysed in 20 studies, almost all using the manufacturer-recommended cut-off of 1 pmol/l (G-17b) or 3 pmol/l (G-17s) (18, 19).
Studies reporting GastroPanel® test accuracy (GPA) in diagnosis of biopsy-confirmed atrophic gastritis of the corpus (AGC) and/or antrum (AGA).
GastroPanel® accuracy (GPA) in the diagnosis of AGC. In the first step, Forest plots for pooled Se and Sp were created using the bivariate random effects model (metaprop). Next, the summary receiver operating characteristic (SROC) curve was generated using the HSROC model (midas or metandi), which also calculates LR+, LR– and DOR. The pooled Se for GastroPanel® in diagnosing AGC was 0.70 (95%CI=0.63-0.76) (Figure 2), and the pooled Sp was 0.93 (95%CI=0.90-0.95) (Figure 3). The pooled LR+ was 9.5 (95%CI=6.9-13.2), the pooled LR– 0.30 (95%CI=0.23-0.39), and the pooled DOR 31.3 (95%CI=19.9-49.1) (data not shown in figures). The HSROC curve with 95% confidence region and prediction region is illustrated in Figure 4. The AUC value was 0.900 (95%CI=0.170-1.000), with the summary operating point at Se 0.70 and Sp 0.93. We also generated empirical Bayes (EB) estimates in the HSROC analysis (Figure 5). EB estimates are supposed to give the best estimates of the true Se and Sp in each study (83).
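As a plausibility check on the pooled values, the likelihood ratios and DOR implied by the rounded summary operating point (Se 0.70, Sp 0.93) can be computed directly. This is a sketch with a hypothetical function name; the implied values are not expected to match the model-based pooled estimates exactly, presumably because the latter are derived within the model from unrounded parameters.

```python
def lr_from_summary_point(se: float, sp: float) -> dict:
    """Likelihood ratios and DOR implied by a single (Se, Sp) operating point."""
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    return {"LR+": lr_pos, "LR-": lr_neg, "DOR": lr_pos / lr_neg}


implied = lr_from_summary_point(se=0.70, sp=0.93)
```

The implied values are about 10 for LR+, 0.32 for LR– and 31 for DOR, which sit close to the pooled estimates of 9.5, 0.30 and 31.3.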
Forest plot of sensitivity for GastroPanel® in the diagnosis of atrophic gastritis of the corpus.
Forest plot of specificity for GastroPanel® in the diagnosis of atrophic gastritis of the corpus.
HSROC curve with 95% confidence region and prediction region for GastroPanel® in the diagnosis of atrophic gastritis of the corpus.
Empirical Bayes (posterior prediction) estimates on the hierarchical summary receiver operating characteristic (HSROC) curve, with 95% confidence region and prediction region, for GastroPanel® in the diagnosis of atrophic gastritis of the corpus.
To assess the clinical utility of the GastroPanel® test at population level, Fagan’s nomogram was generated (Figure 6). Assuming (the graph default) 20% prevalence of AGC (pre-test probability), Fagan’s nomogram showed that the posterior (post-test) probability of AGC was 72%, if the patients were diagnosed as positive in the GastroPanel® test, and the posterior probability of AGC was only 7% if the patients were diagnosed as negative.
Fagan’s nomogram for GastroPanel® test in the diagnosis of atrophic gastritis of the corpus.
Exploring heterogeneity across the studies. Investigating the possible heterogeneity across the studies was done using different approaches. First, the HSROC curve (Figure 4) was shown to be symmetric, as based on i) the correlation coefficient between logit transformed Se and Sp (HSROC model), which was negative: –0.285 (95%CI=–0.564-0.052), and on ii) the symmetry parameter β: 0.017 (95%CI=–0.335-0.371), which had a non-significant p-value (p=0.921) (data not in tables or graphs). Both these parameters indicate no heterogeneity between the studies (24, 29, 30).
By contrast, in the HSROC curve the 95% prediction region was wide (Figure 4), and the 95%CI of the AUC (0.900; 95%CI=0.170-1.000) was extremely wide. In midas, the inter-study variation was 0.71 in Se and 0.78 in Sp, with inconsistency (I²) of 100%. Using the bivariate boxplot with logit_Se and logit_Sp (Figure 7), 10 studies (Study ID: 10, 13, 21, 33, 37, 38, 42, 44, 47, 48) fell outside the circles, which indicates heterogeneity across the studies.
The bivariate boxplot exploring heterogeneity across the studies.
The sources of the observed heterogeneity were further explored using univariate and multivariate meta-regression, with the recorded study-level covariates in the models. In univariate meta-regression, only the publication date (before or after 2008; here called “timing”) was significantly associated with Se (p=0.001), decreasing the Se to 0.41 (95%CI=0.24-0.61), but with no effect on pooled Sp (p=0.530). Another covariate showing a borderline association with Se (p=0.09) was the geographic origin of the study, graded on the basis of its GC risk. The results of the multivariate meta-regression are illustrated in Figure 8. Of all the covariates included in the model, only the publication year (before vs. after 2008) was significantly associated with the pooled Se, but not with pooled Sp.
The Forest plots of sensitivity and specificity for the study-level covariates included in the multivariate meta-regression model. GEO: Geographic area of GC risk (low, moderate, high); CutOFF: cut-off value used for PGI and/or PGI/PGII; CohortSize: study sample; Timing: study published before or after year 2008; QUADAS-2 risk of bias in: BiasPS: patient selection; BiasFT: flow and timing; (bias in index test and bias in reference standard excluded from the multivariate model due to no variation); QUADAS-2 Application concerns in: AppConsPS: patient selection; AppConsIT: index test; AppConsRS: reference standard. **significance in the effect size; p<0.01.
To further explore these effects, we performed subgroup meta-analyses for all these covariates (Table III). The AUC varied across the geographic regions of GC risk, from low-risk (LR) through intermediate-risk (IR) to high-risk (HR) regions, with AUC=0.980, AUC=0.910, and AUC=0.860, respectively. The difference between the LR and IR was not significant (p=0.454), but that between LR and HR, as well as IR and HR, was significant: p=0.004; p=0.006, respectively. Cohort size showed a biphasic pattern, with the lowest AUC values (0.820; 0.830) being reached in small (n<100) and large (n>400) cohorts, both significantly deviating from the studies with intermediate (100 to 400 patients) cohort size (AUC=0.930). GPA was better for studies using the 25 μg/l cut-off for PGI, as compared to those using the ≥30 μg/l cut-off, with AUC=0.930 and 0.870, respectively (p=0.017). Finally, the year of publication was a significant source of heterogeneity; as compared with more recent studies, those published before 2008 had higher pooled Se and slightly higher Sp, resulting in a significant difference between the AUC values (p=0.0125).
Subgroup meta-analysis of the studies stratified by the study-level covariates.
Looking for influential studies. To disclose the influential studies, a likelihood ratio scatter matrix was applied to rate the effect sizes of the individual studies (Figure 9). Those studies with their LR+/LR– pairing falling within the LUQ (left upper quadrant) were considered to have a substantial effect on the pooled estimates. Of the 49 studies, only two (Study ID: 10 and 47) were completely within this category, and two others (Study ID: 5 and 40) crossed in part the margins of the LUQ. All studies with only one of their LRs within the areas that indicate high clinical validity, i.e., either LR+ >10 or LR– <0.1, were considered to have a moderate effect.
Likelihood ratio scatter matrix rating the effect sizes of the individual studies. LUQ: Left upper quadrant; RUQ: right upper quadrant; LLQ: left lower quadrant; RLQ: right lower quadrant; LR+: positive likelihood ratio; LR–: negative likelihood ratio.
Publication bias. Publication bias was evaluated using Deeks’ funnel plot asymmetry test (Figure 10). In the absence of publication bias, the scatter plot should have a symmetrical funnel shape and p>0.05. This was clearly not the case here, because the funnel plot was strongly asymmetric (p=0.0001), which indicates a high probability of publication bias.
Deeks’ funnel plot asymmetry test for assessment of publication bias.
Discussion
Since its introduction in 2002 (27), GastroPanel® has attracted increasing interest among clinicians worldwide (22, 23, 44), leading to a rapidly accumulating literature on its use in different clinical settings. Despite the different meta-analytical techniques used, the first two meta-analyses published in 2016 and 2017 reported almost identical pooled estimates of Se (72-75%) and Sp (95%) for GastroPanel® in the diagnosis of AG (20, 21). In the most recent meta-analysis covering both stand-alone marker studies and GastroPanel® studies, these estimates were clearly inferior: Se 59% and Sp 89% (84). This is in part due to the different cut-off values used for PGI and in part due to pooling together AGA and AGC as the GPA assessment endpoint (21, 84). For reasons discussed in detail elsewhere (9, 18, 19, 22, 23), the present meta-analysis was restricted to reporting the AGC endpoint only. Most importantly, i) because AGA and AGC have distinctly different biomarker profiles, and ii) because of the dual regulation of the AGA profile (by high acid output and AGA), these two endpoints must be kept strictly separated in all assessments of GPA (19).
As compared with our first meta-analysis in 2016 that included 27 studies (20), the number of eligible studies had almost doubled by the end of 2021 (Figure 1). This is a clear indication of a significantly increased global interest in the GastroPanel® test, which has also made it eligible for inclusion in the international guidelines and consensus reports (6, 8, 9). With this expansion, the test has been adopted in countries outside the geographic region of its origin (Europe), e.g., in Asia, the Middle East, Africa, Latin America and also the US (Table II), i.e., in geographic regions with significantly different risk of GC. This provides another potential source of heterogeneity for the meta-analysis, when studies are stratified by the GC risk of their region (Table III).
This time, we also included quality control (QUADAS-2) in our meta-analysis to assess the methodological quality of the included studies (Table I), which was lacking in our first meta-analysis (20). As usual, a major risk of bias was associated with the patient selection, only 61.4% of the studies being classified in the low-risk category of bias. Another equally frequent item of quality concern was associated with the application of the reference standard, i.e., gastroscopy and biopsies. As repeatedly emphasized (9–11, 18, 19, 22, 23), an accurate classification of the gastroscopic biopsies is of key importance in the assessment of GPA, and any misclassification bias will inevitably compromise this. GastroPanel® test has been optimized for use with the USS classification of gastritis, both having 5 diagnostic categories (10, 11, 19, 22, 23). Mild AGA and AGC are poorly reproducible histopathological diagnoses (10, 11) and these should never be used as the endpoint while calculating the diagnostic accuracy of PGI (PGI/PGII) and G-17, respectively. Instead, only moderate/severe AG (AGC2+, AGA2+) should be used in these calculations (9, 19, 22, 23).
Of the 49 studies included in this meta-analysis, the first was published in 2002 (27), and the latest four studies (79–82) in 2021 (Table II). Starting from the early studies with relatively modest cohort size (below 100, up to 200 patients), the studies published since 2008 generally show an increasing number of patients, i.e., represent screening settings rather than hospital studies. There are two studies with the number of patients exceeding 1,000 (63, 67), with 2,858 and 8,508, respectively. The two studies with the least number of patients are the study of DiMario et al. 2003 (n=13) (36) and the study of Tepes et al. 2018 (n=20) (71).
Visual examination of the Forest plots (Figure 2 and Figure 3) and HSROC curves (Figure 4) shows substantial heterogeneity across the studies. This heterogeneity is more obvious in the values of Se than of Sp, the former varying within a wide range, from as low as 0.14 (46) and 0.17 (81), to 1.0 in two studies (37, 74). The values of Sp vary within a much narrower range, from the lowermost 0.44 (71) to 1.0, reported in several studies (36, 42, 43, 46, 47, 55, 66). Interestingly, both Se and Sp are generally lower and show wider variation among more recent studies (58–79) as compared with the studies published earlier (27, 35–57). This tendency also seems to have a slight impact on the pooled Se and Sp estimates (0.70 and 0.93), if compared with the data reported 5-6 years earlier by the first two meta-analyses (20, 21), showing Se around 72% and Sp around 95%. Although obvious when viewing the HSROC curve (with wide 95%CI=0.17-1.0), this heterogeneity could not be demonstrated by the tests of HSROC symmetry. Indeed, both parameters (the correlation coefficient between logit transformed Se and Sp and the symmetry parameter β) (24, 29, 30) indicated that no heterogeneity exists between the studies. This is in contrast to the other test of heterogeneity, the bivariate boxplot (Figure 7), which suggests heterogeneity between the studies. To be on the safe side, the random effect models were used in all meta-analytical calculations.
As to the pooled AUC value of 0.900 obtained in HSROC curve, this is very similar to the HSROC curve shown in the previous meta-analysis from Italy, albeit the exact AUC value was not given (21). For any test, AUC=0.900 must be considered outstanding, and once obtained as a pooled value of 49 separate studies, it is a clear indication that GastroPanel® test is a highly sensitive and specific test in diagnosing AGC. To further refine these Se and Sp estimates of the HSROC curve, we also tested empirical Bayes (EB) estimates by using the “predict eb” command to create another HSROC (Figure 5). It is known that EB estimates are supposed to give the best estimates of the true Se and Sp in each study (83). Comparing Figure 4 and Figure 5, the EB estimates do “shrink” toward the summary point (Se 0.70 & Sp 0.93) of the study-specific estimates (Figure 4), and this “shrinkage” is greater for Se than for Sp. This reflects i) a smaller variance of Se (on the logit scale) and ii) the fact that most of these 49 studies have lower numbers of patients with the index disease (AGC) than without disease (i.e., AGC prevalence <50%), which results in more precise estimates of Sp than of Se (83) (also obvious in Forest plots).
In 1975, Fagan introduced a nomogram to quantify the post-test probability that an individual is affected by a condition, based on the probability of the condition before the test (pre-test probability) (34). When applied to the present meta-analytical data, Fagan’s nomogram allows us to evaluate the clinical utility of the GastroPanel® test at the population level (Figure 6). Accepting i) the 20% default prevalence of AGC as the pre-test probability, ii) LR+=10 and iii) LR–=0.32 (derived from the HSROC model), Fagan’s nomogram indicates that a GastroPanel® test result positive for AGC predicts this diagnosis in a population with a likelihood of 72%, whereas the likelihood is only 7% when the test result is negative. Assuming that the study sample is representative of the entire population, an estimate of the pre-test probability reflects the global prevalence of this disorder (34).
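The arithmetic behind Fagan’s nomogram can be sketched in a few lines. This is a minimal illustration only, not the STATA routine used to produce Figure 6; the function names are hypothetical, and the inputs are the rounded summary values quoted in the text (Se 0.70, Sp 0.93, pre-test probability 20%).

```python
def likelihood_ratios(se, sp):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    lr_pos = se / (1.0 - sp)
    lr_neg = (1.0 - se) / sp
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Rounded pooled estimates from the text (assumed inputs for this sketch).
lr_pos, lr_neg = likelihood_ratios(0.70, 0.93)
p_pos = post_test_probability(0.20, lr_pos)
p_neg = post_test_probability(0.20, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"post-test probability after a positive test: {p_pos:.2f}")
print(f"post-test probability after a negative test: {p_neg:.2f}")
```

With these rounded inputs, the post-test probability after a positive test comes out at about 71%, close to the 72% read from the nomogram; the small difference presumably reflects rounding of the pooled estimates. The negative-test value agrees with the reported 7%.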
The sources of the observed heterogeneity were explored using meta-regression and subgroup meta-analysis (Figure 8 and Table III). In univariate meta-regression, only the publication year (before vs. after 2008) was significantly associated with Se, but not with Sp. Geographic region graded according to its GC risk showed a borderline (p=0.09) association with Se. The publication year retained its power as the only significant independent study-level covariate in multivariate meta-regression. As already evident by visual inspection of the Forest plots, the earlier studies (before 2008) showed less variation in Se and Sp, resulting (in subgroup meta-analysis) in a statistically significantly higher pooled AUC value (AUC=0.960) as compared with that (AUC=0.870) of the studies published after 2008 (p=0.012). The reasons for this are multiple and fall outside the scope of this discussion.
Another source of heterogeneity seems to be the geographic origin of the study, in that the GPA decreases in parallel with the increasing risk of GC in the region (Table III). Again, this seems to be related to the decrease in test Se, while test Sp remains largely unaffected. Given that test Se is dependent on disease prevalence, it can be reasoned that in studies from the high-risk regions, the index lesions (AGC) are rarer, a higher proportion of diseases being dysplasia/intestinal metaplasia/GC rather than pure AGC. When the latter (instead of dysplasia or GC) is used as the only endpoint in this meta-analysis, a lower AGC prevalence would explain the lower Se (and PPV) seen in those studies (Table III) (84). This same effect is obvious when screening settings (lower AG prevalence) and clinical settings (gastroscopy referral patients with higher AG prevalence) are compared; Se is lower in the former.
Also, the cut-off value used for PGI seems to be a source of heterogeneity disclosed in subgroup meta-analysis. Given that PGI levels progressively decrease in parallel with the increasing severity of AGC, it is plausible that the 25 μg/l cut-off is more accurate in finding the AGC2+ endpoint than the 30 μg/l cut-off or higher values, e.g., 50, 70 or 80 μg/l, used by some studies (39, 59, 67). Unfortunately, the 70 μg/l cut-off was used by the single largest (n=8,508) study, that of Tu et al. (67; Study ID 34), resulting in very low Se (0.49) and also inferior Sp (0.71), which makes the study a clear outlier in the HSROC curve and places it in the RLQ (study with minimal effect) of the likelihood ratio matrix (Figure 9).
The elegant likelihood ratio scatter matrix analysis (Figure 9) was described only recently (in 2018) (31), and was thus not used in the previous meta-analyses (14, 20, 21, 84). Based on the LR+/LR– pairing of each study, the most informative studies are those with both a high positive likelihood ratio (LR+ >10) and a low negative likelihood ratio (LR– <0.1). While a large number of studies had LR+ >10, being placed in the RUQ of the matrix, only 4 studies also had LR– <0.1, being ranked in the LUQ and classified as substantial in their informativeness (5, 10, 40, 47). Unfortunately, 25 studies were ranked in the RLQ of the matrix, with LR+ <10 and LR– >0.1, and classified as studies with minimal impact on the pooled results.
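The quadrant assignment described above can be illustrated with a short sketch. It is a hypothetical helper, not the routine that generated Figure 9: the function name is an assumption, the LR+ >10 and LR– <0.1 cut-offs are those quoted in the text, and the LLQ label (LR+ not above 10 with LR– below 0.1) is inferred by analogy with the three quadrants named in the text. Values falling exactly on a cut-off are treated here as not exceeding it.

```python
import math


def lr_quadrant(se, sp, lr_pos_cut=10.0, lr_neg_cut=0.1):
    """Assign a study to a quadrant of the likelihood ratio scatter matrix.

    LUQ: LR+ > cut and LR- < cut (most informative)
    RUQ: LR+ > cut only
    LLQ: LR- < cut only (label inferred by analogy, an assumption)
    RLQ: neither criterion met (minimal informativeness)
    """
    # Guard against division by zero when Sp is exactly 1 or 0.
    lr_pos = math.inf if sp >= 1.0 else se / (1.0 - sp)
    lr_neg = math.inf if sp <= 0.0 else (1.0 - se) / sp

    high_lr_pos = lr_pos > lr_pos_cut
    low_lr_neg = lr_neg < lr_neg_cut

    if high_lr_pos and low_lr_neg:
        return "LUQ"
    if high_lr_pos:
        return "RUQ"
    if low_lr_neg:
        return "LLQ"
    return "RLQ"


# Se and Sp of the study of Tu et al. as quoted in the text (0.49 and 0.71).
print(lr_quadrant(0.49, 0.71))
```

For the Se and Sp quoted for the study of Tu et al., the helper returns RLQ, in line with the allocation of that study described in the text.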
Conclusion
The literature published after 2016/2017 has not substantially changed the pooled estimates of Se/Sp (0.73/0.95) reported by the two previous meta-analyses (20, 21); the updated values in the present meta-analysis are 0.70 (95%CI=0.64-0.76) for pooled Se and 0.93 (95%CI=0.90-0.95) for pooled Sp. The HSROC curve reached an AUC value of 0.900, but with a wide 95%CI (0.170-1.000). Meta-regression and subgroup meta-analysis disclosed the publication year (before vs. after 2008) as the only statistically significant independent study-level covariate acting as a source of heterogeneity. In addition to heterogeneity, the likelihood of publication bias was high. For the first time, the recently introduced likelihood ratio scatter matrix (31) was applied to rank the effect sizes of the individual studies as substantial, moderate or minimal.
Taken together, these meta-analytical results, based on studies published between 2002 and 2021, are in alignment with the two previous meta-analyses (20, 21) and confirm the accuracy of the GastroPanel® test in the diagnosis of AGC. As also advocated by the international consensus guidelines (6, 8, 9), these data provide important support for the applicability of this test i) in screening for GC risk conditions (AG, Hp), ii) in non-invasive diagnosis of dyspeptic patients, and iii) in the follow-up of patients with clinically confirmed AG (19).
Footnotes
Conflicts of Interest
The Author has no conflicts of interest in relation to this study.
- Received February 2, 2022.
- Revision received February 22, 2022.
- Accepted February 23, 2022.
- Copyright © 2022 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).