Abstract
Background/Aim: Although acute cholecystitis (AC) is a frequent cause of acute abdominal pain (AAP), the accuracy of a diagnostic score (DS) in confirming AC is rarely considered. The aim of this study was to compare in detail the accuracy of common clinical findings, laboratory tests and a DS in the diagnosis of AC. Patients and Methods: A cohort of 1,333 patients presenting with AAP was included in the study. The clinical history and diagnostic symptoms (n=21), signs (n=14) and laboratory tests (n=3) were recorded for each patient. Results: The significant independent diagnostic predictors (disclosed by a multivariate logistic regression model) were used to construct the DS formulas for AC diagnosis. These formulas were tested at five different cut-off levels to establish the optimal diagnostic performance for clinically confirmed AC. In the ROC comparison test, there was no statistically significant difference between the AUC values of i) clinical history and symptoms (AUC=0.542) and ii) signs and laboratory tests (AUC=0.580), whereas both were significantly inferior (p=0.0001) to the AUC value of the DS (AUC=0.962). Conclusion: In the diagnosis of clinically confirmed AC, the DS formula is superior to clinical symptoms and signs, justifying the use of DS as an integral part of the diagnostic algorithm of AC in all patients presenting with AAP.
Gallstone disease (GSD) is common in Western populations and its prevalence is increasing due to obesity and aging of the population. A third of GSD patients will develop acute cholecystitis (AC) (1, 2). After acute appendicitis, AC is the second most common cause of acute surgical abdomen in Western countries (1, 2). Over 90% of AC cases result from obstruction of the cystic duct by gallstones. Cystic duct obstruction leads to increased intraluminal pressure inside the gallbladder and triggers an acute inflammatory response.
During the past decades, some attempts have been made to use standardised questionnaires to improve the diagnostic accuracy of AC. Chen et al. (3) used a gallstone condition-specific questionnaire (CSQ) and suggested that it could help doctors in diagnostic decisions. According to common practice, the diagnosis of AC is made by history taking, physical examination and ultrasonography (US). In equivocal or difficult cases, computed tomography (CT) or magnetic resonance cholangiopancreatography (MRCP) can be performed to confirm the diagnosis (4).
Diagnostic score (DS) systems for AC have been suggested by the Japanese Society of Gastroenterology (4) and the American Association for the Surgery of Trauma (AAST) (5). The AAST grading system has been proposed for use in research as well as in clinical settings. Despite several DS studies in acute abdominal pain (AAP) (6, 7), the performance of a DS in the diagnosis of AC among AAP patients has not been previously studied.
In our recent studies, we have analysed the diagnostic accuracy of DS in distinguishing acute appendicitis (AA) from nonspecific abdominal pain (NSAP) as well as the potential gender-specificity of DS in confirming AA (6, 7). Prompted by the frequency of AC among AAP patients and the lack of diagnostic performance studies on DS in AC, we designed the present study to assess the relative accuracy of i) a detailed history taking, ii) clinical examination and laboratory testing, and iii) the DS in detecting clinically confirmed AC among patients with AAP.
Patients and Methods
Criteria for inclusion in this study and diagnostic criteria were those set forth by the Research Committee of the World Organization of Gastroenterology (OMGE) (6-9). Included in the present study were 636 men (47.7%) and 697 women (52.3%) with a mean age (±SD) of 38.0±22.1 years.
The examination of clinical symptoms, signs and relevant laboratory tests was conducted using a standard technique and the results were graded positive or negative as previously described (6, 7, 9) (Tables I and II). The diagnosis of AC was made by considering all symptoms, signs and results of the laboratory tests weighted against the accepted diagnostic criteria of AC (4, 8).
Identifying the DS models. As the first step in constructing the DS, a multivariate (stepwise) logistic regression analysis (SPSS Statistics 26.0.0.1; IBM, NY, USA) was performed to disclose the variables with an independent predictive value. All the variables presented in Tables I and II were included in the analysis as binary data, with the outcome coded as AC=1 and other diagnosis of AAP=0. Using the coefficients of the regression model, a DS was built and its predictive value for AC was studied. The exponentiated coefficient of the multivariate analysis (e^β, where β is the regression coefficient) shows the relative risk, estimated as an odds ratio, of a patient with a given symptom or sign having AC.
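For orientation, the relation between a logistic regression coefficient and the odds ratio can be written out as follows. This is the standard textbook form of the model, not a derivation taken from the study; the symbols x_j (binary predictor), β_0 (intercept) and OR_j are notation assumed here for illustration.

```latex
\[
\log\frac{P(\mathrm{AC}=1 \mid x)}{1 - P(\mathrm{AC}=1 \mid x)}
  = \beta_0 + \sum_{j} \beta_j x_j ,
\qquad
\mathrm{OR}_j = e^{\beta_j}
\]
```

As a worked instance, a coefficient of β=2.48 (the value reported for Murphy’s sign in the DS formula) corresponds to e^2.48 ≈ 11.9, i.e. roughly 12-fold higher odds of AC when the sign is positive and the other predictors are held fixed.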
The DS formula for AC. The DS formula for AC (Table III) showing the highest diagnostic performance for AC in hierarchical summary receiver operating characteristic (HSROC) analysis is as follows, with each variable coded as 1 for a positive endpoint and 0 for a negative endpoint: DS=0.89 × location of initial pain + 0.74 × previous similar pain + 0.75 × vomiting − 1.01 × micturition + 2.02 × jaundice + 2.77 × tenderness + 2.48 × Murphy’s sign + 2.19 × rectal digital tenderness − 6.13. The mean (SD) of the DS values for AAP (n=1293) was 2.150 (2.30) (Table III).
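The arithmetic of the DS formula can be illustrated with a minimal Python sketch. The coefficients and the constant are copied from the formula above; the dictionary keys and the function name are hypothetical labels chosen for this example, and treating an unrecorded finding as negative (0) is an assumption of the sketch, not a rule stated in the study. What counts as a "positive endpoint" for each variable is defined in Table III and is not encoded here.

```python
# Coefficients copied from the DS formula in the text.
# The key names are hypothetical labels introduced for this sketch.
DS_COEFFICIENTS = {
    "location_of_initial_pain": 0.89,
    "previous_similar_pain": 0.74,
    "vomiting": 0.75,
    "micturition": -1.01,
    "jaundice": 2.02,
    "tenderness": 2.77,
    "murphys_sign": 2.48,
    "rectal_digital_tenderness": 2.19,
}
DS_CONSTANT = -6.13


def diagnostic_score(findings):
    """Return the DS for a dict mapping finding name -> 0 (negative) or 1 (positive).

    Assumption of this sketch: a finding missing from the dict is treated as 0.
    """
    unknown = set(findings) - set(DS_COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    total = DS_CONSTANT
    for name, coefficient in DS_COEFFICIENTS.items():
        value = findings.get(name, 0)
        if value not in (0, 1):
            raise ValueError(f"{name} must be coded 0 or 1")
        total += coefficient * value
    # Round to two decimals, matching the precision of the published coefficients.
    return round(total, 2)


# Hypothetical patient: tenderness and Murphy's sign positive, all else negative.
example_score = diagnostic_score({"tenderness": 1, "murphys_sign": 1})
```

With every finding negative the score equals the constant term, and each positive finding shifts the score by its coefficient; a positive micturition endpoint is the only finding that lowers the score.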
Statistical analysis. All other statistical analyses were performed using STATA/SE version 16.1 (StataCorp, College Station, TX, USA). Statistical tests presented were two-sided, and a p-value <0.05 was considered statistically significant. Using 2×2 tables, we calculated sensitivity (Se) and specificity (Sp) with 95% confidence intervals (95%CI) for each symptom, sign or laboratory test, and created separate forest plots for each set of data. We calculated the summary estimates of Se and Sp, positive likelihood ratio (LR+), negative likelihood ratio (LR-) and diagnostic odds ratio (DOR) using a random-effects bivariate model, and fitted the summary HSROC curves, including all diagnostic variables in the DS model, using the AC endpoint.
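The per-variable quantities named above follow directly from the four cells of a 2×2 table. The Python sketch below shows the standard definitions; it is not the STATA procedure used in the study. The Wilson score interval is an assumption (the text does not state which interval method was applied), and the counts in the example are invented purely to exercise the function.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def two_by_two_metrics(tp, fp, fn, tn):
    """Se, Sp, LR+, LR- and DOR from the cells of a 2x2 diagnostic table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")
    se = tp / (tp + fn)   # sensitivity: positives among patients with AC
    sp = tn / (tn + fp)   # specificity: negatives among patients without AC
    lr_pos = se / (1 - sp) if sp < 1 else math.inf
    lr_neg = (1 - se) / sp if sp > 0 else math.inf
    dor = (tp * tn) / (fp * fn) if fp * fn > 0 else math.inf
    return {
        "se": se,
        "se_ci": wilson_ci(tp, tp + fn),
        "sp": sp,
        "sp_ci": wilson_ci(tn, tn + fp),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Invented counts for illustration only; these are not study data.
example = two_by_two_metrics(tp=80, fp=30, fn=20, tn=70)
```

The pooled estimates reported in the Results come from a bivariate random-effects model across variables, which this per-table sketch does not attempt to reproduce.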
Using STATA’s prediction tool, we also made posterior predictions [empirical Bayes (EB) estimates] of the Se and Sp for each diagnostic variable in AC patients, including the different DS cut-offs. Analogous to their use in meta-analysis, EB estimates give the best estimates of the true Se and Sp for each diagnostic variable, with the variable-specific point estimates usually shrinking toward the summary point of the HSROC curve. We explored the statistical heterogeneity between diagnostic variables and DS models through visual examination of the forest plots and the HSROC curves.
Results
Diagnostic performance of the symptoms. The pooled overall Se of the diagnostic symptoms in confirming AC was 59% (95%CI=45-73%) (Figure 1). Se exceeded 59% for 10 diagnostic symptoms, and the best five diagnostic symptoms (vertigo, drugs for abdominal pain, use of alcohol, type of pain and micturition) showed 90-98% Se in the diagnosis of AC (Figure 1). The pooled overall Sp of the diagnostic symptoms for detecting AC was 44% (95%CI=28-61%) (Figure 2). Ten diagnostic symptoms showed Sp higher than 44%, whereas the best five diagnostic symptoms in the diagnosis of AC (jaundice, location of pain at diagnosis, bowels, previous abdominal surgery and duration of pain) showed Sp ranging from 73% to 99% (Figure 2).
Diagnostic performance of the signs and tests. The pooled overall Se of the diagnostic signs and tests for detecting AC was 68% (95%CI=53-81%) (Figure 3), while seven diagnostic signs and tests had Se exceeding 68%. The five most accurate diagnostic signs and tests (urine, rectal digital tenderness, colour, distension and bowel sounds) showed Se in the range of 88-100% (Figure 3). The pooled overall Sp of the signs and tests was only 41% (95%CI=23-60%) (Figure 4), and eight diagnostic signs and tests showed Sp higher than 41%. The five most accurate diagnostic signs and tests (Murphy’s positive, tenderness, abdominal movement, mood and rebound) had Sp of 53-96% (Figure 4).
Diagnostic performance of the DS formulas. The pooled overall Se of the DS formulas for detecting AC was 86% (95%CI=83-88%). The best three DS models (DS I, DS II and DS III) had Se within a narrow range of 86-88% (Figure 5). The pooled overall Sp of the DS formulas for confirming AC was 94% (95%CI=93-95%), with the best three DS models (DS III, DS IV and DS V) reaching a Sp of 95% (Figure 6).
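How a cut-off level turns a continuous DS into a positive or negative classification, and how Se and Sp then follow, can be sketched as below. The five cut-off values used for DS I to DS V are not given in this text, so the cut-offs, the scores and the diagnoses in the example are toy values invented for illustration; the rule that a score at or above the cut-off counts as positive is likewise an assumption of the sketch.

```python
def se_sp_at_cutoff(scores, has_ac, cutoff):
    """Return (Se, Sp) when a score >= cutoff is classified as AC.

    scores: DS values, one per patient
    has_ac: booleans of the same length, True if AC was clinically confirmed
    """
    if len(scores) != len(has_ac):
        raise ValueError("scores and has_ac must have the same length")
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, has_ac):
        predicted = score >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without AC")
    return tp / (tp + fn), tn / (tn + fp)


# Toy values for illustration only; these are not study data.
toy_scores = [-5.2, -3.1, -0.9, 0.4, 1.8, 3.5]
toy_has_ac = [False, False, False, True, False, True]

low_cutoff = se_sp_at_cutoff(toy_scores, toy_has_ac, cutoff=0.0)
high_cutoff = se_sp_at_cutoff(toy_scores, toy_has_ac, cutoff=2.0)
```

In this toy set, raising the cut-off from 0.0 to 2.0 lowers Se from 1.0 to 0.5 and raises Sp from 0.75 to 1.0, the same direction of trade-off seen across DS I to DS V, where the highest Se and the highest Sp are reached by different models.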
HSROC analyses and empirical Bayes (EB) estimates. STATA (metandiplot algorithm) was used to draw the HSROC curves and EB estimates in order to compare visually the pooled overall diagnostic performance of the symptoms, the signs and tests, and the DS formulas in the diagnosis of AC (Figures 7, 8 and 9). Based on comparisons of the HSROC AUC values (ROC comparison test), the AUC values of i) the clinical history and symptoms (Figure 7: AUC=0.542, 95%CI=0.520-0.562) and ii) the signs and tests (Figure 8: AUC=0.580, 95%CI=0.540-0.621) did not differ significantly from each other (p=0.611), whereas both were significantly inferior to iii) the AUC value reached by the DS formulas (Figure 9: AUC=0.962, 95%CI=0.950-0.974; p=0.0001 for both comparisons).
Discussion
Although gallstones are the most common cause of AC, the clinical picture of AC may vary within a patient population. AC occurs as a result of cystic duct obstruction and damage to the gallbladder mucosa caused by mechanical mural irritation by gallstones, which leads to the release of phospholipases from the mucosal cells (1, 2). Phospholipases catalyse the production of lysolecithin, which irritates the gallbladder epithelium and leads to oedema and epithelial vascular insufficiency (1, 2).
The diagnosis of AC is traditionally made on the basis of common clinical findings, supported by signs and laboratory tests and confirmed with US. Clinical findings of AC include right upper quadrant pain and tenderness, Murphy’s sign, nausea, vomiting, fever and poor appetite. The differential diagnosis of AC among AAP patients can be difficult and may include several different diseases (6, 7, 9-11). There is no specific laboratory test for the diagnosis of AC. A high leucocyte count might support the diagnosis, although its Se and Sp are not particularly high, as confirmed in the present series (54% and 43%, respectively).
Several different DS systems are available for AAP diagnosis (6, 7), some guidelines suggest the use of a DS to improve the diagnosis of AAP (6, 7), and international guidelines cautiously recommend a DS-supported severity grading to improve the clinical management of AC (4, 5). Nevertheless, debate continues on the shortcomings of specific DS models in sorting out AAP patients. Although AC is a common cause of AAP, the accuracy of a DS in the diagnosis of AC has not been critically evaluated. To cast further light on this issue, the present study was designed to analyse in detail the relative accuracy of i) the common clinical findings and ii) signs and tests, as compared with iii) the DS, to establish whether the DS could improve the diagnostic accuracy of AC.
Previous studies with a design similar to ours are scarce. Vera et al. (5) included 350 patients with AC in a retrospective cohort study and investigated the concordance between the AAST grade and the outcome of AC patients. Higher AAST scores were independently associated with some clinical outcomes in AC patients; however, no significant differences in clinical outcome were shown between AAST grade 1 and grade 2 AC patients. The authors concluded that current AAST scoring needs validation with larger patient cohorts (5). Yacoub et al. (12) attempted to identify preoperative predictors in AC patients and to develop a DS for them. They retrospectively calculated a DS for 245 patients based on five independent variables identified by regression analysis: gender (male), leucocyte count (>13,000/mm³), heart rate (>90 bpm), gallbladder wall thickness in US (>4.5 mm) and age (>45 years). The authors concluded that the DS could help in the severity grading of AC patients, but they did not report any AUC values based on ROC analysis. Ambe et al. (13) investigated a retrospective cohort of patients undergoing laparoscopic cholecystectomy and tried to grade the severity of AC with a DS. According to these authors, a DS has the potential to select AC patients with severe disease, but they concluded that their DS system needs to be validated prospectively (13). Finally, Gouveia et al. (14) evaluated the diagnostic performance of a DS in AC patients with common bile duct stones (CBDS), examining 40 patients with a clinical or US suspicion of CBDS. These data indicated that the American Society for Gastrointestinal Endoscopy (ASGE) score was not useful for diagnosing CBDS patients with AC, and the authors suggested that the ASGE score should not be used in patients with CBDS (14).
It was of interest to compare the diagnostic performance of the symptoms, signs and tests among the AC patients with that among the AA patients reported in our recent study (7), to see whether the diagnostic accuracy of common clinical findings differs between AA and AC patients. Indeed, this seems to be the case in that the pooled Se of the diagnostic symptoms in detecting AA (80%; 95%CI=67-90%) was substantially higher than that in detecting AC (59%; 95%CI=45-73%). However, the pooled overall Sp of the diagnostic symptoms in the diagnosis of AA was lower than that in detecting AC: 30% (95%CI=19-42%) and 44% (95%CI=28-61%), respectively. Similarly, the pooled overall Se of the diagnostic signs and tests in the diagnosis of AA was significantly higher than that for AC: 86% (95%CI=79-92%) and 68% (95%CI=53-81%), respectively. As anticipated, however, the pooled overall Sp of the diagnostic signs and tests for AA was lower than that in detecting AC: 34% (95%CI=20-50%) versus 41% (95%CI=23-60%).
When the same comparisons were made for the diagnostic accuracy of the DS formulas between AA and AC patients, the trend was similar. Indeed, the pooled Se of the DS formulas was higher in detecting AA than AC: 91% (95%CI=87-95%) and 86% (95%CI=83-88%), respectively. Because Se and Sp behave reciprocally, it was not unexpected to find that the pooled overall Sp of the DS was lower for AA than for AC: 84% (95%CI=75-92%) and 94% (95%CI=93-95%), respectively.
AUC values based on the ROC comparison test showed that the diagnostic performance of the clinical signs and tests is slightly better than that of the clinical symptoms, although the difference was not significant. However, the AUC value based on the DS formula is superior to the AUC values based on symptoms and signs. The lack of US data might be considered a limitation of the present study. However, even with US and inflammatory markers it may be impossible to reach a higher diagnostic accuracy than the AUC of 0.962 (Se/Sp balance) found for the DS in AC diagnosis in this study. We could not perform comparisons with previous clinical studies, because the only DS study on AC patients is still ongoing and has not been analysed (15). The present study is therefore the first to provide data indicating that the DS could be used for the clinical diagnosis of AC among patients presenting with AAP. One of the major advantages of our DS is that the formula does not need US or leucocyte count (LC) analyses to reach a high diagnostic accuracy in AC.
Conclusion
Taken together, our novel DS formula, constructed by including the significant independent predictors disclosed by a multivariate analysis, reached very high diagnostic accuracy (Se/Sp balance; AUC=0.962) in AC among AAP patients. As compared with the diagnostic performance of the clinical findings, signs and tests (ROC comparison test), the DS proved to be far superior to both these conventional diagnostic tools in the diagnosis of AC in patients with AAP.
Acknowledgements
The study was funded by the Päivikki ja Sakari Sohlberg Foundation.
Footnotes
* These Authors contributed equally to this study.
Authors’ Contributions
All Authors have met all of the following four criteria: 1. Substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work, 2. Drafting the work or revising it critically for important intellectual content, 3. Final approval of the version to be published, 4. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
This article is freely accessible online.
Conflicts of Interest
The Authors have no conflicts of interest or financial ties to disclose. The Authors alone are responsible for the content and writing of this article.
- Received October 25, 2020.
- Revision received November 8, 2020.
- Accepted November 9, 2020.
- Copyright © 2020 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.