Pin Vieito N, Zarraquiños S, Cubiella J. High-risk symptoms and quantitative faecal immunochemical test accuracy: Systematic review and meta-analysis. World J Gastroenterol 2019; 25(19): 2383-2401 [PMID: 31148909 DOI: 10.3748/wjg.v25.i19.2383]
Corresponding Author of This Article
Noel Pin Vieito, MD, Staff Physician, Statistician, Department of Gastroenterology, Complexo Hospitalario Universitario de Ourense, C/ Ramón Puga 52-54, Ourense 32005, Spain. email@example.com
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Pin Vieito N and Cubiella J conception and design of the study; Pin Vieito N, Zarraquiños S, and Cubiella J acquisition of data, analysis and interpretation of data, and final approval; Pin Vieito N and Cubiella J drafted the article; Cubiella J contributed to critical revision.
Conflict-of-interest statement: Dr. Pin reports non-financial support from ABBVIE, non-financial support from GILEAD SCIENCES, outside the submitted work; Dr. Zarraquiños reports non-financial support from CASEN RECORDATI, non-financial support from MYLAN, non-financial support from ALLERGAN, non-financial support from OLYMPUS, non-financial support from ABBVIE, outside the submitted work; Dr. Cubiella reports grants from Instituto de Investigación Sanitaria Galicia Sur, grants from Fondo de Investigaciones Sanitarias (FIS), during the conduct of the study; personal fees from NORGINE, personal fees from IMC, outside the submitted work;
PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist, and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.
Telephone: +34-988385399 Fax: +34-988385399
Received: February 3, 2019 Peer-review started: February 6, 2019 First decision: March 5, 2019 Revised: March 20, 2019 Accepted: March 29, 2019 Article in press: March 30, 2019 Published online: May 21, 2019
The quantitative faecal immunochemical test for haemoglobin (FIT) has been revealed to be highly accurate for colorectal cancer (CRC) detection not only in a screening setting, but also in the assessment of patients presenting lower bowel symptoms. Therefore, the National Institute for Health and Care Excellence has recommended the adoption of FIT in primary care to guide referral for suspected CRC in low-risk symptomatic patients using a 10 µg Hb/g faeces threshold. Nevertheless, it is unknown whether FIT's accuracy remains stable throughout the broad spectrum of possible symptoms.
To perform a systematic review and meta-analysis to assess FIT accuracy for CRC detection in different clinical settings.
A systematic literature search was performed using MEDLINE and EMBASE databases from inception to May 2018 to conduct a meta-analysis of prospective studies including symptomatic patients that evaluated the diagnostic accuracy of quantitative FIT for CRC detection. Studies were classified on the basis of brand, threshold of faecal haemoglobin concentration for a positive test result, percentage of reported symptoms (solely symptomatic, mixed cohorts) and CRC prevalence (< 2.5%, ≥ 2.5%) to limit heterogeneity and perform subgroup analysis to assess the influence of clinical spectrum on FIT's accuracy to detect CRC.
Fifteen cohorts including 13073 patients (CRC prevalence 0.4% to 16.8%) were identified. The pooled estimate of sensitivity for studies using OC-Sensor at a 10 µg Hb/g faeces threshold (n = 10400) was 89.6% [95% confidence interval (CI): 82.7% to 94.0%]. However, pooled estimates of sensitivity for studies formed solely by symptomatic patients (n = 4035) and for mixed cohorts (n = 6365) were 94.1% (95%CI: 90.0% to 96.6%) and 85.5% (95%CI: 76.5% to 91.4%), respectively (P < 0.01), while there was no statistically significant difference between the pooled sensitivity of studies with CRC prevalence < 2.5% (84.9%, 95%CI: 73.4% to 92.0%) and ≥ 2.5% (91.7%, 95%CI: 83.3% to 96.1%) (P = 0.25). At the same threshold, OC-Sensor® sensitivity to rule out any significant colonic lesion was 78.6% (95%CI: 75.6% to 81.4%). We found substantial heterogeneity, especially when assessing specificity.
The results of this meta-analysis confirm that, regardless of CRC prevalence, quantitative FIT is highly sensitive for CRC detection. However, the ability of FIT to rule out CRC is higher in studies solely including symptomatic patients.
Core tip: The quantitative faecal immunochemical test for haemoglobin (FIT) has been recommended to guide referral for suspected colorectal cancer (CRC) in people with unexplained symptoms without rectal bleeding. However, the information regarding its accuracy in different settings is scarce. Our meta-analysis reveals that sensitivity for CRC may change across populations with differences in clinical symptoms, irrespective of CRC prevalence. On the other hand, we should not use this to rule out CRC if its prevalence is high. In addition, FIT is not sensitive enough to exclude other significant colonic diseases.
The quantitative faecal immunochemical test for haemoglobin (hereinafter referred to as ‘FIT’) has been revealed to be highly accurate for colorectal cancer (CRC) detection not only in a screening setting, but also in the assessment of patients presenting lower bowel symptoms[1,2]. Therefore, the National Institute for Health and Care Excellence (NICE) has recently recommended adoption of FIT in primary care to guide referral for suspected CRC in people without rectal bleeding who have unexplained symptoms but do not meet the criteria for a suspected cancer pathway referral. Results should be reported using a threshold of 10 micrograms of haemoglobin per gram of faeces (μg Hb/g faeces)[3,4].
However, a clinical concern has been highlighted on transference of research results to clinical practice. The NICE recommendation applies only to patients who present low-risk symptoms. In contrast, most available studies include patients who had symptoms (e.g., rectal bleeding) associated with higher probability of CRC and most were performed in a secondary care setting. Although other population variables could be involved, this difference in the clinical spectrum could account for the high CRC prevalence shown in the meta-analysis used to support this recommendation (range 2.15% to 5.4%), compared to the estimated 1.5% for the relevant symptomatic group used in NICE guidance ‘NG12’.
Thus, since the prevalence of the target condition may affect estimates of test performance by means of mechanisms other than patient spectrum, there is insufficient information to elucidate whether the presence of high-risk symptoms, or another clinical difference associated with a higher CRC prevalence in the studies that met the inclusion criteria of that meta-analysis, will affect the expected performance of FIT in primary care. With the aim of assessing the stability of FIT's accuracy across the broad spectrum of situations we could face outside a screening setting, we decided to perform an additional systematic review expanding upon the previous inclusion criteria.
MATERIALS AND METHODS
We designed, conducted and reported this systematic review and meta-analysis following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.
Data sources and searches
We included all studies identified by a sensitive search of “FIT for CRC” in MEDLINE (via PubMed) and EMBASE (via Ovid) databases from inception to 21 May 2018. Data sources were also extended to the reference lists of all articles extracted from the search strategy detailed in Appendix 1.
Two authors (NP and SZ) independently reviewed and screened titles and abstracts of articles retrieved and determined final eligibility by means of examination of full texts. Any disagreement was resolved through discussion or by consulting a third author (JC). We regarded studies as suitable for our review if they met all the following inclusion criteria:
Population, setting and study design
We included all prospective cohort studies performed on adult patients outside a CRC screening programme setting that included patients either: (1) Consulting a physician for non-acute lower abdominal symptoms; or (2) consecutively scheduled for elective colonoscopy, when at least a fraction of symptomatic patients was included. No language restriction was applied.
We included studies that evaluated the diagnostic accuracy of the quantitative FIT for CRC detection and reported either absolute numbers of true-positive, false-negative, true-negative, and false-positive observations, or data from which sensitivity and specificity could be derived. In the case of studies reporting more than one FIT specimen, we only included the results of the first determination.
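The accuracy parameters named throughout this review follow directly from the four counts of a 2 × 2 table. The following Python sketch illustrates those standard definitions; the class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one cohort: FIT result versus the reference standard."""
    tp: int  # FIT positive, CRC present
    fn: int  # FIT negative, CRC present
    tn: int  # FIT negative, CRC absent
    fp: int  # FIT positive, CRC absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fn + self.tn + self.fp)

    def lr_positive(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()

    def diagnostic_odds_ratio(self) -> float:
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical counts chosen only to illustrate the arithmetic.
cohort = TwoByTwo(tp=27, fn=3, tn=776, fp=194)
print(f"sensitivity = {cohort.sensitivity():.3f}")
print(f"specificity = {cohort.specificity():.3f}")
print(f"prevalence  = {cohort.prevalence():.3f}")
print(f"LR+ = {cohort.lr_positive():.2f}, LR- = {cohort.lr_negative():.3f}")
print(f"dOR = {cohort.diagnostic_odds_ratio():.1f}")
```

A study with a zero cell (for example, no false negatives) would need a continuity correction before the likelihood ratios and dOR can be computed; that step is omitted here.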
We included studies that reported an appropriate reference standard (colonoscopy or ≥ 2-year longitudinal follow-up of the controls).
Our main objective was to estimate the diagnostic accuracy of FIT for CRC detection. Secondary goals included assessing the usefulness of FIT to detect advanced neoplasia (AN) and significant colonic lesions (SCLs) in symptomatic patients. The definitions of AN and SCL differ from country to country, which should be considered when interpreting data. This issue will be subsequently outlined in detail for each study.
Data extraction and risk of bias
One reviewer (NP) extracted data and extractions were checked by a second reviewer (JC); any disagreements were resolved by means of discussion and consensus. In each study, potential risks of bias were assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool (QUADAS-2). An inverted funnel or “Christmas tree” scatterplot was used to detect publication bias.
Data synthesis and statistical analysis
We classified studies on the basis of brand and threshold of faecal haemoglobin (f-Hb) concentration for a positive test result to limit heterogeneity. When four or more studies on a specific subgroup were available, bivariate analyses were applied to calculate pooled estimates of sensitivity, specificity and likelihood ratios using the statistical software package STATA (v14)[9,10]. A hierarchical summary receiver operating characteristic (HSROC) curve was generated to present the summary estimates of sensitivities and specificities along with their corresponding 95% confidence interval (CI) and prediction region. An area under the HSROC curve (AUC) between 0.9 and 1.0 indicated that diagnostic accuracy was good.
When a bivariate random-effects approach was not possible due to the limited number of studies, we applied a random-effects model following DerSimonian’s method using MetaDisc software. In that case, a summary receiver operating characteristic (sROC) curve was plotted using DerSimonian and Laird’s model to present summary sensitivity and specificity estimates through the AUC or Q* index[13-15].
To determine whether FIT's accuracy to detect CRC outside a screening setting was influenced by high-risk symptoms, studies were classified by percentage of reported symptoms and CRC prevalence. Cohorts formed solely by patients who consult for abdominal symptoms represent a population more likely to present high-risk symptoms of CRC (e.g., rectal bleeding). Prespecified CRC prevalence values (< 2.5% and ≥ 2.5%) were used to ensure an adequate number of data sets for each analysis. A bivariate model was fitted for each subgroup; direct comparison between them was performed using STATA (xtmelogit command).
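As an illustration of the random-effects pooling described above, the following Python sketch applies the DerSimonian and Laird moment estimator to study-level sensitivities. It is a minimal sketch under stated assumptions: the logit transformation, the 0.5 continuity correction, the function names and the study counts are all illustrative choices, since the text does not specify how MetaDisc transforms proportions internally.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance (0.5 continuity correction)."""
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2) under a random-effects model."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q measures the dispersion around the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


# Hypothetical (true positives, CRC cases) per study; not data from this review.
studies = [(45, 50), (28, 30), (80, 95), (18, 20)]
effects, variances = zip(*(logit_and_variance(tp, n) for tp, n in studies))
pooled, se, tau2 = dersimonian_laird(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled sensitivity = {inv_logit(pooled):.3f} "
      f"(95%CI: {inv_logit(low):.3f} to {inv_logit(high):.3f}), tau^2 = {tau2:.3f}")
```

Unlike the bivariate model, this approach pools sensitivity and specificity separately and ignores the correlation between them.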
Threshold effect and other sources of heterogeneity
Threshold effect was examined by calculating Spearman’s rank correlation (P < 0.1 was considered to be statistically significant), and ROC space plots were used to represent the sensitivity against 1-specificity of each study. In addition to the visual inspection of the forest plots of accuracy estimates, statistical tests, including Chi-square and Cochran’s Q tests, were used to ascertain whether inter-study differences were greater than expected based on chance alone (P < 0.1 suggested heterogeneity); the inconsistency index (I2) was used as a measure to quantify the degree of heterogeneity. The statistical methods of this study were reviewed by Noel Pin Vieito from Complexo Hospitalario Universitario de Ourense.
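The two heterogeneity checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the sensitivity and specificity pairs are hypothetical, the correlation is computed between logit(sensitivity) and logit(1 − specificity), and the P value of the correlation (which the review compares against 0.1) is not computed here.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def i_squared(q: float, k: int) -> float:
    """Inconsistency index (%) from Cochran's Q across k studies."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100


# Hypothetical study-level estimates; not data from this review.
sens = [0.95, 0.90, 0.85, 0.80]
spec = [0.70, 0.80, 0.88, 0.93]
rho = spearman_rho([logit(s) for s in sens], [logit(1 - s) for s in spec])
print(f"Spearman rho = {rho:.2f}")
print(f"I^2 = {i_squared(12.0, 4):.1f}%")
```

A strong positive correlation between sensitivity and the false-positive rate is the pattern expected when studies differ mainly in their positivity threshold.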
Literature search and study characteristics
Our initial literature search yielded a total of 12657 references. After abstract review, we identified 342 complete papers retrieved for manual searching, yielding 5919 additional potential sources of information; of these, 81 articles were selected for full-text review and 14 studies were ultimately considered relevant for our purpose (Figure 1)[17-30]. Inter-rater reliability was moderate (kappa 0.58). Individual unpublished data from derivation and validation cohorts included in the COLONPREDICT study were also used as these patients fitted the inclusion criteria. In total, 15 cohorts (13073 patients) were selected for qualitative synthesis. Full details of these studies are shown in Tables 1 and 2, and Appendix 2.
Table 1 Characteristics of the studies included in the meta-analysis.
AN: Advanced neoplasia; CRC: Colorectal cancer; DC: Derivation cohort; HDG: High-grade dysplasia; HRA: High risk adenoma; SCL: Significant colonic lesion; SxD and SD: Differences between sex and stage, respectively, can be calculated; VC: Validation cohort; (U): Threshold units: micrograms of haemoglobin per gram of faeces.
The QUADAS-2 instrument highlighted an important risk of bias in the patient selection domain (Figure 2). Some patients could have been enrolled in a non-consecutive manner, and another five studies also used diseases or situations that could compete with CRC as a cause of a positive FIT as exclusion criteria[18,21,22,24,25]. The greatest applicability concern arose from the patient selection category, as none of the samples analysed was fully representative of patients with low risk gastrointestinal symptoms reported in NG12.
Figure 2 Quality Assessment of Diagnostic Accuracy Studies.
Diagnostic performance for colorectal cancer
Table 3 and Figure 3 present summary sensitivity and specificity estimates calculated with a random effects model following the approach of DerSimonian’s method for each screening modality using OC-Sensor®. Figure 4 shows the sROC curves at different thresholds. The highest AUC was obtained at a 20 µg Hb/g faeces threshold (AUC = 0.93, 95%CI 0.90-0.96). Furthermore, studies using OC-Sensor® with various thresholds higher than 20 µg Hb/g faeces[17,18,23], and also studies using HM-JACK®[19,24], HM-JACKarc® and FOB Gold® have been published, but their data could not be pooled due to the scarce number of studies at those thresholds. Individual data are shown in Table 4.
Table 3 Colorectal cancer detection: Diagnostic accuracy parameters based on quantitative faecal immunochemical test for haemoglobin threshold concentration and brand (DerSimonian's method).
1Values are expressed as percentages and their 95% confidence intervals;
2Values are expressed as percentages;
3Values are expressed as absolute numbers and their 95% confidence intervals;
4The studies that comprise the 100% symptomatic subgroup also have colorectal cancer prevalence ≥ 2.5%; Pa: Significance of the threshold effect using the Spearman rank correlation (P < 0.01 is considered statistically significant). I2: Inconsistency index; LoD: Limit of detection; LR: Likelihood ratio; OR: Odds ratio; CRC: Colorectal cancer.
Figure 3 Pooled sensitivity and specificity of faecal immunochemical tests for colorectal cancer detection based on threshold and brand (DerSimonian's method).
CI: Confidence interval; DC: Derivation cohort; VC: Validation cohort.
Table 4 Diagnostic accuracy parameters for colorectal cancer detection based on quantitative faecal immunochemical test for haemoglobin threshold concentration and brand.
1Values are expressed as percentages and their 95% confidence intervals.
Figure 4 Summary receiver operating characteristic curve for colorectal cancer detection at different thresholds and brands (DerSimonian and Laird's model).
LoD: Limit of detection; AUC: Area under the curve; SROC: Summary receiver operating characteristic.
We found substantial heterogeneity between studies when calculating the pooled sensitivity for almost every threshold analysed in the studies evaluating OC-Sensor® (Table 3). Spearman’s rank correlation coefficient was higher than 0.1, suggesting an absence of threshold effect in all cases. The scarce number of studies limited our ability to determine the existence of publication bias using funnel plots. However, when plotting each study’s diagnostic odds ratio (dOR) on a logarithmic scale against its sample size, we did not identify any trend towards asymmetry around the axis traced by the pooled dOR value for any analysed threshold, which suggests the absence of publication bias (Figure 5).
Figure 5 Funnel scatterplot to evaluate publication bias for studies using OC-Sensor® with different thresholds to detect colorectal cancer.
Each point in the plot represents a study with its diagnostic odds ratio (dOR) and sample size. A symmetric image around an axis traced by the pooled dOR value suggests absence of publication bias. Asymmetry with study concentration on the right side (the side with higher diagnostic odds ratio values) suggests publication bias with less negative studies published. dOR: Diagnostic odds ratio.
Subgroup and bivariate analysis
Although the number of studies limited our ability to use bivariate and HSROC models for most subgroups, the number of available studies performed with the OC-Sensor® enabled us to perform a subgroup analysis based on CRC prevalence and percentage of symptoms at the 10 µg Hb/g faeces threshold (10400 patients). Pooled estimates of sensitivity for studies composed solely of symptomatic patients (n = 4035) and for mixed cohorts (n = 6365) were 94.1% (95%CI: 90.0% to 96.6%) and 85.5% (95%CI: 76.5% to 91.4%), respectively (P < 0.01), while there was no statistically significant difference between the pooled sensitivity of studies with CRC prevalence < 2.5% (84.9%, 95%CI: 73.4% to 92.0%) and ≥ 2.5% (91.7%, 95%CI: 83.3% to 96.1%) (P = 0.25). FIT sensitivity was equal to or higher than 90% for almost every situation analysed (Table 3 and Figure 6).
Figure 6 OC-Sensor® pooled sensitivity estimates for colorectal cancer detection (subgroup analysis using DerSimonian's method).
CRC: Colorectal cancer.
Conversely, pooled specificities were significantly different when comparing studies both by percentage of symptoms (solely symptomatic = 66.0%; 95%CI: 47.1% to 80.9% vs lesser percentage of reported symptoms = 89.3%; 95%CI: 84.1% to 93.0%, P = 0.01) and by CRC prevalence (CRC prevalence < 2.5% = 90.5%; 95%CI: 89.0% to 91.9% vs CRC prevalence ≥ 2.5% = 69.3%; 95%CI: 53.5% to 81.6%, P < 0.01).
A comparison between summary sensitivity and specificity estimates calculated with both methods is shown in Table 5, and the generated HSROC curves are shown in Figure 7. OC-Sensor® accuracy parameters (threshold 10 µg Hb/g faeces) estimated by the bivariate model from both the ‘100% symptomatic’ and ‘mixed cohort’ subgroups were used to calculate different post-test probabilities through Fagan nomograms on the basis of various CRC prevalence values (Figures 8 and 9).
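A Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: post-test odds equal pre-test odds multiplied by the likelihood ratio. The Python sketch below illustrates that calculation for a negative FIT. The function names are hypothetical, and the sensitivity and specificity pairs are the pooled subgroup values quoted in the text above; the nomograms in Figure 9 use the bivariate estimates of Table 5, which may differ, so the printed values should be read as an illustration only.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem on the odds scale, as read off a Fagan nomogram."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def negative_lr(sensitivity: float, specificity: float) -> float:
    return (1 - sensitivity) / specificity


# (sensitivity, specificity) as quoted in the text for each subgroup.
scenarios = {
    "100% symptomatic": (0.941, 0.660),
    "mixed cohorts": (0.855, 0.893),
}

for label, (sens, spec) in scenarios.items():
    lr_neg = negative_lr(sens, spec)
    for prevalence in (0.01, 0.03, 0.13):
        residual = post_test_probability(prevalence, lr_neg)
        print(f"{label}, prevalence {prevalence:.0%}: "
              f"CRC probability after a negative FIT = {residual:.2%}")
```

The residual probability after a negative result rises with prevalence, which is why the same test can be a safe rule-out tool in one setting and insufficient in another.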
Table 5 OC-Sensor® diagnostic accuracy parameters for colorectal cancer detection (Threshold 10 µg Hb/g faeces) estimated with DerSimonian vs Bivariate methods.
1Values are expressed as percentages and their 95% confidence intervals;
2Values are expressed as absolute numbers and their 95% confidence intervals. Bv: Bivariate; CRC: Colorectal cancer; D: DerSimonian; LR: Likelihood ratio; OR: Odds ratio.
Figure 7 Hierarchical summary receiver-operating characteristic curves for colorectal cancer detection generated using different subgroups of studies.
A: All studies; B: 100% symptomatic; C: Mixed cohorts. HSROC: Hierarchical summary receiver operating characteristic.
Figure 8 Relationship between colorectal cancer prevalence, clinical spectrum and accuracy of faecal immunochemical test for haemoglobin to rule out colorectal cancer.
A: There is no correlation between colorectal cancer (CRC) prevalence and faecal immunochemical test for haemoglobin (FIT) sensitivity; B: Pooled FIT sensitivity to detect CRC estimated from studies with ‘Mixed cohorts’ is significantly lower than that estimated with ‘100% symptomatic’ cohorts; C: Number of missed CRC per 1000 assessed symptomatic patients with colorectal cancer calculated through Fagan nomograms under various assumptions (FIT accuracy parameters estimated with mixed cohorts or 100% symptomatic cohorts) and CRC prevalence. CRC: Colorectal cancer; FIT: Faecal immunochemical test for haemoglobin.
Figure 9 Fagan nomograms used to calculate post-test probabilities based on different scenarios defined by colorectal cancer prevalence and supposed accuracy of OC-Sensor (Threshold 10 µg Hb/g faeces).
A-C: These scenarios are defined by colorectal cancer (CRC) prevalence of 1%, 3% and 13% respectively and faecal immunochemical test for haemoglobin (FIT) accuracy parameters used were the pooled estimates calculated with ‘mixed cohorts’ studies; D-F: These scenarios are defined by CRC prevalence of 1%, 3% and 13% respectively and FIT accuracy parameters used were the pooled estimates calculated with ‘100% symptomatic’ studies. CRC: Colorectal cancer; FIT: Faecal immunochemical test for haemoglobin.
Secondary endpoints: diagnostic performance for AN and SCL
Besides the COLONPREDICT study cohorts[29,31], nine[17,19-22,24-26,30] and four[20,26-28] studies provided information on the FIT’s accuracy for AN and SCL detection, respectively, with heterogeneous definitions. Furthermore, Terhaar sive Droste et al published data on FIT's accuracy for AN detection in 2145 patients included in van Turenhout's study. AN was defined as CRC plus either high-risk[19-21,26] or advanced[17,22,24,25,30-32] adenoma. This variability was greater for the definition of SCL. Some studies defined SCL as cancer plus high-risk adenoma plus inflammatory bowel disease[20,26], whereas Godber et al expanded that definition to include other types of colitis. A broader definition was used by Cubiella et al[29,31], including CRC, advanced adenoma, polyposis, colitis, polyps ≥ 10 mm, complicated diverticular disease, colonic ulcer and bleeding angiodysplasia. Auge et al provided data about FOB Gold® accuracy for colonic lesion detection regardless of its importance. Finally, since Widlak et al added a single case of high-grade dysplasia to 24 cases of CRC, we decided to include their study within the CRC group.
Summary sensitivity and specificity estimates for AN and SCL detection are shown in Table 6. Once again, studies evaluating OC-Sensor® with different thresholds[17,21,29,31,32], HM-JACK®, HM-JACKarc®[22,27] or FOB Gold® have been published but their number was insufficient to enable pooling of data in homogeneous groups. Individual data are shown in Tables 7 and 8.
Table 6 Advanced neoplasia and significant colonic lesion detection: Diagnostic accuracy parameters based on quantitative faecal immunochemical test threshold concentration and brand (DerSimonian's method).
1Values are expressed as percentages and their 95% confidence intervals;
2Values are expressed as percentages;
3Values are expressed as absolute numbers and their 95% confidence intervals;
4The studies that comprise the 100% symptomatic subgroup also have CRC prevalence ≥ 2.5%; Pa: Significance of the threshold effect using the Spearman rank correlation (P < 0.01 is considered statistically significant). I2: Inconsistency index; LoD: Limit of detection; LR: Likelihood ratio; OR: Odds ratio; CRC: Colorectal cancer.
Table 7 Diagnostic accuracy parameters for advanced neoplasia detection based on quantitative faecal immunochemical test for haemoglobin threshold concentration and brand.
1Values are expressed as percentages and their 95% confidence interval. DC: Derivation cohort; VC: Validation cohort.
Statement of principal findings
This meta-analysis confirms that FIT is useful for triaging referrals in people with lower abdominal symptoms. Most studies have been performed using the OC-Sensor® assay; using this brand, the high pooled estimates of sensitivity for CRC shown at f-Hb thresholds from limit of detection (LoD) to 20 µg Hb/g faeces demonstrate this brand’s ability to stratify which symptomatic patients are more likely to have CRC.
Furthermore, the optimal OC-Sensor® performance (maximising both sensitivity and specificity) appeared to occur with f-Hb thresholds between 10 and 20 μg Hb/g faeces as FIT specificity is too low at a LoD f-Hb threshold. Since fewer cases of CRC will be missed with the former, 10 μg Hb/g faeces may be the most suitable threshold for CRC assessment of patients with symptoms (sROC AUC 0.92). In fact, subgroup analysis at this threshold demonstrates that regardless of CRC prevalence, summary estimates of sensitivity are higher when calculated from studies where all patients are overtly symptomatic than from mixed cohorts. Moreover, if we aim to rule out not only CRC but also other SCL using the same threshold, OC-Sensor® accuracy decreases showing lower sensitivities without improving specificity.
Finally, although information related to FIT accuracy to detect different targets has been reported using other brands and thresholds (HM-JACK, HM-JACKarc and FOB Gold), we could not pool their data due to the scarce number of homogeneous studies. Consequently, we could not assume the same degree of evidence for them.
Strengths and weaknesses
The limited number of studies did not enable us to tackle the high expected heterogeneity for all the different thresholds and assays available. Several factors could account for the heterogeneity detected: CRC prevalence, demographic characteristics, tumour location and stage, sample contamination (e.g., haemorrhoids), or the FITs used. As reported in Table 1, there were many inter-study differences, but the low number of studies included in our review did not enable us to perform a subgroup analysis for most of them. This also limited our ability to conduct statistical pooling using bivariate and HSROC models, which offer the strongest conclusions regarding diagnostic performance. In contrast, random effects methods incorporate a slight degree of heterogeneity among study results. Where possible, we applied both models to calculate pooled estimates of accuracy, which showed very similar results. Despite this, the strategy of including both studies performed on cohorts with different percentages of symptomatic patients and the individual data of the COLONPREDICT study enabled us to determine the diagnostic accuracy of the FIT at different thresholds and to check the test’s diagnostic accuracy across patient spectra with different percentages of symptomatic patients and CRC prevalence.
An additional focal point of our review was to ascertain whether all FIT brands shared similar accuracy values. Only four studies with varying thresholds and settings reported the accuracy parameters of the HM-JACK®[19,24], HM-JACKarc® and FOB Gold® systems to detect CRC, and no study to date has directly compared the performance of different FITs. Finally, we evaluated the diagnostic performance of the FIT in detecting SCLs. However, we must highlight that the main limitations of our analysis were the varying definitions and diagnostic criteria for both advanced (or high-risk) adenoma and SCL among the studies.
Strengths and weaknesses in relation to other studies
A prior systematic review assessed the value of symptoms and additional diagnostic tests for CRC assessing, including FIT, in symptomatic primary care patients. This review was completed in 2008 and included only three studies involving quantitative FITs. Another systematic review was recently performed to provide information on the new NICE DG30 diagnostic guidelines. We expanded previous inclusion criteria to assess the performance of FIT on samples with different percentage of symptoms and CRC prevalence, since the population included in that meta-analysis was not representative of the criteria reported in NG12. In fact, the studies included had major variability in terms of CRC prevalence.
Moreover, to ascertain whether FIT´s accuracy to detect CRC changes in symptomatic patients may be challenging. There are few studies on heterogeneous populations outside a screening setting and categorising those studies according to the presence and type of symptoms is difficult due to unspecific abdominal symptoms commonly associated with bowel cancer (such as abdominal pain or changing bowel habit) are common and sometimes unreported among apparently healthy people. This not only diminishes the value of symptoms as a diagnostic tool as previously reported[39,41,42], but means that even a significant proportion of individuals taking part in CRC screening programmes could suffer from unreported lower gastrointestinal symptoms. This could also explain why in some studies SCL prevalence has been revealed to be similar between patients suffering from nonspecific abdominal symptoms and supposedly ‘asymptomatic’ symptoms, unlike what is expected[43,44].
Our results suggest that although FIT may play a key role in the evaluation of symptomatic patients, it should not be used alone to rule out CRC. In fact, FIT should be interpreted considering the whole clinical spectrum including variables such as sex and age. Moreover, high-risk symptoms like rectal bleeding or diarrhoea may affect the amount of f-Hb detected. FIT accuracy could be higher in this setting than in unspecific low-risk symptoms which are also more in line with the NG12 scenario reported.
This clinical concern may affect the expected number of missed CRC as previously discussed elsewhere. Therefore, we checked the performance of FIT in different theoretical situations defined in Figure 8 by means of what we try to represent as the sources of uncertainty of actual decision-making. For example, if we ‘erroneously’ assumed that FIT sensitivity to rule out CRC is 94.1% for any symptomatic patient after being estimated by pooling ‘100% symptomatic’ studies which have higher percentages of high-risk symptoms such as rectal bleeding, but the ‘true value’ were 85.5% (estimated by ‘mixed cohorts’) we would miss 1, 2 and 10 unexpected additional CRCs in populations with a CRC prevalence of 1%, 3% and 13%, respectively, for each 1000 symptomatic patients with CRC assessed.
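The reasoning behind this example can be approximated with a direct calculation: the expected number of missed cancers per 1000 patients assessed is 1000 × prevalence × (1 − sensitivity). The Python sketch below is a simplified illustration with a hypothetical function name; the figures reported in the text were derived through Fagan nomograms, so the rounded results of this shortcut need not match them exactly.

```python
def missed_crc_per_1000(prevalence: float, sensitivity: float) -> float:
    """Expected false negatives among 1000 patients assessed."""
    return 1000 * prevalence * (1 - sensitivity)


assumed_sensitivity = 0.941  # pooled from '100% symptomatic' studies
true_sensitivity = 0.855     # pooled from 'mixed cohorts'

for prevalence in (0.01, 0.03, 0.13):
    expected = missed_crc_per_1000(prevalence, assumed_sensitivity)
    actual = missed_crc_per_1000(prevalence, true_sensitivity)
    print(f"prevalence {prevalence:.0%}: expected {expected:.2f}, "
          f"actual {actual:.2f}, additional missed {actual - expected:.2f}")
```

The additional misses scale linearly with prevalence, so an overestimated sensitivity matters most in high-prevalence populations.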
Nevertheless, it is important to note that the aim of performing a FIT in a symptomatic patient is not only to rule out CRC, since other conditions, such as IBD, may present with the same symptoms. Unfortunately, we could only estimate the pooled accuracy parameters of three studies performed with the OC-Sensor® at the LoD and 10 µg Hb/g faeces thresholds, with sensitivity estimates ranging from 91.7% to 80.4%. Despite the weaknesses previously discussed, these results are consistent with those of the study by Hogberg et al, which demonstrated that a qualitative FIT with a LoD f-Hb threshold could identify 87.5% and 90% of cases of CRC and IBD, respectively, in unselected primary care patients.
Unanswered questions and future research
Although our results support the use of FIT to optimise the number of urgent referrals and to help define a patient cohort with a negligible risk of CRC that would not require any referral, caution is recommended when using it outside the screening setting for symptomatic patients. FIT's accuracy for detecting SCL does not appear to be equally reliable in every patient subgroup. Finally, it is doubtful whether further diagnostic tests can be omitted in symptomatic patients with a high CRC prevalence, especially if symptoms persist. Thus, existing FIT-based prediction models[25,31,46] and recently published results[47,48] should also be validated directly, comparing different FIT brands and stratifying by clinical spectrum, while future biomarkers[49,50] should also be evaluated and compared with FIT to incorporate objective criteria that can safely rule out a CRC diagnosis.
In conclusion, our meta-analysis reveals that sensitivity for CRC may change across populations with differences in clinical symptoms, irrespective of CRC prevalence. In addition, FIT is not sensitive enough to exclude other significant colonic diseases. Future studies solely concerned with patients consulting for low-risk symptoms are needed to better assess the role of FIT in ruling out CRC in this subgroup. Meanwhile, a single f-Hb cut-off of 10 µg Hb/g faeces could be used in this population to identify which patients may benefit from a “watch and wait” strategy, provided that this does not mean avoiding further workup, irrespective of the FIT result, if there is no response to treatment.
Colorectal cancer (CRC) is the third most common cancer worldwide and the fourth leading cause of cancer-related death. The majority of cancers are still diagnosed after symptomatic presentation, and the quantitative faecal immunochemical test for haemoglobin (FIT) has been shown to be more accurate for the detection of CRC than multiple clinical referral criteria in symptomatic patients referred for colonoscopy. Hence, the National Institute for Health and Care Excellence (NICE) has recently issued referral guidance for suspected CRC in which FIT is recommended for certain low-risk symptomatic patients using a 10 µg Hb/g faeces threshold.
Although the NICE recommendation applies only to patients with low-risk symptoms in primary care, the studies done to date mainly involved patients who had already been referred to secondary care and were not restricted to patients with low-risk symptoms. Thus, further work is required to find out whether FIT's ability to rule out CRC changes across the broad spectrum of symptomatic patients.
We aimed to systematically review the literature for published studies performed outside the CRC screening programme setting and to compare FIT accuracy for CRC detection across different clinical spectra through a meta-analysis. A secondary goal was to assess the usefulness of FIT for detecting significant colonic lesions (SCLs) in symptomatic patients.
We performed an electronic search of the MEDLINE and EMBASE databases (from database inception to May 2018) using a sensitive search strategy for “FIT for CRC”, narrowed to prospective cohort studies performed on adult patients in which at least a fraction of the patients were symptomatic. To identify further relevant studies, we checked the reference lists of all articles extracted. To limit heterogeneity, we classified studies on the basis of brand and threshold of faecal haemoglobin (f-Hb) concentration for a positive test result. Finally, a bivariate model was fitted for subgroups defined by CRC prevalence and percentage of symptoms, for direct comparison between them.
We identified fourteen studies that matched the search criteria, and individual unpublished data from cohorts included in the COLONPREDICT study were also used, enrolling 10400 patients tested with OC-Sensor® at an f-Hb cut-off of 10 µg Hb/g faeces. Pooled estimates of sensitivity for studies formed solely by symptomatic patients (94.1%) were significantly higher than for mixed cohorts (85.5%), while there were no statistically significant differences between the pooled sensitivities of studies with different CRC prevalence (< 2.5% and ≥ 2.5%). At the same threshold, OC-Sensor® sensitivity to rule out any SCL was 78.6%.
This meta-analysis suggests that FIT sensitivity for detecting CRC is higher in studies solely including symptomatic patients, irrespective of CRC prevalence, but that FIT may not be sensitive enough to rule out all SCLs. We hypothesise that the difference between the two groups arises because cohorts solely including symptomatic patients could have a higher percentage of symptoms associated with higher amounts of f-Hb, such as rectal bleeding or diarrhoea, but the study design is not suitable to prove this hypothesis.
More data are needed to compare FIT accuracy for CRC detection in patients across different clinical spectra and to identify a subgroup of symptomatic patients in whom FIT can safely rule out CRC. Future prospective cohort studies solely concerned with patients consulting for low-risk symptoms, and stratifying by sex and age, could help to achieve this aim.
Manuscript source: Invited manuscript
Specialty type: Gastroenterology and hepatology
Country of origin: Spain
Peer-review report classification
Grade A (Excellent): A
Grade B (Very good): 0
Grade C (Good): C
Grade D (Fair): 0
Grade E (Poor): 0
P-Reviewer: Biondi A, Lieto E P S-Editor: Yan JP L-Editor: A E-Editor: Ma YJ