S- Editor Wang J L- Editor Wang XL E- Editor Liu WF
Published online Feb 28, 2006. doi: 10.3748/wjg.v12.i8.1249
Revised: May 25, 2005
Accepted: June 2, 2005
Published online: February 28, 2006
AIM: To determine the performance of novice readers (4th year medical students) for detecting capsule endoscopy findings.
METHODS: Ten capsule endoscopy cases of small bowel lesions were administered to the readers. Gold standard findings were pre-defined by gastroenterologists. Ten gold standard “targets” were identified among the 10 cases. Readers were given a 30-min overview of Rapid Reader software and instructed to mark any potential areas of abnormalities. A software program was developed using SAS to analyze the thumbnailed findings.
RESULTS: The overall sensitivity for detecting the gold standard findings was 80%. As a group, at least 5 out of 10 readers detected the gold standard finding in each recording. All the gold standard targets were identified when the readers’ results were combined. The number of incidental/false positive findings per recording ranged from 8.2 to 59.8 across readers.
CONCLUSION: A panel of medical students with minimal endoscopic experience can achieve high sensitivity in detecting lesions on capsule endoscopy. A group of novice readers can pre-screen recordings to thumbnail potential areas of small bowel lesions for further review. These thumbnails must be reviewed to determine the clinical relevance. Further studies are ongoing to assess other cohorts.
- Citation: Chen GC, Enayati P, Tran T, Lee-Henderson M, Quan C, Dulai G, Arnott I, Sul J, Jutabha R. Sensitivity and inter-observer variability for capsule endoscopy image analysis in a cohort of novice readers. World J Gastroenterol 2006; 12(8): 1249-1254
- URL: https://www.wjgnet.com/1007-9327/full/v12/i8/1249.htm
- DOI: https://dx.doi.org/10.3748/wjg.v12.i8.1249
Capsule endoscopy is a new diagnostic procedure developed for the complete examination of the small intestine through video images transmitted from an ingestible camera[1-3]. Briefly, the PillCam™ capsule endoscopy and diagnostic imaging system (GIVEN Imaging, Yoqneam, Israel) is a commercially available system consisting of three major components: the PillCam™ capsule, which captures images and transmits digital pictures (at 2 frames/s) over an 8-h period; the sensor array and data recorder, which receive and record the data transmitted from the capsule; and the RAPID™ Workstation, which is used to initialize the data recorder and to download and process the raw data from it[4,5]. The processed information, composed of approximately 50 000 still images collected over an 8-h period, can be reviewed as a continuous video stream. The reported time needed for a complete review of a single capsule endoscopy recording ranges from 50[6,7] to 120 min.
Numerous studies have now demonstrated that the sensitivity and specificity of capsule endoscopy are superior to those of traditional methods for diagnosing small bowel lesions[5-7,9-11]. Capsule endoscopy may also reduce total medical utilization and costs and improve patients’ quality of life in certain circumstances.
One factor that can affect the diagnostic yield of capsule endoscopy is the image analysis process, i.e. the ability of the person reviewing the images (the reader) to accurately detect significant lesions and interpret the findings. This process is time consuming and requires readers to give their undivided attention to a large number of images.
Currently, the process of capsule endoscopy image analysis has not been standardized with respect to the selection and training of individual readers, the determination of the gold standard against which findings are compared to assess sensitivity and false positive rates, or the reporting of findings and diagnoses. Unfortunately, these important issues have not been well studied. Studies of inter-observer variability have been limited to anecdotal reports involving between 1 and 4 readers[13-20,23,24]. Furthermore, since capsule endoscopy image analysis is time consuming, the arduous process of image recognition and analysis is often delegated to individuals who have received minimal training, with little consideration of their ability to achieve competency in reading capsule endoscopy recordings. A survey conducted at the 2003 Given International Capsule Endoscopy Conference found that 82% of gastroenterologists reported being the first readers to interpret capsule endoscopy recordings, while 18% used a resident, physician assistant, and/or nurse to interpret the recordings first.
In this clinical study, our aim was to determine the sensitivity, incidental finding and/or false positive rate, and intraclass correlation of novice capsule endoscopy readers (4th year US medical students with minimal endoscopic background) for detecting pre-specified capsule endoscopy findings. Previous studies have shown that it is not a simple task to achieve 100% sensitivity on capsule endoscopy recordings[13-20]. In addition, it has been reported that since the pathology is visualized in no more than a small percentage of images from each capsule endoscopy recording, a fatigued gastroenterologist may analyze the recording at a very rapid speed and is likely to miss lesions. Hence, we propose that analysis of the same capsule endoscopy recording by multiple readers might be an effective method to achieve 100% sensitivity and decrease medical errors. If the combined results of the novice readers show a high sensitivity, then perhaps novice readers can be considered as physician extenders in analyzing capsule endoscopy recordings. In our study, instead of manually analyzing the readers’ results, we used statistical software to perform the analysis, because manual comparison of the readers’ findings can be labor intensive and time consuming when a large number of readers are evaluated. Finally, the method of using medical students as novice readers to analyze the images of a diagnostic modality has been described in the literature.
Ten recordings with definitive sites of small bowel lesions were administered to the readers in a pre-set order. The lesions were AVM (3), small bowel tumor (1), radiation enteritis (1), ulcers/aphthous lesions (3), and foreign body with ulceration (2). Two gastroenterologists (attending physicians at the tertiary medical center) selected these 10 recordings.
The novice capsule endoscopy readers consisted of a group of ten 4th year medical students with minimal endoscopic background from David Geffen School of Medicine at UCLA (Los Angeles, CA, USA). All the participating medical students signed an informed consent agreement as approved by the local institutional review board.
The readers were blinded to the patients’ clinical history because this study aimed to assess the readers’ abilities to detect small bowel lesions on capsule endoscopy recordings rather than to test their medical knowledge. Furthermore, the readers were blinded to each other’s capsule endoscopy findings.
The gold standard for the true positive findings was pre-defined by the two gastroenterologists (each with over 150 capsule endoscopy cases at the time this study was started), who independently reviewed all the available data for each of the 10 recordings, including pertinent medical history; previous endoscopic, radiologic and surgical examinations; and the complete 8-h capsule endoscopy recording. The experts’ consensus of positive findings was used to calculate the sensitivity and false positive measures for each individual reader as defined below. Ten gold standard “targets” were identified among the 10 cases. We did not include any negative findings in this study because our aim was to evaluate the readers’ ability to detect positive findings. However, the readers did not know that there was at least one gold standard finding per case.
Each reader reviewed the entire 8-h recordings for all 10 cases to localize and thumbnail significant lesions within the small intestine. Significant upper and lower gastrointestinal lesions could be detected by capsule endoscopy, but lesions of the esophagus, stomach, and colon were not analyzed. Presumably, lesions in these areas were detected during routine endoscopic evaluation.
The readers analyzed the 10 recordings in a consecutive order over a 30-90 d period. They were also asked to record how long it took for them to interpret each case and were told to use a cautious and highly inclusive approach, while interpreting the capsule endoscopy recordings in order to minimize the chance of missing any clinically significant lesions. All findings identified by each reader were marked, thumbnailed and annotated using the Rapid Reader software program (GIVEN Imaging, Yoqneam, Israel). The readers were given a 30-min overview of the Rapid Reader software and instructions for thumbnailing. Active gastrointestinal bleeding was often detected by the Suspected Blood Indicator program (two capsule endoscopy cases we used had active gastrointestinal bleeding lesions); however, we did not allow the readers to use the Suspected Blood Indicator program on the Rapid Reader, since we wanted to assess the readers’ true abilities to detect the lesions on capsule endoscopy recordings. Furthermore, we felt that active bleeding lesions on the capsule endoscopy recordings should be easily and consistently detectable by the readers.
The percentage of cases in which a reader had at least one finding in the gold standard area of the case was expressed as the reader’s sensitivity. If a reader had a finding outside the gold standard time interval, it was counted as an incidental finding/false positive.
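The two definitions above can be illustrated with a minimal Python sketch. It is not the SAS program used in the study; the function names, the data layout (a dictionary of thumbnail time stamps per case), and the example reader are hypothetical. The two gold standard intervals are copied from Table 1 purely as sample values.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def reader_sensitivity(findings: Dict[int, List[float]],
                       gold: Dict[int, Interval]) -> float:
    """Fraction of cases with at least one finding inside the gold standard interval."""
    hits = 0
    for case, (start, end) in gold.items():
        if any(start <= t <= end for t in findings.get(case, [])):
            hits += 1
    return hits / len(gold)


def incidental_findings(findings: Dict[int, List[float]],
                        gold: Dict[int, Interval]) -> int:
    """Number of findings that fall outside the gold standard interval of their case."""
    count = 0
    for case, times in findings.items():
        start, end = gold[case]
        count += sum(1 for t in times if not (start <= t <= end))
    return count


# Gold standard intervals for cases 3 and 10 (values from Table 1; units as given there).
gold = {3: (16144, 16157), 10: (12600, 12800)}
# Hypothetical thumbnails from one reader: one hit in case 3, none in case 10.
reader = {3: [16150, 200], 10: [500]}

sensitivity = reader_sensitivity(reader, gold)
incidental = incidental_findings(reader, gold)
```

In this toy example the reader detects one of the two targets and marks two thumbnails outside the gold standard intervals.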
The time series for each case was divided into time intervals using the PROC MIXED procedure in SAS (SAS Institute Inc., Cary, NC, USA). Within each time interval, it was noted whether or not each reader had at least one finding. For a given time interval size, a time interval was designated as being in a true problem area if any part of it fell within the “gold standard” area for that case. Each time interval was also designated as having a “finding” or not, where a “finding” was recorded if at least X of the 10 readers had a finding in that time interval. X was the reader’s threshold, defined as the minimum number of readers who had to have a finding in a given time interval for it to be considered a positive finding (namely, a true problem area). The clinical implication of the optimal time interval size and reader’s threshold analyses is that this method can show capsule endoscopy readers where in the time series the thumbnailed findings are most concentrated. Readers would therefore know which parts of the capsule endoscopy recording need extra attention during image analysis and review. This is especially important because it may shorten the time gastroenterologists need to review the thumbnailed findings of the screeners (in this case, the novice readers).
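The interval and threshold logic described above might be sketched as follows. This is an illustrative Python reconstruction, not the authors’ SAS code; the function names, the fixed-width binning starting at time 0, and the toy time stamps are assumptions.

```python
from typing import List


def flag_intervals(reader_times: List[List[float]], size: float,
                   threshold: int, n_intervals: int) -> List[bool]:
    """Mark each interval as a 'finding' if at least `threshold` readers
    have one or more thumbnails in it."""
    counts = [0] * n_intervals
    for times in reader_times:
        # A reader counts once per interval, however many thumbnails fall in it.
        for idx in {int(t // size) for t in times}:
            if 0 <= idx < n_intervals:
                counts[idx] += 1
    return [c >= threshold for c in counts]


def gold_intervals(gold_start: float, gold_end: float, size: float,
                   n_intervals: int) -> List[bool]:
    """Mark each interval that overlaps any part of the gold standard area."""
    first, last = int(gold_start // size), int(gold_end // size)
    return [first <= i <= last for i in range(n_intervals)]


# Hypothetical thumbnails from three readers, binned into 100-unit intervals.
readers = [[10, 25], [12], [205]]
flagged = flag_intervals(readers, size=100, threshold=2, n_intervals=3)
in_gold = gold_intervals(90, 110, size=100, n_intervals=3)
```

With a threshold of 2, only the first interval is flagged, because two readers marked it; the gold standard area spans the first two intervals because it straddles their boundary.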
There were a total of 128 time interval size/reader’s threshold combinations per case. For each of these combinations, sensitivity and specificity were estimated for each case. Sensitivity was estimated as the probability of a finding given that the time interval was in a region with a true finding (the “gold standard” region), while specificity was estimated as the probability of no finding given that the time interval was outside the gold standard region. Among the time interval size/reader’s threshold combinations with 100% sensitivity, we calculated the incidental finding/false positive rate (the number of time intervals with a finding outside the gold standard area divided by the total number of time intervals outside the gold standard area) and the standardized number of minutes viewing incidental finding/false positive time intervals (the incidental finding/false positive rate multiplied by the average number of true negative time intervals for that reader’s threshold/time interval size, multiplied by the time interval size in minutes). For each of these quantities, we calculated the average and the maximum value across all 10 capsule endoscopy cases. Separate analyses were conducted using different time interval sizes, ranging from 20 to 25 000 s, in order to determine the impact of time interval size on the results.
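As a rough illustration of these quantities, the following Python sketch computes them from two per-interval Boolean lists (whether the interval was flagged as a finding, and whether it overlaps the gold standard area). The function names and inputs are hypothetical, and the standardized-minutes helper simply follows the verbal definition above.

```python
from typing import Dict, List


def interval_metrics(flagged: List[bool], in_gold: List[bool]) -> Dict[str, float]:
    """Per-case sensitivity, specificity and false positive rate over time intervals."""
    pairs = list(zip(flagged, in_gold))
    tp = sum(1 for f, g in pairs if f and g)
    fn = sum(1 for f, g in pairs if not f and g)
    fp = sum(1 for f, g in pairs if f and not g)
    tn = sum(1 for f, g in pairs if not f and not g)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "fp_rate": fp / (fp + tn) if fp + tn else nan,
    }


def standardized_fp_minutes(fp_rate: float, mean_negative_intervals: float,
                            size_s: float) -> float:
    """False positive rate x average number of negative intervals x interval size in minutes."""
    return fp_rate * mean_negative_intervals * (size_s / 60.0)


# Hypothetical case with four 5 000-s intervals, the first of which is in the gold standard area.
metrics = interval_metrics([True, True, False, False], [True, False, False, False])
minutes = standardized_fp_minutes(metrics["fp_rate"], 3, 5000)
```

Here one of the three intervals outside the gold standard area is flagged, so the false positive rate is 1/3 and the standardized viewing time is one 5 000-s interval, about 83 min.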
Intraclass correlation was assessed separately for each case. The intraclass correlation among all 10 readers measured the agreement among the readers in their evaluation of capsule endoscopy recordings, beyond the agreement expected by chance. The intraclass correlation coefficient was estimated for each time interval size and for each capsule endoscopy recording.
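The text does not state which form of the intraclass correlation coefficient was used. Purely as an illustration, the sketch below computes a one-way random-effects ICC from the between-interval and within-interval mean squares, treating each time interval as a target and each reader’s 0/1 finding indicator as a rating. The choice of this ICC form, the function name, and the toy data are all assumptions.

```python
from typing import List


def icc_oneway(ratings: List[List[float]]) -> float:
    """One-way random-effects ICC for an n-targets x k-raters table.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-target and within-target mean squares.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means)
              for x in row) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    # Undefined when every rating is identical (no variance at all).
    return (msb - msw) / denom if denom else float("nan")


# Hypothetical data: 4 time intervals rated by 3 readers (1 = finding, 0 = none).
perfect = icc_oneway([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
opposed = icc_oneway([[1, 0], [0, 1]])
```

Complete agreement gives a coefficient of 1, while two readers who always disagree give a negative value.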
Based on the gold standard findings, 10 targets were specified in the 10 recordings used in this study (Table 1). The average time taken by the readers to interpret each case was 118 min. The overall sensitivity among the 10 readers was 0.80 (80%) for a time interval size of 20 s. All findings were detected in 6 out of 10 readers. On a case level, the gold standard finding was identified by all 10 readers in case #2 but by only 5 readers in case #6. Individual reader sensitivity ranged from 60% to 100%, with reader #8 achieving 100% and reader #5 only 60% (Table 2). The readers were able to identify all the gastric, duodenal, and cecal images accurately. The number of incidental/false positive findings per recording ranged from a minimum of 8.2 for reader #1 to a maximum of 59.8 for reader #10 (Table 3). By case, the number of incidental/false positive findings ranged from 12 in case #9 to 40.1 in case #5. Intraclass correlation varied by case but seemed to increase with increasing time interval size. The intraclass correlation was <0.40 for all cases except #2 and #9, indicating low to fair agreement (Figure 1).
|Target||Case||Gold standard time interval||Finding|
|1||1||22 500–32 900||Aphthous ulceration (from NSAID use)|
|2||2||4 542–6 842||Aphthous ulceration (from NSAID use)|
|3||3||16 144–16 157||Duodenal bleeding (AVM)|
|4||4||25 302–25 372||Staples and ulcerations (from prior surgery)|
|5||5||24 244–24 700||Small bowel tumor|
|6||6||23 596–23 692||Aphthous ulceration (from Crohn’s disease)|
|7||7||32 657–50 000||Radiation enteritis|
|8||8||20 000–26 000||Chicken bone|
|10||10||12 600–12 800||Bleeding angiodysplasia|
|Reader number||Case 1||Case 2||Case 3||Case 4||Case 5||Case 6||Case 7||Case 8||Case 9||Case 10||Total findings||Sensitivity|
In the time interval size and reader’s threshold analyses, the minimum time interval size for which sensitivity reached 100% in all 10 recordings was 3 000 s. All possible reader thresholds achieved 100% sensitivity for all 10 recordings in at least one of the time interval sizes examined. The average percentage of incidental/false positive findings in the 10 cases ranged from 28% with a reader’s threshold of 7 and a time interval of 5 000 s to 66% with a reader’s threshold of 3 and a time interval of 10 000 s (Figure 2A). The maximum percentage of incidental/false positive findings in the 10 recordings ranged from 56% for a time interval of 5 000 s and a reader’s threshold of 7 to 100% for several combinations (Figure 2B).
Overall, in the time interval size and reader’s threshold analyses, the combination of a 5 000-s time interval and a reader’s threshold of 7/10 was optimal. This combination resulted in a sensitivity of 100%, a low average incidental finding/false positive percentage and a low number of minutes viewing incidental finding/false positive time interval video images. However, this time interval was still relatively large (83 min); smaller time interval sizes had higher error rates and/or failed to yield 100% sensitivity.
Capsule endoscopy is a newly developed diagnostic modality that allows visualization of the entire small intestine. However, the process of selecting, training, and validating an individual’s ability to accurately perform capsule endoscopy image analysis has not been well studied. Therefore, we performed a systematic study to compare and validate readers’ performance on capsule endoscopy image analysis. The goal of our study was to determine the sensitivity and incidental finding/false positive rate as well as the intraclass correlation of novice readers and to determine whether analyzing the same capsule endoscopy recording by multiple novice readers is an effective and accurate approach to capsule endoscopy image interpretation.
We hypothesized that novice readers could reliably detect small bowel lesions with a high sensitivity and a large number of incidental/false positive findings. In our study, each reader made a moderate number of incidental or false positive findings per recording. The following factors can help explain this: 1) the novice readers were asked to perform the capsule endoscopy analysis in a detailed, thorough and highly cautious fashion in an attempt to minimize the possibility of missing small bowel lesions; 2) the readers were untrained in interpreting capsule endoscopy images; 3) the “lodging” of the capsule around the same spot in the small bowel caused some of the readers to thumbnail the same lesion several times; and 4) some of the incidental or false positive findings were small lesions, such as focal petechiae, areas of erythema or “mucosal breaks”, whose clinical significance is still debatable.
We found that novice readers with minimal endoscopic experience were able to detect lesions on capsule endoscopy with a moderate-to-high sensitivity. Though the majority of the readers were unable to achieve 100% individual sensitivity, if we view the results by this panel of novice readers as a whole, every single gold standard target was detected. Therefore, perhaps the concept of analyzing the same capsule endoscopy recording by multiple novice readers may be an alternative yet effective and accurate method to interpret the capsule endoscopy images. This alternative approach might decrease the risk of having any lesions undetected by a single reader.
Inter-observer variability in analyzing capsule endoscopy recordings has been studied by Levinthal et al. The combined sensitivity of the group of novice readers in our study is comparable with the result achieved by Levinthal et al. In another published series, inter-observer variability was evaluated by comparing the interpretations of 20 capsule endoscopy cases by an attending gastroenterologist and a 4th year therapeutic endoscopy student who had reviewed 15 capsule endoscopy cases before participating in that study. The authors found complete agreement between the two readers in 18/20 cases. Nonetheless, that study only compared the clinically significant findings and did not report the number of incidental/false positive findings.
However, studies on inter-observer variability on capsule endoscopy interpretation are mostly documented in abstract forms. Hoffman et al showed that physician extenders could save gastroenterologists’ time in capsule endoscopy interpretation.
Analyzing capsule endoscopy recordings requires a significant time commitment from the gastroenterologist. As a result, a few studies have investigated the potential of using physician extenders to serve as screeners for interpreting capsule endoscopy images. The results from our study showed that novice readers could achieve a high sensitivity in capsule endoscopy analysis when their results were combined as a group. Therefore, analysis of the same capsule endoscopy recording by multiple novice readers may be the most effective and accurate method for detecting all significant lesions on capsule endoscopy. This is especially important because some lesions may appear in a single frame and could be easily missed by a single reader. An analogy to this method is the airport luggage screening process, in which the luggage is screened through X-ray/CT scanners and the television screen is monitored by “highly trained” individuals who detect the “high risk items” (analogous to lesions). Suspicious bags are subsequently re-X-rayed and screened by several individuals, and then high-risk items are manually inspected (analogous to endoscopy, push enteroscopy or surgical investigation).
Our study is the first systematic study to date addressing the issue of inter-observer variability in capsule endoscopy image analysis by a large group (>4 individuals) of readers. The most important clinical conclusion of our study is that a panel of novice readers with minimal endoscopic experience can detect small bowel lesions on capsule endoscopy recordings and pre-screen recordings to thumbnail potential abnormalities with a high sensitivity, allowing the gastroenterologists to review only the thumbnailed potential abnormalities. This concept serves as an alternative to the methods proposed in previous studies (i.e. using gastroenterology students or endoscopy nurses). Furthermore, perhaps the most effective way to accurately detect all abnormalities on capsule endoscopy recordings is to have the same case analyzed by a number of readers. This approach to capsule endoscopy image analysis may decrease the number of medical errors. Our results suggest once again that physician extenders can serve as screeners for interpreting capsule endoscopy images, saving gastroenterologists a significant amount of time and making capsule endoscopy more cost-effective and attractive to practising gastroenterologists. However, due to the moderate number of incidental/false positive findings, gastroenterologists must review these thumbnails to determine the clinical relevance of each finding. Future studies should also estimate the amount of time that gastroenterologists have to spend assessing all the incidental and false positive findings marked by the physician extenders. Additional studies are ongoing to assess the abilities of other reader cohorts (endoscopy nurses, gastroenterology students, medical residents, non-medical personnel) to detect abnormalities on capsule endoscopy before physician extenders begin to screen capsule endoscopy in everyday clinical practise.
|1.||Iddan G, Meron G, Glukhovsky A, Swain P. Wireless capsule endoscopy. Nature. 2000;405:417.|
|2.||Gostout CJ. Capsule Endoscopy. Clinical Update, American Society for Gastrointest Endosc. 2002;10:1-4.|
|3.||Fleischer DE. Capsule endoscopy: the voyage is fantastic--will it change what we do. Gastrointest Endosc. 2002;56:452-456.|
|4.||Meron G. Development of the swallowable video capsule. Atlas of capsule endoscopy. Yoqneam, Israel: Given Imaging, Inc 2002; 3-7.|
|5.||Yu M. M2A capsule endoscopy. A breakthrough diagnostic tool for small intestine imaging. Gastroenterol Nurs. 2002;25:24-27.|
|6.||Ell C, Remke S, May A, Helou L, Henrich R, Mayer G. The first prospective controlled trial comparing wireless capsule endoscopy with push enteroscopy in chronic gastrointestinal bleeding. Endoscopy. 2002;34:685-689.|
|7.||Lewis BS, Swain P. Capsule endoscopy in the evaluation of patients with suspected small intestinal bleeding: Results of a pilot study. Gastrointest Endosc. 2002;56:349-353.|
|8.||Costamagna G, Shah SK, Riccioni ME, Foschia F, Mutignani M, Perri V, Vecchioli A, Brizi MG, Picciocchi A, Marano P. A prospective trial comparing small bowel radiographs and video capsule endoscopy for suspected small bowel disease. Gastroenterology. 2002;123:999-1005.|
|9.||Appleyard M, Fireman Z, Glukhovsky A, Jacob H, Shreiver R, Kadirkamanathan S, Lavy A, Lewkowicz S, Scapa E, Shofti R. A randomized trial comparing wireless capsule endoscopy with push enteroscopy for the detection of small-bowel lesions. Gastroenterology. 2000;119:1431-1438.|
|10.||Hahne M, Adamek HE, Schilling D, Riemann JF. Wireless capsule endoscopy in a patient with obscure occult bleeding. Endoscopy. 2002;34:588-590.|
|12.||Goldfarb NI, Phillips A, Conn M, Lewis BS, Nash DB. Economic and health outcomes of capsule endoscopy: opportunities for improved management of the diagnostic process for obscure gastrointestinal bleeding. Dis Manag. 2002;5:123-135.|
|13.||Breitinger A, Schembre D, Mergener K, Brandabur J. Can non-endoscopists screen capsule endoscopies. Am J Gastroenterol. 2002;97:S81.|
|14.||Levinthal GN, Burke CA, Santisi JM. The accuracy of an endoscopy nurse in interpreting capsule endoscopy. Am J Gastroenterol. 2003;98:2669-2671.|
|15.||Adler DG, Knipschield M, Gostout C. A prospective comparison of capsule endoscopy and push enteroscopy in patients with GI bleeding of obscure origin. Gastrointest Endosc. 2004;59:492-498.|
|16.||Hoffman BJ, Glen T, Varadarajulu S, Cotton PB. Can we replace gastroenterologists with physician extenders for interpretation of wireless capsule endoscopy. Gastroenterology. 2003;124:A245.|
|17.||Friedland S, Wu K, Soetikno RM. A Pilot Study of Capsule Endoscopy Reading by a Nurse Endoscopist. Gastrointest Endosc. 2004;59:M1833.|
|18.||Rowbotham D. Ulcers, lies, and video speed: does clinical experience matter in wireless capsule endoscopy. Gastrointest Endosc. 2003;57:M1877.|
|19.||De Leusse A, Landi B, Edery J, Burtin P, Lecomte T, Seksik P, Bloch F, Jian R, Cellier C. Video capsule endoscopy for investigation of obscure gastrointestinal bleeding: feasibility, results, and interobserver agreement. Endoscopy. 2005;37:617-621.|
|20.||Shaver CP, Rivera JR, McKinley J, Brady PG. Capsule Endoscopy Learning Curve. Gastrointest Endosc. 2004;59:S1545.|
|21.||Consensus Statement. Given International Conference. 2003.|
|22.||Standard Terminology for GIVEN M2A Capsule Endoscopy Study. version 1.0a. 2002;1-27.|
|23.||Saurin JC, Delvaux M, Gaudin JL, Fassler I, Villarejo J, Vahedi K, Bitoun A, Canard JM, Souquet JC, Ponchon T. Diagnostic value of endoscopic capsule in patients with obscure digestive bleeding: blinded comparison with video push-enteroscopy. Endoscopy. 2003;35:576-584.|
|24.||Sigmundsson HK, Das A, Isenberg GA. Capsule endoscopy (CE): interobserver comparison of interpretation. Gastrointest Endosc. 2002;57:165.|
|25.||Hope MD, de la Pena E, Yang PC, Liang DH, McConnell MV, Rosenthal DN. A visual approach for the accurate determination of echocardiographic left ventricular ejection fraction by medical students. J Am Soc Echocardiogr. 2003;16:824-831.|