GU Radiologists vs AI in Detection of Prostate Cancer on MRI

Free-response receiver operating characteristics (FROC) analysis for index lesion detection in the 126 patients of the evaluation cohort, with detection sensitivity plotted against the average number of false-positive detections per patient. The shaded area surrounding the FocalNet curve (blue) shows the 95% confidence interval for detection sensitivity, obtained by bootstrapping the patient population. Dots indicate each radiologist's performance at each suspicion score threshold.
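To make the axes of the FROC plot concrete, the following minimal Python sketch computes one operating point (sensitivity, false positives per patient) at a given score threshold. The `Detection` record, the `froc_point` helper, and the toy data are hypothetical and are not taken from the study; in particular, the sketch assumes each detection point has already been matched to a lesion (or marked as a false positive), since the matching criterion is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    patient_id: int
    score: float               # suspicion score (1-5) or model confidence
    lesion_id: Optional[int]   # matched lesion, or None for a false positive


def froc_point(detections, lesions_per_patient, threshold):
    """Return (sensitivity, false positives per patient) at one threshold."""
    n_patients = len(lesions_per_patient)
    n_lesions = sum(lesions_per_patient.values())
    kept = [d for d in detections if d.score >= threshold]
    # A lesion counts once, however many detection points land on it.
    hit = {(d.patient_id, d.lesion_id) for d in kept if d.lesion_id is not None}
    false_pos = sum(1 for d in kept if d.lesion_id is None)
    return len(hit) / n_lesions, false_pos / n_patients


# Toy data (illustrative only): four patients, one index lesion each.
lesions_per_patient = {1: 1, 2: 1, 3: 1, 4: 1}
detections = [
    Detection(1, 5, 0), Detection(1, 3, None),
    Detection(2, 4, 0),
    Detection(3, 2, 0), Detection(3, 5, None),
    Detection(4, 1, None),
]

# One point per suspicion score threshold, from most to least specific.
curve = [froc_point(detections, lesions_per_patient, t) for t in (5, 4, 3, 2, 1)]
```

Lowering the threshold can only add detections, so both sensitivity and false positives per patient are non-decreasing along `curve`; this is why each reader contributes one dot per threshold to the plot.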

Background: Several deep learning-based techniques have been developed for prostate cancer (PCa) detection using multi-parametric MRI (mpMRI), but few have been rigorously evaluated relative to radiologists’ performance or whole-mount histopathology (WMHP). Purpose: To compare the performance of a previously proposed deep learning algorithm, FocalNet, with that of expert radiologists in the detection of PCa on mpMRI, using WMHP as the reference. Study type: Retrospective, single-center study. Subjects: 553 patients (development cohort: 427 patients; evaluation cohort: 126 patients) who underwent 3 T mpMRI prior to radical prostatectomy from October 2010 to February 2018. Field Strength/Sequence: 3 T, T2-weighted imaging and diffusion-weighted imaging. Assessment: FocalNet was trained on the development cohort with the ground-truth lesion annotations. In the evaluation cohort, FocalNet predicted PCa locations as detection points, with a confidence value for each point. Four fellowship-trained genitourinary (GU) radiologists independently read the evaluation cohort to detect suspicious PCa foci, annotate detection point locations, and assign a five-point suspicion score (1: least suspicious for PCa; 5: most suspicious for PCa) to each annotated detection point. The PCa detection performance of FocalNet and the radiologists was evaluated by free-response receiver operating characteristics analysis of lesion detection sensitivity versus the number of false-positive detections at different suspicion score thresholds. The overall differential detection sensitivity is the difference in detection sensitivity between each radiologist and FocalNet, averaged over the five suspicion score thresholds and the four readers. Index lesions are defined as lesions with the highest Gleason Group and the largest pathological size, and clinically significant lesions are those with Gleason Group ≥ 2 or pathological size ≥ 10 mm.
Statistical tests: Bootstrap hypothesis test for the difference in detection sensitivity between radiologists and FocalNet. Results: For detection points with the highest suspicion score of 5, index lesion detection sensitivity was 38.3% for the radiologists and 38.7% for FocalNet at the cost of 0.101 false positives per patient. For detection points with suspicion score 4 or 5, index lesion detection sensitivity was 63.1% for the radiologists and 54.7% for FocalNet at the cost of 0.365 false positives per patient. For the overall differential detection sensitivity, FocalNet was 5.1% and 4.7% below the radiologists for clinically significant and index lesions, respectively; however, the differences were not statistically significant (P = 0.413 and P = 0.282, respectively). Data Conclusion: On the evaluation cohort, FocalNet achieved slightly lower PCa detection performance than GU radiologists, but the difference was not statistically significant. Compared with radiologists, FocalNet demonstrated similar detection performance in a highly sensitive setting (suspicion score ≥ 1) and in a highly specific setting (suspicion score = 5), but lower performance in between.
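As an illustration of the kind of bootstrap hypothesis test named above, the Python sketch below resamples patients with replacement to test whether a reader and a model differ in detection sensitivity. It is a simplified sketch under stated assumptions, not the study's procedure: it handles one reader at one threshold with one lesion per patient, whereas the abstract averages over five thresholds and four readers. The function name, the null-centering scheme, and the toy data are all hypothetical.

```python
import random


def bootstrap_sensitivity_difference(reader_hits, model_hits, n_boot=999, seed=0):
    """Paired patient-level bootstrap test (assumed design, for illustration).

    reader_hits / model_hits: per-patient 0/1 flags, 1 if the lesion was found.
    Returns (observed sensitivity difference, two-sided p-value).
    """
    n = len(reader_hits)
    if n != len(model_hits):
        raise ValueError("inputs must cover the same patients")
    diffs = [r - m for r, m in zip(reader_hits, model_hits)]
    observed = sum(diffs) / n
    # Centre the differences so that resampled data obey the null hypothesis
    # of no mean difference between reader and model.
    centred = [d - observed for d in diffs]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_boot):
        sample = rng.choices(centred, k=n)  # resample patients with replacement
        if abs(sum(sample) / n) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_boot + 1)


# Toy data (illustrative only): detection flags for eight patients.
reader = [1, 1, 0, 1, 0, 1, 1, 0]
model = [1, 0, 0, 1, 0, 1, 0, 0]
observed, p_value = bootstrap_sensitivity_difference(reader, model)
```

Resampling whole patients rather than individual lesions keeps the reader and model results paired and respects the fact that detections within one patient are not independent.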