Evaluating ChatGPT-4's ability to determine suitable radiology referrals
A recent single-centre study involving 450 imaging referrals has explored the reliability of ChatGPT in grading radiology exam requests using the Reason for exam Imaging Reporting and Data System (RI-RADS). The study focused on inpatient radiology requests because these are entered directly into the electronic health record system by the treating physicians.
The study found that while ChatGPT has shown promising performance in radiology-related tasks, its application to structured radiology grading systems like RI-RADS has not yet been directly validated. RI-RADS involves a nuanced evaluation of clinical reasoning and categorization based on the indication for imaging, which may require specialized training or integration beyond generic ChatGPT capabilities.
However, related findings on ChatGPT's reliability in medical imaging and cancer-related contexts provide relevant insights. For instance, ChatGPT-4 has shown high accuracy and reliability on cancer-related queries, outperforming ChatGPT-3.5 with fewer incorrect responses and more detailed, accurate answers as judged by specialists. ChatGPT-4 has also demonstrated good diagnostic performance in detecting hip fractures on pelvic X-rays, with an overall accuracy of 82.5%.
Despite these promising results, the study found that the distributions of RI-RADS grades and subcategories differed significantly between the radiologist and ChatGPT, except for grades C and X. Agreement between the two human readers was almost perfect (κ = 0.96), whereas the reliability between the radiologist and ChatGPT in assigning RI-RADS grades was very low (κ = 0.20).
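For readers who want to reproduce this kind of agreement analysis, the snippet below is a minimal sketch of how inter-rater agreement on RI-RADS grades could be quantified with Cohen's kappa. It uses scikit-learn's `cohen_kappa_score` and made-up example grades; the study's own data and exact kappa variant are not reproduced here.

```python
# Minimal sketch: inter-rater agreement (Cohen's kappa) on RI-RADS grades.
# The grade lists below are made-up examples, not the study's data.
from sklearn.metrics import cohen_kappa_score

# Hypothetical RI-RADS grades assigned to the same five referrals
radiologist_grades = ["D", "D", "C", "A", "X"]
chatgpt_grades     = ["C", "D", "C", "B", "X"]

kappa = cohen_kappa_score(radiologist_grades, chatgpt_grades)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 1 indicate strong agreement
```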
In 2% of cases, ChatGPT assigned an overall RI-RADS grade that did not follow from the ratings it gave to the subcategories. RI-RADS D was the grade most often assigned by the human readers (54% of cases), while ChatGPT most frequently assigned RI-RADS C (33% of cases).
The study utilized anonymized data and did not require ethical committee approval due to the absence of patient-identifiable information. The customized RI-RADS GPT used in the study was trained using specific instructions and RI-RADS examples for each grade.
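The article does not publish the exact instructions behind the customized RI-RADS GPT, which was built inside ChatGPT rather than through the API. As a rough illustration of the general approach (system instructions plus graded examples), the sketch below uses the OpenAI Python SDK as a stand-in; the prompt wording, model name, example grading, and referral text are all assumptions, not the study's configuration.

```python
# Rough sketch of prompting a GPT-4-class model to assign an RI-RADS grade.
# The instructions, few-shot example, and referral text are illustrative
# assumptions; they are not the study's actual RI-RADS GPT configuration.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_INSTRUCTIONS = """You grade radiology exam requests with RI-RADS.
Return one overall grade (A, B, C, D, or X) and a rating for each
subcategory, based only on the information in the request text."""

FEW_SHOT_EXAMPLE = (
    "Request: 'Chest CT. Persistent cough for 3 months, 40 pack-year smoker, "
    "rule out malignancy.' -> Grade A (complete reason for exam)."
)

referral = "MRI lumbar spine. Back pain."  # hypothetical incomplete request

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_INSTRUCTIONS + "\n" + FEW_SHOT_EXAMPLE},
        {"role": "user", "content": referral},
    ],
)
print(response.choices[0].message.content)
```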
Radiologists rely on high-quality radiology request forms to choose the right imaging technique and interpret examinations accurately. The low number of complete imaging referrals in the study highlights the need for improved processes to ensure the quality of radiology requests.
Future research specifically targeting ChatGPT's application in RI-RADS grading would be valuable to determine its true reliability and clinical utility in this niche domain. Caution and further clinical validation are recommended before deploying ChatGPT for grading radiology exam requests specifically with RI-RADS criteria.
- The application of ChatGPT to structured radiology grading systems like RI-RADS has not yet been directly validated, and reliable use will likely require specialized instructions or training rather than generic ChatGPT capabilities.
- The low number of complete imaging referrals in the study underscores the need to strengthen the quality of radiology requests, since the information in a request directly affects the choice of imaging technique and the accuracy of interpretation.
- Although ChatGPT has performed well on cancer-related queries and some medical imaging tasks, those results do not transfer automatically to RI-RADS grading, where agreement with the radiologist was very low; further clinical validation is needed before deployment.