EULAR Abstract Archive


ABS0964 (2025)

EVALUATION OF THE ACCURACY OF ARTIFICIAL INTELLIGENCE TOOLS IN THE DETECTION OF DRUG INTERACTIONS IN RHEUMATOLOGY

Keywords: Artificial Intelligence, Safety, Disease-modifying Drugs (DMARDs)

R. Mazzucchelli¹, C. Sanz Sanchez², M. Prada Bou³, P. Turrado Crespí⁴, G. M. Silva Riadigos⁵, N. Crespí Villarías⁶, E. Pérez-Fernández⁷, C. Pijoán-Moratalla⁹, M. Pavo Blanco⁹, P. Zarco-Montejo⁹, R. Almodóvar⁹, P. Sanmartin Fenollera⁸

¹Hospital Universitario Fundación Alcorcón, Rheumatology Unit, Madrid, Spain
²Hospital Universitario Fundación Alcorcón, Servicio de Farmacia, Alcorcon, Spain
³Hospital Universitario Fundación Alcorcón, Pharmacology Department, Alcorcon, Spain
⁴Universidad Autónoma de Madrid, Facultad de Medicina, Madrid, Spain
⁵Dirección Asistencial Oeste de Atención Primaria, Pharmacy, Alcorcon, Spain
⁶Centro de Salud La Rivota, Médico de Atención Primaria, Alcorcon, Spain
⁷Hospital Universitario Fundación Alcorcón, Unidad de Investigación, Alcorcon, Spain
⁸Hospital Universitario Fundación Alcorcón, Pharmacy Department, Alcorcon, Spain
⁹Hospital Universitario Fundación Alcorcón, Rheumatology Department, Alcorcón, Spain

Background: Artificial Intelligence (AI) has shown great potential to improve the identification of drug interactions, especially in patients with multiple treatments, such as rheumatology patients. Tools like ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot can offer quick assistance, but their accuracy compared to clinically validated databases like Lexicomp has not been exhaustively evaluated in a real clinical context. This study compares these AI platforms in detecting drug interactions in rheumatology.

Objectives: To evaluate the accuracy, sensitivity, and specificity of ChatGPT-3.5, ChatGPT-4, Gemini, Gemini Advanced, and Copilot against Lexicomp as the gold standard, to compare them with the Drugs.com database, and to analyze the impact of specific prompts on their performance.

Methods: The study included 520 drug interaction scenarios, generated from the 10 most prescribed drugs in rheumatology and the 50 most common drugs in the Community of Madrid in 2023. Interactions were classified by clinical relevance (grades 0 to 4). The results from each tool were compared with Lexicomp, the reference standard, and metrics such as accuracy, sensitivity, specificity, and area under the curve (AUC) were calculated.
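The metrics named in the Methods can be illustrated with a minimal sketch. It assumes that grades 3 and 4 are treated as clinically relevant (as stated in the Results), that each tool returns a grade from 0 to 4 for every drug pair, and that the AUC is computed from those grades as ordinal scores. The abstract does not describe the actual analysis pipeline, so the function names and the toy grades below are hypothetical and are not study data.

```python
from typing import Dict, List, Sequence

# Assumption: grades 3 and 4 count as clinically relevant, as in the Results.
RELEVANT_FROM_GRADE = 3


def binarize(grades: Sequence[int], threshold: int = RELEVANT_FROM_GRADE) -> List[bool]:
    """Map 0-4 grades to relevant / not relevant."""
    return [g >= threshold for g in grades]


def confusion_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy of a tool against the reference."""
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)
    tn = sum((not r) and (not p) for r, p in pairs)
    fp = sum((not r) and p for r, p in pairs)
    fn = sum(r and (not p) for r, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(reference: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a relevant pair is scored
    higher than a non-relevant pair (ties count as one half)."""
    pos = [s for r, s in zip(reference, scores) if r]
    neg = [s for r, s in zip(reference, scores) if not r]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy grades for eight hypothetical drug pairs (illustration only).
reference_grades = [0, 1, 3, 4, 2, 0, 3, 1]  # reference standard
tool_grades = [0, 2, 3, 3, 3, 0, 1, 1]       # one AI tool

reference = binarize(reference_grades)
metrics = confusion_metrics(reference, binarize(tool_grades))
area = auc(reference, tool_grades)
print(metrics, area)
```

Sensitivity and specificity depend on the chosen cut-off between grades 2 and 3, whereas the AUC summarizes performance across all possible cut-offs.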

Results: Of the 520 interactions analyzed, 35 were classified as clinically relevant by the Lexicomp database (grades 3 and 4). All AI tools showed AUC values above 0.8, with ChatGPT-4 being the best performer with an AUC of 0.882. A similar result was obtained with the Drugs.com database (AUC = 0.864). The only exception was the MUP tool, which achieved an AUC of 0.574 (see Figure 1, ROC Curves). In terms of sensitivity, ChatGPT-4 achieved 81%, followed by Gemini (78%), ChatGPT-3.5 (75%), and Copilot (73%), while MUP reached only 58%. Specificity was high across all platforms, with ChatGPT-4 achieving 88%, followed by Gemini (85%), ChatGPT-3.5 (83%), and Copilot (81%). MUP had a specificity of 71% (see Table 1 for detailed sensitivity, specificity, and other parameters).

Conclusion: AI tools, especially ChatGPT-4, showed performance comparable to the reference standard in identifying clinically relevant drug interactions, with high AUC, sensitivity, and specificity values. The use of specific prompts significantly improved their accuracy. The MUP tool, despite its advances, showed inferior performance and requires optimization for clinical application.

REFERENCES: NIL.

Figure 1.

Table 1.

Acknowledgements: NIL.

Disclosure of Interests: None declared.

© The Authors 2025. This abstract is an open access article published in Annals of the Rheumatic Diseases under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Neither EULAR nor the publisher makes any representation as to the accuracy of the content. The authors are solely responsible for the content of their abstract, including the accuracy of the facts, statements, results, conclusions, and cited resources.

DOI: annrheumdis-2025-eular.B37


Citation: Annals of the Rheumatic Diseases, volume 84, supplement 1, year 2025, page 1808

Session: Other topics (Publication Only)
