
ABS0163 (2025)
PREDICTION OF RHEUMATOLOGICAL DISEASES USING MACHINE LEARNING MODELS AND NATURAL LANGUAGE PROCESSING TECHNIQUES ON PHYSICIAN NOTES
Keywords: Epidemiology, Outcome measures, Artificial Intelligence
A. C. Genç1,3, F. Turkoglu1, E. Bilgin2, E. Gönüllü2
1Sakarya Training and Research Hospital, Department of Internal Medicine, Sakarya, Türkiye
2Sakarya University Faculty of Medicine, Division of Rheumatology, Sakarya, Türkiye
3Bahçeşehir University Graduate School of Artificial Intelligence (Interdisciplinary), Istanbul, Türkiye

Background: Artificial intelligence (AI) has rapidly advanced in recent years, encompassing broad domains such as machine learning (ML) and natural language processing (NLP). NLP is a subfield of AI focused on enabling computers to understand, interpret, and process human language, both spoken and written. It has found applications in various healthcare settings, including emergency call centers, medical imaging, and hospital patient support systems. In rheumatological diseases, patient history and physical examination play a critical role in diagnosis and treatment.


Objectives: This study aims to predict each patient's specific rheumatological disease by applying ML and NLP techniques to physicians’ notes, which include patient history, physical examination findings, and clinical follow-up records written during outpatient clinic visits.


Methods: A total of 4167 patients with rheumatological diagnoses were included in the study. The most frequent diagnoses included rheumatoid arthritis, lupus, spondyloarthritis, Sjögren’s syndrome, psoriatic arthritis, and fibromyalgia, along with other diseases. Physician notes were preprocessed with an unsupervised preprocessing step: the “nominal to string” transformation was applied first, and the text was then converted into word vectors with the “string to word vector” method. Decision tree, random forest, binary logistic regression, and naive Bayes classifiers were implemented, and the results of the best-performing model are presented.
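The preprocessing and classification steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the tokenizer, the binary word-presence features, one-vs-rest training by gradient descent, and the hyperparameters `epochs`, `lr`, and `l2` are all hypothetical choices. The step names “nominal to string” and “string to word vector” match the names of unsupervised attribute filters in the Weka toolkit, whose exact behaviour this code does not reproduce. Only the logistic regression arm is shown, and the notes are invented toy examples.

```python
import math
import re
from collections import Counter


def tokenize(note: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then take runs of word characters.
    # \w is Unicode-aware, so Turkish letters survive tokenization.
    return re.findall(r"\w+", note.lower())


def build_vocabulary(notes: list[str], min_doc_freq: int = 1) -> list[str]:
    # Keep words that occur in at least `min_doc_freq` notes (assumed setting).
    doc_freq = Counter(tok for note in notes for tok in set(tokenize(note)))
    return sorted(w for w, df in doc_freq.items() if df >= min_doc_freq)


def to_word_vector(note: str, vocab: list[str]) -> list[float]:
    # "String to word vector": one binary presence feature per vocabulary word.
    present = set(tokenize(note))
    return [1.0 if w in present else 0.0 for w in vocab]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_logreg(X, y, epochs=300, lr=0.5, l2=0.01):
    # Full-batch gradient descent on L2-regularized log-loss (assumed optimizer).
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X, labels, **kwargs):
    # One binary classifier per diagnosis: "this diagnosis" vs "any other".
    return {
        c: train_binary_logreg(X, [1.0 if lab == c else 0.0 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def class_scores(models, x):
    # Probability-like score from each one-vs-rest classifier.
    return {c: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for c, (w, b) in models.items()}


def predict(models, x):
    # Pick the diagnosis whose classifier is most confident.
    scores = class_scores(models, x)
    return max(scores, key=scores.get)


# Invented toy notes for illustration only (not study data).
NOTES = [
    "symmetric small joint synovitis morning stiffness rf positive",
    "symmetric hand synovitis anti ccp positive morning stiffness",
    "inflammatory back pain sacroiliitis hla b27 positive",
    "low back pain morning stiffness sacroiliitis on mri",
    "malar rash ana positive arthralgia",
    "photosensitive rash ana positive anti dsdna",
]
LABELS = ["RA", "RA", "SpA", "SpA", "SLE", "SLE"]

VOCAB = build_vocabulary(NOTES)
X = [to_word_vector(note, VOCAB) for note in NOTES]
MODELS = train_one_vs_rest(X, LABELS)
print(predict(MODELS, to_word_vector("Symmetric synovitis, anti-CCP positive", VOCAB)))
```

One-vs-rest is one common way to turn a binary classifier into a multi-class one; the abstract does not state how its binary logistic regression handled several diagnoses, so this choice is an assumption.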


Results: The best-performing ML model was the binary logistic regression classifier. Its weighted-average metrics were a receiver operating characteristic (ROC) area of 0.966, a sensitivity (recall) of 0.784, a specificity of 0.800, and a precision-recall curve area of 0.829.
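The “average weights” figures above are presumably support-weighted averages of per-class, one-vs-rest metrics. The sketch below assumes that weighting scheme (it is not confirmed by the abstract) and computes weighted sensitivity (recall), specificity, and ROC area, with ROC area in its Mann-Whitney form. The function names, labels, and scores are hypothetical toy values.

```python
from collections import Counter


def one_vs_rest_counts(y_true, y_pred, cls):
    # Confusion counts treating `cls` as positive and every other class as negative.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fn - fp
    return tp, fn, fp, tn


def roc_auc(pos_scores, neg_scores):
    # Mann-Whitney form: probability that a random positive outscores
    # a random negative; ties count as one half.
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def weighted_metrics(y_true, y_pred, scores):
    # `scores[i][c]` is the model's score for class c on sample i.
    # Requires at least two classes so every class has negatives.
    support = Counter(y_true)
    n = len(y_true)
    totals = {"sensitivity": 0.0, "specificity": 0.0, "roc_auc": 0.0}
    for cls, count in support.items():
        tp, fn, fp, tn = one_vs_rest_counts(y_true, y_pred, cls)
        pos = [s[cls] for s, t in zip(scores, y_true) if t == cls]
        neg = [s[cls] for s, t in zip(scores, y_true) if t != cls]
        weight = count / n  # class support as a fraction of all samples
        totals["sensitivity"] += weight * tp / (tp + fn)
        totals["specificity"] += weight * tn / (tn + fp)
        totals["roc_auc"] += weight * roc_auc(pos, neg)
    return totals


# Hypothetical toy predictions; the second RA note is misclassified as SpA.
y_true = ["RA", "RA", "SpA", "SLE"]
y_pred = ["RA", "SpA", "SpA", "SLE"]
scores = [
    {"RA": 0.8, "SpA": 0.1, "SLE": 0.1},
    {"RA": 0.4, "SpA": 0.5, "SLE": 0.1},
    {"RA": 0.2, "SpA": 0.7, "SLE": 0.1},
    {"RA": 0.1, "SpA": 0.1, "SLE": 0.8},
]
print(weighted_metrics(y_true, y_pred, scores))
# sensitivity 0.75, specificity ≈ 0.917, roc_auc 1.0
```

When the weights are class supports, weighted sensitivity reduces to overall accuracy, since the sum over classes of (n_c / N)(TP_c / n_c) equals the total true positives divided by N. ROC area is threshold-free and measures ranking, so it can be much higher than sensitivity: in this toy case every positive outranks every negative (ROC area 1.0) even though one RA note is misclassified.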


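Conclusion: In this study, a binary logistic regression model applied to outpatient physician notes discriminated between rheumatological diagnoses with a weighted ROC area of 0.966. AI technologies are expected to continue spreading across all areas of medicine and to save clinicians significant time. In the near future, their integration into hospital systems is expected to improve the quality of healthcare services.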




Acknowledgements: I thank Yusuf Kerem Yücel for setting up the computational environment.


Disclosure of Interests: None declared.

© The Authors 2025. This abstract is an open access article published in Annals of the Rheumatic Diseases under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ). Neither EULAR nor the publisher makes any representation as to the accuracy of the content. The authors are solely responsible for the content in their abstract including accuracy of the facts, statements, results, conclusion, citing resources etc.


DOI: annrheumdis-2025-eular.A2027
Citation: Annals of the Rheumatic Diseases, volume 84, supplement 1, year 2025, page 1508
Session: Across diseases (Publication Only)