
POS0678 (2025)
A MACHINE LEARNING MODEL FOR THE EARLY IDENTIFICATION OF RHEUMATOID ARTHRITIS: DEVELOPMENT AND VALIDATION
Keywords: Descriptive Studies, Artificial Intelligence, Validation, Epidemiology
M. Dreyfuss1, Y. Jenudi1, D. Riesel1, O. Ramni1, D. Underberger1, B. Getz1, S. Steinberg-Koch1, D. White3, E. Myasoedova2
1Predicta Med, Ramat Gan, Israel
2Mayo Clinic, Division of Rheumatology, Department of Internal Medicine, Rochester, MN, United States of America
3Articularis Healthcare Group, Atlanta, GA, United States of America

Background: Patients with rheumatoid arthritis (RA) often experience clinically significant delays in diagnosis (Sørensen et al., 2015; Raza et al., 2011). RA can present similarly to other types of inflammatory arthritis and can therefore be challenging for a primary care physician (PCP) to recognize (Saraiva and Duarte, 2023). Indeed, the longest delay occurs between initial presentation to the PCP and evaluation by rheumatology (Barhamain et al., 2017; Stack et al., 2019). Digital tools such as machine learning algorithms have the potential to help physicians identify patients with undiagnosed RA earlier in the course of disease.


Objectives: In this study we describe the development and validation of a novel machine learning model for identifying patients in the community who may be at risk of having undiagnosed RA.


Methods: Patients from the community population (n=395,918) at Mayo Clinic between 2012 and 2022 were split into training and validation sets. Cases with RA and controls with no evidence of RA were identified in both sets. Prediction dates for model training and evaluation were set at six-month intervals, on 1 January and 1 July of each year. Each case was assigned to the prediction date immediately preceding their autoantibody testing prior to first diagnosis of RA; this design was chosen to expose the model to information preceding clinical suspicion of disease. Controls were randomly assigned to prediction dates for which they met data eligibility criteria. A gradient boosted trees algorithm was trained using electronic medical record (EMR) data documented during the two years prior to each patient’s prediction date. Input features included structured data (age, sex, diagnosis codes, medication prescriptions, and laboratory results) and symptoms and signs extracted from clinical notes by natural language processing (NLP). The model was then evaluated on the validation set, using the area under the receiver operating characteristic curve (AUC) to assess its ability to discriminate between new cases of RA and controls.
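
The case and control dating scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the event format, the feature labels, the rule that a test falling exactly on a prediction date is assigned to the previous one, and the control eligibility rule (at least one documented event in the two-year window) are all hypothetical, since the abstract does not specify them.

```python
from datetime import date
from typing import Optional
import random

# Hypothetical event record: (date documented, feature label), e.g. a diagnosis
# code or an NLP-extracted symptom. The labels used below are illustrative only.
Event = tuple[date, str]


def prediction_dates(first_year: int, last_year: int) -> list[date]:
    """Prediction dates on 1 January and 1 July of each year (inclusive range)."""
    return [date(y, m, 1) for y in range(first_year, last_year + 1) for m in (1, 7)]


def case_prediction_date(first_antibody_test: date, grid: list[date]) -> Optional[date]:
    """Latest prediction date strictly before the first autoantibody test.

    Assumption: a test falling exactly on a prediction date is assigned to the
    previous date, so the test itself never falls inside the lookback window.
    """
    earlier = [d for d in grid if d < first_antibody_test]
    return max(earlier) if earlier else None


def window_counts(events: list[Event], pred: date, years: int = 2) -> dict[str, int]:
    """Count each feature documented in the half-open window [pred - years, pred)."""
    # pred is always the 1st of January or July, so replacing the year is safe.
    start = pred.replace(year=pred.year - years)
    counts: dict[str, int] = {}
    for when, feature in events:
        if start <= when < pred:
            counts[feature] = counts.get(feature, 0) + 1
    return counts


def control_prediction_date(events: list[Event], grid: list[date],
                            rng: random.Random, years: int = 2) -> Optional[date]:
    """Pick a random prediction date among those the control is eligible for.

    Hypothetical eligibility rule: at least one documented event in the window.
    The abstract does not state the actual criteria.
    """
    eligible = [d for d in grid if window_counts(events, d, years)]
    return rng.choice(eligible) if eligible else None


if __name__ == "__main__":
    grid = prediction_dates(2012, 2022)
    events = [
        (date(2016, 12, 31), "dx:example_code"),
        (date(2017, 5, 2), "dx:synovitis"),
        (date(2018, 11, 20), "nlp:hand_swelling"),
    ]
    pred = case_prediction_date(date(2019, 3, 14), grid)
    print(pred)                         # 2019-01-01
    print(window_counts(events, pred))  # {'dx:synovitis': 1, 'nlp:hand_swelling': 1}
    print(control_prediction_date(events, grid, random.Random(0)))
```

The half-open window [prediction date minus two years, prediction date) is one way to keep anything documented on or after the prediction date out of the features.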


Results: The validation set included 145 patients with RA (108 females; mean age 55.7 years, standard deviation [SD] 16.3) and 17,702 control patients (9,758 females; mean age 49.3 years, SD 17.0). The AUC on the validation set was 77.9% (Figure 1). Symptoms and signs documented in clinical notes, together with diagnosis codes, were important predictive features, including arthritis, pain and swelling in various joints, enthesopathies, and synovitis (Figure 2). Additional contributing features included elevated inflammatory markers and use of glucocorticoids and NSAIDs.
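
The AUC reported above can be read as the probability that a randomly chosen RA case receives a higher model score than a randomly chosen control, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of case-control pairs. The sketch below computes it both ways on made-up scores. It is a generic illustration, not the authors' evaluation code.

```python
def auc_pairwise(case_scores: list[float], control_scores: list[float]) -> float:
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as one half. O(n_cases * n_controls); useful as a reference.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_rank(case_scores: list[float], control_scores: list[float]) -> float:
    """Same quantity via the Mann-Whitney U statistic, in O(n log n)."""
    n1, n0 = len(case_scores), len(control_scores)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    # Label cases 1 and controls 0, then sort by score.
    labelled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    rank_sum_cases = 0.0
    i = 0
    while i < len(labelled):
        # Find the run [i, j] of tied scores; all get the average 1-based rank.
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        rank_sum_cases += avg_rank * sum(label for _, label in labelled[i:j + 1])
        i = j + 1
    # U counts case-control pairs won by the case (ties as 0.5).
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


if __name__ == "__main__":
    cases = [0.9, 0.8, 0.35]         # hypothetical model scores for RA cases
    controls = [0.1, 0.8, 0.4, 0.2]  # hypothetical model scores for controls
    print(round(auc_pairwise(cases, controls), 4))  # 0.7917
    print(round(auc_rank(cases, controls), 4))      # 0.7917
```

Because the statistic compares cases only against controls, it does not depend on the case-to-control ratio (145 to 17,702 in the validation set), although threshold-based metrics such as positive predictive value do.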


Conclusion: The model showed good ability to discriminate between cases of RA and controls. Implementation of the model may help PCPs identify undiagnosed RA in the primary care population using existing EMR data. Shortening time to diagnosis could help patients receive treatment earlier and reduce downstream sequelae of untreated disease. Features from both structured and unstructured data contributed to model performance. The substantial contribution of features extracted by NLP from clinical documents suggests that refined NLP techniques may further improve model performance.


REFERENCES: [1] Sørensen J, Hetland ML; all departments of rheumatology in Denmark. Diagnostic delay in patients with rheumatoid arthritis, psoriatic arthritis and ankylosing spondylitis: results from the Danish nationwide DANBIO registry. Ann Rheum Dis. 2015;74(3):e12. doi: 10.1136/annrheumdis-2013-204867

[2] Raza K, Stack R, Kumar K, Filer A, Detert J, Bastian H, Burmester GR, Sidiropoulos P, Kteniadaki E, Repa A, Saxne T, Turesson C, Mann H, Vencovsky J, Catrina A, Chatzidionysiou A, Hensvold A, Rantapää-Dahlqvist S, Binder A, Machold K, Kwiatkowska B, Ciurea A, Tamborrini G, Kyburz D, Buckley CD. Delays in assessment of patients with rheumatoid arthritis: variations across Europe. Ann Rheum Dis. 2011 Oct;70(10):1822-5. doi: 10.1136/ard.2011.151902. Epub 2011 Aug 7. Erratum in: Ann Rheum Dis. 2016 Jul;75(7):1422. doi: 10.1136/ard.2011.151902corr1. PMID: 21821867.

[3] Saraiva L, Duarte C. Barriers to the Diagnosis of Early Inflammatory Arthritis: A Literature Review. Open Access Rheumatol. 2023 Jan 27;15:11-22. doi: 10.2147/OARRR.S282622. PMID: 36733437; PMCID: PMC9888401.

[4] Barhamain AS, Magliah RF, Shaheen MH, Munassar SF, Falemban AM, Alshareef MM, Almoallim HM. The journey of rheumatoid arthritis patients: a review of reported lag times from the onset of symptoms. Open Access Rheumatol. 2017 Jul 28;9:139-150. doi: 10.2147/OARRR.S138830. PMID: 28814904; PMCID: PMC5546831.

[5] Stack RJ, Nightingale P, Jinks C, Shaw K, Herron-Marx S, Horne R, Deighton C, Kiely P, Mallen C, Raza K; DELAY study syndicate. Delays between the onset of symptoms and first rheumatology consultation in patients with rheumatoid arthritis in the UK: an observational study. BMJ Open. 2019 Mar 4;9(3):e024361. doi: 10.1136/bmjopen-2018-024361. PMID: 30837252; PMCID: PMC6429945.

Figure 1. Receiver operating characteristic curve displaying the discriminative ability of the model across all classification thresholds.

Figure 2. SHAP plot displaying the top 25 features contributing to the model. Features describing clinical signs and symptoms were extracted from clinical notes unless labeled as derived from diagnosis codes (‘dx’). Sex is coded as 1 for female.
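
For readers less familiar with SHAP plots, the plotted values are Shapley-value attributions. The standard definitions are stated below with notation introduced here, not taken from the abstract: f is the model, F the set of input features, x one patient's feature vector, f_S(x) the expected model output when only the features in S are known, and phi_0 the average model output over the background data. The second identity (local accuracy) says that a patient's attributions sum to the gap between that patient's prediction and the average prediction; for gradient boosted classifiers this output is typically on the log-odds scale, although the abstract does not say which scale Figure 2 uses.

```latex
\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}}
  \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}
  \left[ f_{S \cup \{j\}}(x) - f_S(x) \right],
\qquad
f(x) = \phi_0 + \sum_{j \in F} \phi_j(x)
```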


Acknowledgements: NIL.


Disclosure of Interests: Michael Dreyfuss: Medical Director at Predicta Med Ltd; Yonatan Jenudi: Data scientist at Predicta Med; Dan Riesel: Senior data scientist currently at Predicta Med, with options; Or Ramni: Data scientist currently employed at Predicta Med, with options; Daniel Underberger: Chief Medical Officer currently at Predicta Med, with options; Benjamin Getz: CTO currently at Predicta Med, with options; Shlomit Steinberg-Koch: Current CEO of Predicta Med, with shares; Douglas White: Has options in Predicta Med; Elena Myasoedova: None declared.

© The Authors 2025. This abstract is an open access article published in Annals of the Rheumatic Diseases under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Neither EULAR nor the publisher makes any representation as to the accuracy of the content. The authors are solely responsible for the content in their abstract including accuracy of the facts, statements, results, conclusion, citing resources etc.


DOI: annrheumdis-2025-eular.B1136
Citation: Ann Rheum Dis, volume 84, supplement 1, year 2025, page 859
Session: Poster View I (Poster View)