
POS0994 (2025)
PREDICTING ONE-YEAR DISEASE ACTIVITY IN SYSTEMIC LUPUS ERYTHEMATOSUS USING MACHINE LEARNING AND EXPLAINABLE AI
Keywords: Real-world evidence, Outcome measures, Artificial Intelligence
A. Ortolan1,2, S. L. Bosello1,2, L. Lilli3, L. Antenucci3, C. Masciocchi3, M. Gorini4, G. Castellino4, L. Petricca1, M. R. Gigante1, V. A. Pacucci1, L. Lanzo1, C. Gavotti1, F. Cristiano1, P. Cerasuolo1, J. Lenkowicz3, A. Cesario3, S. Patarnello3, M. A. D’Agostino1,2
1Department of Rheumatology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
2Catholic University of the Sacred Heart, Rome, Italy
3Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
4AstraZeneca Italy, MIND, Milan, Italy

Background: Systemic lupus erythematosus (SLE) is a highly heterogeneous autoimmune disease, making the prediction of disease flare-ups in clinical practice particularly challenging.


Objectives: The aim of this study was to develop a machine learning model capable of predicting the risk of disease activity events in SLE patients within a one-year time frame, based on clinical features describing the current status, the previous 12 months, and the past history.


Methods: A comprehensive dataset, the SLE Data Mart, was created using data mining and natural language processing techniques [1, 2]. It contains structured information on the demographics, treatments, clinical manifestations, involved clinical domains, and laboratory data of SLE patients from our University Hospital. The cohort included patients with at least one outpatient visit and one hospitalization episode coded as ICD-9 710.0 or 695.4. The outcome, an ‘activity event’ within 12 months, was a binary variable representing the occurrence of one or more of the following: neurological, renal, or vascular symptoms; a newly involved organ domain; a hospitalization for SLE. Input features described the current contact, the clinical history, and the previous 12 months, and were expressed as binary or continuous variables. Univariable feature selection was performed using chi-square and Mann-Whitney U tests, retaining variables significantly associated with the outcome (p < 0.05). These features were subsequently used for model training and inference. The predictive model was built by grid search with 5-fold cross-validation over several machine learning models: Logistic Regression, Support Vector Classifier, Decision Tree, Random Forest, and XGBoost. The model with the highest AUC was selected. The impact of individual features on the model’s predictions was then analyzed with explainable artificial intelligence (XAI) techniques, using the SHAP software.
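
The feature selection and model comparison steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the feature names, the hyperparameter grids, and the use of scikit-learn and SciPy are assumptions, and the Support Vector Classifier, XGBoost, and SHAP steps are left out to keep the sketch short.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for contact-level data (all names and values are hypothetical).
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)  # binary outcome: activity event within 12 months
X = {
    "albuminuria": (rng.random(n) < 0.05 + 0.30 * y).astype(int),  # binary
    "age": rng.normal(46, 10, size=n) - 6 * y,                     # continuous
    "noise_flag": rng.integers(0, 2, size=n),                      # binary, unrelated
}


def univariable_p(values, outcome):
    """Chi-square test for binary features, Mann-Whitney U for continuous ones."""
    if set(np.unique(values)) <= {0, 1}:
        table = [[np.sum((values == a) & (outcome == b)) for b in (0, 1)]
                 for a in (0, 1)]
        return chi2_contingency(table)[1]  # second element is the p-value
    return mannwhitneyu(values[outcome == 1], values[outcome == 0],
                        alternative="two-sided").pvalue


# Keep only features associated with the outcome at p < 0.05.
selected = [name for name, v in X.items() if univariable_p(v, y) < 0.05]
X_sel = np.column_stack([X[name] for name in selected])

# Placeholder grids; the abstract does not report the ones actually searched.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4]}),
    "forest": (RandomForestClassifier(n_estimators=50, random_state=0),
               {"max_depth": [2, 4]}),
}

# Grid search with 5-fold cross-validation, scored by AUC, for each model family.
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=5)
    search.fit(X_sel, y)
    results[name] = search.best_score_

best_name = max(results, key=results.get)  # model family with the highest mean AUC
best_auc = results[best_name]
print(selected, best_name, round(best_auc, 3))
```

In a setting like the one described, the selected model would then be refit on the training set and passed to an explainer to obtain per-feature contributions.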


Results: The dataset included 262 SLE patients from our University Hospital, with 5,952 contacts recorded between 2012 and 2020. Patients were predominantly female (88%), with a median baseline age of 43 years, a median follow-up of 6 years, and a median of 16 contacts per patient. Data were split into a training set (70%, 183 patients, 4,117 contacts) and a testing set (30%, 79 patients, 1,845 contacts). A total of 48% of contacts were characterized by an activity event (as per the above definition). Univariable feature selection identified 125 of the 192 available variables as significantly associated with SLE activity (p < 0.05). Younger age at contact (median 44 years, p < 0.0001) and abnormal laboratory markers, including elevated albuminuria and proteinuria, were strongly associated with the occurrence of an activity event within 12 months (Table 1). The model achieved an AUC of 0.70. SHAP analysis identified renal involvement features, including proteinuria (≥0.5 g/24 h) and albuminuria, as the most important predictors of a severe event. Conversely, variables such as ‘past therapy step down’ and ‘number of step-down changes’ were negatively associated with an activity event within 12 months, as was the absence of hematological or cutaneous involvement at the current contact (Figure 1).
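
The split is reported in both patients and contacts, which suggests that it was made at the patient level so that all contacts of a given patient fall in the same set. Under that assumption, a minimal sketch of such a split could look like this; the function name, the seed, and the toy contact list are hypothetical.

```python
import random


def split_by_patient(contacts, train_fraction=0.7, seed=0):
    """Split (patient_id, payload) records so that no patient spans both sets."""
    patient_ids = sorted({pid for pid, _ in contacts})
    random.Random(seed).shuffle(patient_ids)
    n_train = int(train_fraction * len(patient_ids))
    train_ids = set(patient_ids[:n_train])
    train = [c for c in contacts if c[0] in train_ids]
    test = [c for c in contacts if c[0] not in train_ids]
    return train, test


# 262 hypothetical patients, each with between 1 and 5 contacts.
contacts = [(pid, k) for pid in range(262) for k in range(1 + pid % 5)]
train, test = split_by_patient(contacts)

train_patients = {pid for pid, _ in train}
test_patients = {pid for pid, _ in test}
print(len(train_patients), len(test_patients))  # 183 79
```

Splitting on patients rather than on contacts keeps repeated visits of the same person from appearing on both sides of the evaluation.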


Conclusion: Renal disease is a strong determinant of disease activity at 12 months, but many other clinical features (namely hematological and cutaneous involvement), as well as patient history and therapy modifications, contribute to determining the risk of an activity event at 12 months.


REFERENCES: [1] Antenucci L, et al. Gemelli SLE Data Mart: a Real World Evidence Framework for the Evolution of Systemic Lupus Erythematosus Patients, accepted for publication.

[2] Lilli L, et al. A comprehensive natural language processing pipeline for the chronic lupus disease. In: Digital Health and Informatics Innovations for Sustainable Health Care Systems. IOS Press, 2024. 909-913.

Table 1. Feature selection: the most representative of the 125 features selected by the algorithm.

Feature | Activity Event NRV (1772) | No Activity Event NRV (2162) | p-value
Current Contact
Age at contact (years) | 44 (36, 53) | 46 (37, 55) | < 0.0001
Outpatient visit | 1217 (67%) | 1779 (82%) | < 0.0001
Delta contacts (days) | 77 (33, 136) | 90 (32, 153) | < 0.05
New involvement | 204 (11%) | 113 (5%) | < 0.0001
Cutaneous involvement | 1427 (80%) | 1958 (90%) | < 0.0001
Hematological involvement | 1450 (82%) | 1950 (90%) | < 0.0001
C3 consumption | 554 (31%) | 503 (23%) | < 0.0001
Presence of albuminuria | 345 (19%) | 82 (4%) | < 0.0001
Presence of urine blood cells | 85 (5%) | 39 (2%) | < 0.0001
Proteinuria ≥0.5 g/24 h | 189 (11%) | 47 (2%) | < 0.0001
Anemia | 176 (10%) | 47 (2%) | < 0.0001
Increased CRP (≥5 mg/L) | 296 (18%) | 303 (14%) | < 0.05
Increased ESR | 402 (23%) | 353 (16%) | < 0.0001
Last 12 months
Total outpatient visits | 2 (1, 3) | 3 (2, 3) | < 0.0001
Outpatient visit | 1370 (77%) | 1852 (86%) | < 0.0001
Day hospital contact | 509 (29%) | 482 (22%) | < 0.0001
Hospitalization | 369 (21%) | 262 (12%) | < 0.0001
New organ involvement | 445 (25%) | 369 (17%) | < 0.0001
Total of new organ involvements | 0 (0, 1) | 0 (0, 0) | < 0.0001
Renal symptom | 771 (43%) | 529 (24%) | < 0.0001
Vascular symptom | 330 (19%) | 230 (11%) | < 0.0001
C3 consumption | 854 (48%) | 863 (40%) | < 0.0001
Presence of albuminuria | 520 (29%) | 306 (14%) | < 0.0001
Proteinuria ≥0.5 g/24 h | 322 (18%) | 165 (8%) | < 0.0001
History
Outpatient visit | 1526 (86%) | 2018 (93%) | < 0.0001
Step-down therapy changes | 832 (47%) | 1397 (65%) | < 0.0001
Total of step-down therapy changes | 0 (0, 1) | 1 (0, 2) | < 0.0001

Legend. For categorical variables, number and percentage are reported; for numerical variables, the median is shown with first and third quartiles. CRP = C-reactive protein; ESR = erythrocyte sedimentation rate.
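
As a worked illustration of the univariable test behind one row of the table, the counts for presence of albuminuria at the current contact (345 vs 82) can be checked by hand. The sketch below assumes that the column totals (1772 and 2162) are the denominators and uses a Pearson chi-square test without continuity correction; whether the authors applied a correction is not stated.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: activity event / no activity event; columns: albuminuria present / absent.
with_event = (345, 1772 - 345)
without_event = (82, 2162 - 82)
stat, p_value = chi_square_2x2(*with_event, *without_event)
print(f"chi2 = {stat:.1f}, p < 0.0001: {p_value < 0.0001}")
```

Under these assumptions the statistic is far above the threshold for p < 0.0001, in line with the value reported in the table.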

Figure 1. SHAP beeswarm plot describing the impact and direction of each feature on the outcome prediction, ordered by importance.


Acknowledgements: The project was developed with the financial contribution of AstraZeneca.


Disclosure of Interests: Augusta Ortolan: Abbvie, AstraZeneca, Novartis, Janssen, Abbvie; Silvia Laura Bosello: Boehringer and Janssen, Boehringer and Janssen; Livia Lilli: None declared; Laura Antenucci: None declared; Carlotta Masciocchi: None declared; Marco Gorini: AstraZeneca; Gabriella Castellino: AstraZeneca; Luca Petricca: AstraZeneca, GlaxoSmithKline; Maria Rita Gigante: Alfasigma; Viviana Antonella Pacucci: Alfasigma, Abbvie; Lucia Lanzo: None declared; Cesare Gavotti: None declared; Francesco Cristiano: None declared; Piergiacomo Cerasuolo: None declared; Jacopo Lenkowicz: None declared; Alfredo Cesario: None declared; Stefano Patarnello: None declared; Maria Antonietta D’Agostino: Abbvie, BMS, Amgen, Lilly, UCB, Pfizer, Novartis, Galapagos, MSD, Alfasigma, Abbvie, BMS, Amgen, Lilly, UCB, Pfizer, Novartis, Galapagos, MSD, Alfasigma.

© The Authors 2025. This abstract is an open access article published in Annals of Rheumatic Diseases under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ). Neither EULAR nor the publisher make any representation as to the accuracy of the content. The authors are solely responsible for the content in their abstract including accuracy of the facts, statements, results, conclusion, citing resources etc.


DOI: annrheumdis-2025-eular.B1040
Citation: , volume 84, supplement 1, year 2025, page 1109
Session: Poster View V (Poster View)