Background: Systemic lupus erythematosus (SLE) is a severe autoimmune disease with strong individual heterogeneity, so identifying the risk factors that affect its prognosis is particularly important for anticipating patient outcomes and guiding treatment. Numerous studies have shown that Epstein-Barr virus (EBV) infection can affect the onset and prognosis of SLE. Nevertheless, because of the diverse clinical manifestations of SLE and the complex infection status of EBV, it is difficult to identify prognostic risk factors in patients with SLE and EBV infection.
Objectives: To predict adverse prognostic factors in patients with systemic lupus erythematosus (SLE) combined with Epstein-Barr virus (EBV) infection. The dataset was split 70:30 into training and testing sets. The training set was used to fit each model and the testing set to evaluate it, so that hyperparameters could be chosen for a machine learning model with high testing accuracy and low bias. We assessed seven machine learning models, namely logistic regression (LR), support vector machine (SVM), Gaussian naive Bayes (GNB), random forest (RF), gradient boosting machine (GBM), artificial neural network (ANN), and AdaBoost (ADA), together with SHAP (SHapley Additive exPlanations), for predicting factors that influence the prognosis of SLE combined with EBV infection.
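The 70:30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified, so stratifying by outcome is an assumption made here to keep adverse cases in both sets, and the function name `stratified_split`, the seed, and the rounding rule are hypothetical. Only the class sizes (187 favorable, 30 adverse) come from the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately.

    Stratification is an assumption of this sketch; the abstract only
    states that a 70:30 split was used.
    """
    rng = random.Random(seed)

    # Group sample indices by outcome label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        # Send roughly 30% of each class to the testing set.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes from the abstract: 187 favorable (0) and 30 adverse (1).
labels = [0] * 187 + [1] * 30
train_idx, test_idx = stratified_split(labels)
adverse_in_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), adverse_in_test)
```

With only 30 adverse cases, an unstratified shuffle could leave very few of them in the testing set, which is why the sketch splits each class separately.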
Methods: A retrospective analysis was conducted on patients with SLE combined with EBV infection admitted to the Rheumatology and Immunology Department of the First Affiliated Hospital of Wenzhou Medical University between January 2010 and October 2022. After excluding records with missing data, we analyzed data from 217 patients (187 with a favorable prognosis and 30 with an adverse prognosis). T-tests and chi-square tests were used to compare patients with favorable and adverse prognoses and to identify candidate risk factors. Recursive Feature Elimination with Cross-Validation (RFECV) was then applied to further filter the variables with p < 0.05 in univariate analysis, and the variable subset with the highest AUC was entered into machine learning modeling. Predictive factors were divided into laboratory indicators and clinical features, and the machine learning models listed above were used to analyze factors influencing adverse prognosis in SLE patients with EBV infection. Finally, SHAP was used to create a scatter plot identifying the variables with a greater impact on the models' predictions.
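The univariate screening step for a binary clinical feature can be sketched as below, assuming a Pearson chi-square test on a 2x2 table without continuity correction (the abstract does not state which variant was used). The feature names and cell counts are hypothetical and chosen only for illustration; only the group sizes (30 adverse, 187 favorable) come from the abstract. The RFECV step that follows the screening is not shown.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied,
    and every row and column total must be non-zero.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, NOT study data. Each tuple is
# (adverse with feature, adverse without, favorable with, favorable without),
# so rows sum to 30 and 187 as in the abstract.
candidates = {
    "feature_A": (15, 15, 40, 147),
    "feature_B": (5, 25, 30, 157),
}

# Keep only the variables with p < 0.05, as in the screening rule above.
selected = [
    name
    for name, table in candidates.items()
    if chi_square_2x2(*table)[1] < 0.05
]
print(selected)
```

Variables passing this filter would then be handed to the feature elimination step; continuous variables would be screened with a t-test instead.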
Results: Among the models built on laboratory indicators, the SVM model showed higher accuracy (72.7%) and F-value (0.308). Among the models built on clinical features, the ANN model showed the highest accuracy (87.9%) and F-value (0.429). Based on these machine learning models, SHAP identified a total of 13 adverse prognostic factors in SLE patients with EBV infection, comprising 5 laboratory indicators (ALT, CD3+ T cell count, C3, VCA IgM, proteinuria) and 8 clinical features (SLEDAI-2K, baseline steroid dosage, fever, lupus nephritis, rash, lupus nephritis class IV, non-standard treatment, arthritis).
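To show how accuracy and the F-value relate when the adverse class is small, the sketch below computes both from a confusion matrix. It assumes that "F-value" means the F1 score for the adverse class. The counts are hypothetical: the abstract does not report a confusion matrix, and these values were chosen only so that the arithmetic lands near the figures reported for the ANN.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive (adverse) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Hypothetical testing-set counts, NOT taken from the study.
accuracy, f1 = accuracy_and_f1(tp=3, fp=2, fn=6, tn=55)
print(f"accuracy={accuracy:.3f}, F1={f1:.3f}")
# prints "accuracy=0.879, F1=0.429"
```

Because true negatives count toward accuracy but not toward F1, a model can score well on accuracy while its F-value stays modest.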
Conclusion: The ANN and SVM models performed well, and a total of 13 adverse prognostic factors were identified in SLE patients with EBV infection. Machine learning models therefore show potential for predicting the prognosis of SLE combined with EBV infection.
REFERENCES: NIL.
Acknowledgements: NIL.
Disclosure of Interests: None declared.