Background: Patients with rheumatoid arthritis (RA) are at increased cardiovascular (CV) risk. It is widely recognized that traditional CV risk factors and the available 10-year CV risk estimation models may not correctly identify RA patients at high risk, mainly because of the inflammatory burden of active disease [1]. For this reason, the phenotype of patients most likely to experience major adverse cardiovascular events (MACE) is difficult to characterize. The CORDIS-SIR study group established a multicenter longitudinal cohort of RA patients to study the interaction between immune inflammation and CV outcomes [2]. Unsupervised machine learning (uML) algorithms could help physicians identify patients with higher CV risk.
Objectives: To investigate whether a uML clustering analysis can identify RA patient phenotypes with different frequencies of MACE in a large multicenter Italian RA cohort.
Methods: A cohort of RA patients fulfilling the 2010 ACR/EULAR classification criteria, free from previous CV events, was enrolled cross-sectionally in January 2019 and then followed longitudinally for new-onset MACE until December 2023. Clinical and demographic attributes were recorded, including traditional CV risk factors and the SCORE-2 algorithm [3], and incident MACE were tracked. Factor Analysis of Mixed Data (FAMD), a generalization of Principal Component Analysis that can be applied to large datasets with both qualitative and quantitative variables, was implemented in a Python 3.9 environment using Prince 0.7.1. Preprocessing of continuous variables included feature standardization and an autocorrelation check. Observations with missing data were excluded. The number of FAMD components to retain for clustering was chosen from the bootstrapped distribution of eigenvalues. Hierarchical Clustering on Principal Components was performed to define the number of clusters, using Ward’s criterion and a Euclidean distance metric on the selected FAMD principal components. A supervised extreme gradient boosting (XGBoost) algorithm was run with cluster labels as the dependent variable to assess feature importance for cluster assignment. ANOVA and Chi-square tests were then used to assess between-cluster differences in the distribution of continuous and categorical variables, respectively.
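The core of the pipeline described above (FAMD followed by Ward clustering on the retained components) can be sketched as follows. This is a minimal illustration, not the study code: it replaces Prince with a hand-rolled FAMD in NumPy, uses SciPy for the hierarchical clustering, and runs on synthetic data. The variable names, sample size, and the choice of two components and four clusters are assumptions made for the example. The bootstrap and XGBoost steps are omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def famd_coordinates(continuous, categorical, n_components=2):
    """Hand-rolled FAMD: PCA on z-scored continuous columns plus
    one-hot indicators scaled by 1/sqrt(category proportion)."""
    # Continuous block: standardize each column.
    z = (continuous - continuous.mean(axis=0)) / continuous.std(axis=0)
    blocks = [z]
    # Categorical block (integer-coded): one-hot encode, scale, center.
    for j in range(categorical.shape[1]):
        levels = np.unique(categorical[:, j])
        onehot = (categorical[:, [j]] == levels).astype(float)
        scaled = onehot / np.sqrt(onehot.mean(axis=0))
        blocks.append(scaled - scaled.mean(axis=0))
    table = np.hstack(blocks)
    # PCA via singular value decomposition of the scaled table.
    u, s, _ = np.linalg.svd(table, full_matrices=False)
    eigenvalues = s**2 / table.shape[0]
    return u[:, :n_components] * s[:n_components], eigenvalues


# Synthetic stand-in data (hypothetical, NOT the study cohort).
rng = np.random.default_rng(0)
n = 200
continuous = np.column_stack([
    rng.normal(62, 10, n),    # age, years
    rng.normal(120, 40, n),   # disease duration, months
    rng.normal(110, 30, n),   # serum triglycerides, mg/dL
])
categorical = np.column_stack([
    rng.integers(0, 2, n),    # sex (0/1)
    rng.integers(0, 2, n),    # hypertension (0/1)
])

coords, eigenvalues = famd_coordinates(continuous, categorical, n_components=2)

# Ward's criterion with Euclidean distance on the retained components.
tree = linkage(coords, method="ward", metric="euclidean")
labels = fcluster(tree, t=4, criterion="maxclust")  # cut into at most 4 clusters
```

In this sketch the number of clusters is fixed in advance for brevity. In practice it would be read from the dendrogram or from an inertia-gain criterion.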
Results: After exclusion of observations with missing data, the analysis included 951 RA patients with a mean ± SD age at baseline of 61.8 ± 10.5 years, 771/951 (81.1%) of whom were female, and a median (95% CI) disease duration of 120 (102-126) months. Detailed information about the cohort is reported in Table 1. Hierarchical clustering on two principal components yielded four clusters, whose characteristics are summarized in Table 1 and Figure 1A. Age, disease duration, serum triglyceride level, and SCORE-2 were the most important features for clustering (Figure 1B). Among the four clusters: Cluster 1 had the highest HAQ levels, a higher proportion of patients on steroids, and higher total cholesterol values. Cluster 2 had lower disease activity, BMI, and SCORE-2, and a lower prevalence of diabetes and hypertension; patients in this cluster did not experience any MACE. Cluster 3 had a higher prevalence of male patients, of patients with disease duration <10 years, and of patients on csDMARD therapy. Cluster 4 had a longer disease duration, higher systolic blood pressure and serum triglyceride level, and a higher proportion of patients treated with lipid-lowering agents and TNFi; these patients experienced more MACE.
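The between-cluster comparisons described in the Methods (ANOVA for continuous variables, Chi-square for categorical ones) could be carried out along the following lines. This is a sketch on synthetic data using SciPy. The cluster labels, the variables, and their distributions are invented for illustration and do not reproduce the values in Table 1.

```python
import numpy as np
from scipy.stats import f_oneway, chi2_contingency

# Synthetic stand-in data (hypothetical, NOT the study cohort).
rng = np.random.default_rng(1)
n = 400
labels = rng.integers(1, 5, size=n)           # cluster labels 1..4
age = rng.normal(60, 10, size=n) + 2 * labels  # continuous variable
male = rng.random(n) < 0.1 * labels            # binary variable

clusters = np.unique(labels)

# One-way ANOVA: does mean age differ between clusters?
groups = [age[labels == k] for k in clusters]
f_stat, p_anova = f_oneway(*groups)

# Chi-square test on the cluster x sex contingency table.
table = np.array([
    [np.sum((labels == k) & (male == v)) for v in (False, True)]
    for k in clusters
])
chi2, p_chi2, dof, expected = chi2_contingency(table)
```

With four clusters and a binary variable, the contingency table is 4 x 2 and the Chi-square test has three degrees of freedom.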
Conclusion: Our uML-derived clustering identified specific clinical characteristics of RA patients that should be considered together with the SCORE-2 chart. The disease phenotypes described here can complement CV risk models in recognizing the patients most likely to experience a MACE.
REFERENCES: [1] Choy E et al. Rheumatology (Oxford) 2014;53(12):2143-54
[2] Cacciapaglia F et al. Eur J Intern Med 2022;96:60-5
[3] SCORE-2 Working Group and ESC CV risk collaboration. Eur Heart J 2021;42:2439-54
Table 1.
Acknowledgements: NIL.
Disclosure of Interests: None declared.