
Background: Patients with Raynaud’s phenomenon (RP) fulfilling the Very Early Diagnosis of Systemic Sclerosis (VEDOSS) criteria represent a clinically heterogeneous population, with highly variable risk and timing of progression to definite systemic sclerosis (SSc). Although the VEDOSS approach has improved early identification of patients at risk, current prognostic models still rely mainly on individual clinical or serological features and fail to capture the complex, multidimensional interactions between demographic characteristics, immunological profiles, inflammatory burden, and early subclinical organ involvement. As a consequence, accurate early risk stratification remains challenging. Unsupervised machine learning (ML) techniques offer a data-driven approach that can integrate high-dimensional clinical data and identify latent phenotypic patterns without predefined outcome labels or phenotype definitions.
Objectives: To identify distinct clinical phenotypes among VEDOSS patients using unsupervised ML techniques and to evaluate differences in risk and timing of progression from RP to definite SSc.
Methods: Patients fulfilling the 2011 VEDOSS criteria were identified from the European Scleroderma Trials and Research (EUSTAR) database (CP121). Demographic variables, clinical manifestations, immunological profile, laboratory parameters, cardiovascular assessment, and pulmonary function tests were included in the analysis. After data preprocessing and imputation of missing values, dimensionality reduction was performed using Principal Component Analysis (PCA). Unsupervised clustering was subsequently conducted using the K-means algorithm, with the optimal number of clusters determined by the elbow method. Progression to definite SSc was defined according to the 2013 ACR–EULAR classification criteria. Time-to-event analyses were used to compare cumulative incidence of progression and SSc-free survival among clusters.
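The preprocessing and clustering steps described in the Methods can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the data are synthetic, median imputation and z-score scaling stand in for the unspecified preprocessing, two principal components are retained arbitrarily, and the helper names (`impute_median`, `standardize`, `pca`, `kmeans`) are hypothetical. The sketch uses NumPy only; in practice a dedicated ML library would typically be used.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column with that column's median (assumed strategy)."""
    X = X.copy()
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std

def pca(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    inertia = dist[np.arange(len(X)), labels].sum()
    return labels, inertia

# Synthetic stand-in data: 3 groups of 60 "patients", 6 features, ~5% missing.
rng = np.random.default_rng(42)
group_means = np.array([[0] * 6, [4] * 6, [-4, 4, -4, 4, -4, 4]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(60, 6)) for m in group_means])
X[rng.random(X.shape) < 0.05] = np.nan

X_imp = impute_median(X)
Z = standardize(X_imp)
scores, explained = pca(Z, n_components=2)

# Elbow method: inspect how the within-cluster sum of squares falls as k grows.
inertias = {k: kmeans(scores, k)[1] for k in range(1, 7)}
labels, _ = kmeans(scores, 3)

for k, wss in inertias.items():
    print(f"k={k}: within-cluster SS = {wss:.1f}")
```

The elbow is read off as the value of k beyond which the within-cluster sum of squares stops decreasing sharply; that choice is visual and somewhat subjective, so it is often cross-checked against other criteria.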
Results: A total of 238 VEDOSS patients were evaluated (94.5% female; mean age 49.5 years). Unsupervised ML identified three distinct clusters with significantly different demographic, clinical, immunological, and prognostic profiles (Table 1). Cluster 1 (n=117) comprised younger patients with earlier RP onset, minimal clinical and subclinical organ involvement, low inflammatory markers, and a low prevalence of SSc-specific autoantibodies. This cluster showed the lowest risk of progression to definite SSc (21.4%) and the longest SSc-free survival. Cluster 2 (n=33) was characterized by intermediate age at RP onset, prominent vasculopathic and cutaneous features (including puffy fingers and telangiectasias), and the highest prevalence of anti-centromere antibodies. Patients in this cluster showed an intermediate risk of progression (39.4%) and a relatively indolent disease course. Cluster 3 (n=88) included older patients with later RP onset, higher inflammatory burden, early cardiopulmonary and gastrointestinal involvement, higher prevalence of anti-topoisomerase I antibodies, and evidence of subclinical organ dysfunction. This cluster exhibited the highest risk of progression to definite SSc (58.0%) and a significantly shorter time to classification. Overall, 37.4% of patients progressed to definite SSc within 5 years. Time-to-progression differed significantly across clusters, with Cluster 1 showing the most favorable prognosis and Cluster 3 the most aggressive disease trajectory (Figure 1).
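As a quick consistency check on the figures reported above, the overall progression rate can be recovered as the size-weighted average of the cluster-level rates. This sketch uses only the numbers stated in the Results and assumes that the cluster percentages and the overall 37.4% refer to the same follow-up window, which the abstract does not state explicitly.

```python
# Cluster sizes and progression rates (%) as reported in the Results.
clusters = {1: (117, 21.4), 2: (33, 39.4), 3: (88, 58.0)}

total_n = sum(n for n, _ in clusters.values())
pooled_pct = sum(n * pct for n, pct in clusters.values()) / total_n

print(total_n)               # 238
print(round(pooled_pct, 1))  # 37.4
```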
Conclusions: Machine learning–driven clustering identifies three clinically meaningful phenotypes within the VEDOSS population, each associated with distinct risks and timings of progression to definite SSc. ML-based phenotypic stratification may improve early prognostic accuracy, enabling personalized monitoring intensity and risk-adapted therapeutic strategies in very early SSc. These findings support the concept of multiple early disease trajectories in SSc and highlight the potential role of machine learning approaches in precision medicine.
Table 1. Clinical, immunological and instrumental phenotypes of the clusters.
Figure 1. Cumulative incidence of progression from VEDOSS to definite systemic sclerosis according to machine learning–derived cluster assignment.
Acknowledgments: NIL.
Disclosure of Interests: None declared.