
Background: Scleroderma (systemic sclerosis, SSc) is a systemic disease characterized by progressive fibrosis and a high burden of morbidity and mortality. Diagnosis is commonly delayed, mainly because of late referral: primary care physicians and general internists are often unfamiliar with its typical signs and symptoms, so individuals with SSc are commonly diagnosed only after symptoms have been present for one year or longer.
Objectives: Our aims were (1) to examine the ability of AI systems to identify SSc-related facial features and (2) to evaluate whether an AI algorithm could provide a practical way to help non-rheumatologists make timely decisions on early referral so that therapy can start sooner.
The current study builds on a previous pilot study in which 60 SSc images collected from the internet were compared with a group of age- and sex-matched normal faces using VGG-16 (a convolutional neural network).
Methods: We recruited participants from 5 SSc centers in the U.S. (2 centers), Egypt, Canada, and the Czech Republic. Participants were categorized into 3 groups: SSc patients (Group A, n=162), rheumatic non-SSc patient controls (Group B, n=118) and non-rheumatic healthy controls (Group C, n=239). Initially, 1 face photo was taken per participant; the protocol was later standardized to 3 face photos per participant to capture more natural image variation. In total, we collected 285, 153 and 308 images for Groups A, B and C, respectively. Of those, 48, 27 and 62 images from the respective groups were pseudorandomly set aside as the test set, and the remaining images were used as the training set. Table 1 and Table 2 describe the participants and image dataset in more detail. Because the groups differed in size, we defined a mechanism to handle group size imbalance. We employed state-of-the-art AI models and applied standardized image pre-processing and augmentation techniques to compensate for variations in image resolution, positioning, background, coloring and lighting, and to stabilize the training process. Raw images were normalized to a standard size (512×683 pixels) in an upright orientation. The F1-score (the harmonic mean of precision and recall, which penalizes both false positives and false negatives) was the primary performance metric for selecting the best models; we also tracked accuracy, sensitivity and specificity to give a more complete and more easily understood picture of model performance.
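The data split and the performance metrics described above can be illustrated with a minimal Python sketch. It is not the study's code. The image counts are taken from the Methods; the inverse-frequency class weighting is only one common way to handle group imbalance and is an assumption here, since the abstract does not specify the mechanism used. The metrics function assumes a binary framing with SSc as the positive class, which is also an assumption. All function and variable names are hypothetical.

```python
from dataclasses import dataclass

# Image counts reported in the Methods (Groups A, B, C).
TOTAL_IMAGES = {"A_SSc": 285, "B_rheumatic_non_SSc": 153, "C_healthy": 308}
TEST_IMAGES = {"A_SSc": 48, "B_rheumatic_non_SSc": 27, "C_healthy": 62}


def training_counts(total, test):
    """Training images per group = total images minus held-out test images."""
    return {group: total[group] - test[group] for group in total}


def inverse_frequency_weights(counts):
    """Hypothetical balancing scheme: weight = N / (K * n_group).

    The abstract does not describe its imbalance mechanism; this is one
    common choice, shown for illustration only.
    """
    n_total = sum(counts.values())
    k = len(counts)
    return {group: n_total / (k * n) for group, n in counts.items()}


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(tp, fp, tn, fn):
    """Compute metrics from confusion-matrix counts (assumes SSc = positive)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and recall (sensitivity).
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity, f1)


if __name__ == "__main__":
    train = training_counts(TOTAL_IMAGES, TEST_IMAGES)
    print("training images per group:", train)
    print("illustrative class weights:", inverse_frequency_weights(train))
```

With the reported counts, the training set works out to 237, 126 and 246 images for Groups A, B and C. Under the assumed weighting scheme, the smallest group (B) receives the largest weight, so that each group contributes equally to the training loss in aggregate.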
Results: Two AI models were used in our cohort: ConvNeXt-V2 (a state-of-the-art convolutional neural network) and EVA-02 (a state-of-the-art vision transformer), chosen because they represent the two dominant computer vision architectures in wide use today. ConvNeXt-V2 and EVA-02 achieved accuracies of 80-85% and 85-90%, sensitivities of 75-90% and 80-95%, and specificities of 75-90% and 80-95%, respectively. Figure 1 shows sample training sessions demonstrating the convergence of accuracy, sensitivity and specificity. Each training session typically required fewer than 20 epochs (complete passes through the training data set). Figure 2 shows a facial image overlaid with heatmaps generated at inference; the heatmaps highlight the regions and features that most influenced the model’s final classification.
Conclusions: Our models have shown promising results on these early data sets. More data are needed to validate the results. Since the protocol also collects hand images, combining face and hand images may further improve the accuracy of the models and warrants evaluation.
Note: The face image is AI-generated and does not depict a real patient.
REFERENCES: NIL.
Acknowledgments: NIL.
Disclosure of Interests: Yossra A. Suliman: Johnson and Johnson, Boehringer Ingelheim, Jerry Wu: None declared, Michal Tomcik: None declared, John Antowan: None declared, Thomas Winkler: None declared, Esraa Talaat: None declared, Amyn Rajan: None declared, Daniel Furst: None declared.