Background: Large Language Models (LLMs) show increasing promise in medicine, with recent validation demonstrating strong correlation with expert scoring of the Myositis Disease Activity Assessment Tool (MDAAT) [1]. However, their performance in cutaneous assessment remains unexplored, particularly for rare diseases such as dermatomyositis (DM).
Objectives: This study evaluated the LLM “Claude v. 3.5 Sonnet” in scoring cutaneous manifestations of DM against expert assessors, with implications for automated clinical trial screening where competing trials often limit recruitment from an already restricted patient pool.
Methods: Twenty-seven DM cases with standardized clinical photographs were identified through systematic PubMed review. Two rheumatologists with expertise in Cutaneous Dermatomyositis Disease Area and Severity Index (CDASI) scoring and trial recruitment independently assessed the images. The LLM ‘Claude’ analysed identical images using chain-of-thought prompting to score CDASI domains: erythema, scale, erosion/ulceration, poikiloderma, and calcinosis. Hand lesions were scored with specific attention to papules (requiring doubled erythema scores) and periungual changes. Intraclass Correlation Coefficient (ICC) analysis was performed using two-way random effects modelling (Stata 18).
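For readers unfamiliar with the statistic, the two-way random effects ICC can be sketched as follows. This is a minimal illustration, not the Stata procedure used in the study: it assumes the single-rater, absolute-agreement form, ICC(2,1) of Shrout and Fleiss, which the abstract does not specify, and the function name `icc_2_1` and the example scores are hypothetical.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an (n_cases x n_raters) array with one score per cell.
    Assumption: the abstract does not state which ICC form was used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-case means
    col_means = x.mean(axis=0)  # per-rater means

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores for five cases (rows) from two raters (columns),
# e.g. one expert and the LLM. These are NOT data from the study.
example = [[12, 14], [30, 28], [7, 9], [22, 21], [18, 20]]
print(round(icc_2_1(example), 3))
```

The absolute-agreement form penalises a rater who scores systematically higher or lower than the other, which matters when a score is compared against a fixed trial eligibility threshold.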
Results: Global ICC analysis demonstrated excellent agreement between Claude and expert assessors (0.92, 95% CI: 0.89-0.94), comparable to inter-expert reliability (0.87, 95% CI: 0.82-0.91). Domain-specific analysis revealed:
Moderate agreement for core features:
Erythema (0.61, 95% CI: 0.27-0.81).
Scaling (0.57, 95% CI: 0.19-0.79).
Erosions (0.57, 95% CI: 0.21-0.79).
Poikiloderma (0.47, 95% CI: 0.10-0.73).
Strong concordance for hand assessment, a ubiquitous and specific feature of the disease:
Global hand score (0.95, 95% CI: 0.91-0.97).
Hand erythema (0.78, 95% CI: 0.24-0.95).
Perfect agreement for periungual vasculitis (ICC 1.0).
Lower reliability for damage assessment:
Hand damage (0.37, 95% CI: 0.10-0.85).
Time Efficiency Analysis:
Expert assessors: Mean 8.4 minutes per case (range 6-12 minutes).
LLM assessment: Mean 42 seconds per case (range 35-50 seconds).
Total time saved: 93% reduction in scoring time.
Additional efficiency: Simultaneous batch processing capability for LLM versus sequential expert assessment.
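The relative time saving can be checked from the reported means with a short calculation. This is a sketch only: the helper `percent_reduction` is hypothetical, and it uses the two mean values alone, whereas the reported 93% was presumably derived from per-case timings that are not given here.

```python
def percent_reduction(baseline_seconds, new_seconds):
    """Percentage reduction in time relative to the baseline."""
    return 100.0 * (1.0 - new_seconds / baseline_seconds)


expert_mean_s = 8.4 * 60  # expert mean of 8.4 minutes per case, in seconds
llm_mean_s = 42.0         # LLM mean of 42 seconds per case

reduction = percent_reduction(expert_mean_s, llm_mean_s)
print(f"{reduction:.1f}%")  # prints "91.7%"
```

From the means alone the reduction is roughly 92%, close to but slightly below the stated figure.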
Conclusion: The LLM demonstrates excellent reliability for global disease assessment (ICC 0.92) and objective features like periungual changes (ICC 1.0), with significant time efficiency (93% reduction in scoring time) and batch processing capabilities that could enhance clinical trial recruitment workflows. However, important limitations persist in assessing subtle features (poikiloderma ICC 0.47, damage ICC 0.37) and technical constraints including image quality dependencies, suggesting its current optimal use is as a screening tool to support, rather than replace, expert assessment.
REFERENCES: [1] Vincenzo Venerito, Marco Fornaro, Sara Sabbagh, et al. Integrating large language models in medicine: a study of Claude 2’s performance in MDAAT scoring for idiopathic inflammatory myopathies, Rheumatology, Volume 63, Issue 10, October 2024, Pages e292–e293.
Flowchart of Analysis Process
Intraclass Correlation Coefficient of CDASI in 27 Dermatomyositis cases
Domain | ICC | 95% CI |
---|---|---|
Global (Experts 1 vs 2) | 0.87 | 0.82–0.91 |
Global (Claude vs Expert 1) | 0.92 | 0.89–0.94 |
Global (Claude vs Expert 2) | 0.87 | 0.82–0.91 |
Erythema | 0.61 | 0.27–0.81 |
Scaling | 0.57 | 0.19–0.79 |
Erosions | 0.57 | 0.21–0.79 |
Poikiloderma | 0.47 | 0.10–0.73 |
Calcinosis | Not Detected | N/A |
Hands (Global) | 0.95 | 0.91–0.97 |
Hands (Erythema) | 0.78 | 0.24–0.95 |
Hands (Ulcers) | 0.65 | 0.10–0.92 |
Hands (Damage) | 0.37 | 0.10–0.85 |
Hands (Periungual Vasculitis) | 1.0 | 1.0–1.0 |
Acknowledgements: Lisa Traboco, Shounak Ghosh, Meera Shahs, VIJ Pallavi on behalf of MyoLLM group.
Disclosure of Interests: None declared.
© The Authors 2025. This abstract is an open access article published in Annals of Rheumatic Diseases under the CC BY-NC-ND license.