
Background: Giant cell arteritis (GCA) and polymyalgia rheumatica (PMR) are systemic inflammatory diseases that affect adults aged over 50 years and may occur asynchronously or concurrently. Approximately 16%–21% of patients with PMR are diagnosed with GCA before, during or after their PMR diagnosis [1]. Early diagnosis and treatment of GCA are pivotal, as diagnostic delay may lead to ischaemic complications, including visual loss or stroke. Because patients with PMR develop GCA at higher rates than the general population, and early detection of GCA is important to prevent life-altering symptoms, there is interest in identifying which patients with PMR will later develop GCA.
Objectives: We aimed to use electronic health records to develop a machine learning model capable of predicting which patients with PMR would later develop GCA.
Methods: A longitudinal cohort study using existing US electronic medical records (Optum) was conducted during 2017–2024 to identify patients with incident PMR who subsequently developed GCA. The machine learning methods evaluated were logistic regression (including regularised logistic regression), k-nearest neighbours, naïve Bayes, partial least-squares discriminant analysis, random forests, XGBoost and deep neural networks. Because of the class imbalance between groups (i.e. patients developing GCA vs those not developing GCA), the modelling was adjusted to account for it: the F1 score was used as the performance metric, and the training data were oversampled. Explanatory variables included patient characteristics, laboratory test results, disease history and symptoms before PMR diagnosis. The response variable was whether a patient was observed to develop GCA in the post-index period. The data were split into a training set (80%) and a test set (20%) in a stratified manner, so that both sets contained the same proportion of patients who developed GCA. Each model was trained and validated using five-fold cross-validation, and the model that performed best on the validation data was then evaluated on the unseen test data to produce the final results, so that these results reflect the model's performance on new data. A post hoc analysis using SHapley Additive exPlanations (SHAP) was performed to determine which variables contributed most to the final predictions.
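The data-handling steps described above (a stratified 80/20 split, stratified five-fold cross-validation, oversampling applied only to the training data, and F1 scoring) can be sketched in plain Python as follows. This is a minimal illustration, not the study's pipeline: the abstract does not state which oversampling technique was used, so simple random oversampling (duplicating minority-class rows) is assumed, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (train, val) index lists; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


def random_oversample(indices, labels, seed=0):
    """Assumed technique: duplicate minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (GCA) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 10 patients who develop GCA, 90 who do not.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels, test_frac=0.2)
for fold_train, fold_val in stratified_kfold(train_idx, labels, k=5):
    balanced = random_oversample(fold_train, labels)
    # A model would be fitted on `balanced` and scored on `fold_val` with f1_score.
    print(f"training rows after oversampling: {len(balanced)}, validation rows: {len(fold_val)}")
```

Oversampling is applied only to the training indices of each fold, so the validation and test rows keep the original class balance; the F1 score then reflects performance on the imbalanced population the model would actually face.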
Results: A total of 309 patients with incident PMR who later developed GCA were identified. An XGBoost [2] model achieved the best performance and was able to stratify patients into elevated-risk and reduced-risk groups for subsequent GCA development. In total, 12.9% of the study population was classified as elevated risk, and these patients showed a trend toward higher odds of developing GCA than the reduced-risk group, although the confidence interval included 1 (odds ratio: 1.84; 95% CI: 0.98–3.45). SHAP analysis showed that erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) level, sex and white blood cell count were the four variables that contributed most to the model's predictions, with ESR at PMR diagnosis being the most important predictor of later GCA development in this model (Figure 1).
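The reported odds ratio compares the odds of developing GCA in the elevated-risk group with those in the reduced-risk group. A minimal sketch of how such an estimate and its 95% CI can be computed from a 2×2 table follows, assuming the Woolf (logit) method; the abstract does not state which method was used, and the counts in the usage line are hypothetical, not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval for a 2x2 table.

    a: elevated-risk patients who developed GCA
    b: elevated-risk patients who did not develop GCA
    c: reduced-risk patients who developed GCA
    d: reduced-risk patients who did not develop GCA
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_, lo, hi = odds_ratio_ci(10, 20, 5, 40)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the interval is symmetric on the log scale, a CI that spans 1 (as the reported 0.98–3.45 does) means the difference is not statistically significant at the 5% level, which is why the result is described as a trend.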
Conclusions: Although the current model does not yet achieve the performance required for real-world implementation in routine care, these findings demonstrate its potential as a proof of concept. Future research should build on this work by including additional explanatory variables. Predicting which patients with PMR are at risk of developing GCA could be highly valuable for enhancing the clinical monitoring of high-risk patients.
Figure 1. SHAP feature ranking of the final model, with features ranked in descending order of importance. Higher values indicate that a variable was more important for distinguishing patients who later developed GCA from those who did not.
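One common way to produce a feature ranking like the one in Figure 1, and the convention used by the bar plot in the `shap` Python package, is to average the absolute SHAP value of each feature across patients; the absolute value is taken because a feature can push an individual prediction either up or down. The sketch below assumes that convention (the abstract does not specify the aggregation) and uses a small hypothetical matrix of SHAP values, not values from the study.

```python
def rank_features(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, highest first.

    shap_values: list of rows, one per patient, each holding one SHAP value per feature.
    """
    n_rows = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three patients (illustrative numbers only).
features = ["ESR", "CRP", "sex", "WBC"]
values = [
    [0.8, -0.3, 0.1, 0.05],
    [-0.6, 0.4, -0.2, 0.0],
    [0.7, 0.2, 0.15, -0.1],
]
for name, score in rank_features(values, features):
    print(f"{name}: {score:.3f}")
```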
REFERENCES: [1] Salvarani C, et al. Autoimmun Rev. 2024;23(1):103415.
[2] Chen T, Guestrin C. arXiv (Cornell University). 2016.
Acknowledgments: NIL.
Disclosure of Interests: Marco Clery: employee of Novartis; Padraic Ward: shareholder of Novartis, employee of Novartis; Michelle Carey: none declared; Valeria Jordan Mondragon: shareholder of Novartis, employee of Novartis; Julie Le Moal Mouchet: shareholder of Novartis, employee of Novartis.