Background: Accurate quantification of hand and foot radiographic erosion and joint space narrowing (JSN) scores is crucial for evaluating structural damage in psoriatic arthritis (PsA) clinical trials. Currently, joints are scored manually by appropriately trained experts using the PsA-modified van der Heijde Sharp (vdH-S) scoring system.
Objectives: The manual scoring approach is time-intensive, costly, and subject to variability, typically requiring 2-3 readers reviewing the same image to ensure valid results. To address these challenges, we developed an AI algorithm to localize individual joints-of-interest and subsequently assign structural damage scores.
Methods: A total of 821 patients from three completed PsA clinical trials had 8-12 X-rays each from 2-3 visits over a 24-week period. Patients and their images were divided into training (271), validation (72), and holdout (478) imaging datasets.
The overall algorithm pipeline consisted of two sub-models: a joint detection (JD) model and a scoring (SC) model. To build and optimize the sub-models, all images from the training and validation sets (3018 X-rays of hands and feet) and a portion of the holdout set (580 X-rays of hands and feet from 113 patients) were annotated using bounding boxes around the joints scored in vdH-S. Additionally, vdH-S scores and their subscores from the trials (two radiologists and an adjudicator) were used as ground truth for the SC models.
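A patient-level partition of this kind can be sketched as follows. This is only an illustration, assuming a seeded random shuffle of patient identifiers; the abstract does not state how patients were assigned to each set, and the function name `split_patients` is hypothetical. The point shown is that every image from a given patient lands in exactly one partition.

```python
import random


def split_patients(patient_ids, n_train, n_val, seed=0):
    """Split unique patient IDs into training, validation, and holdout sets.

    Assumption: assignment is random. All remaining patients after the
    training and validation sets are filled go to the holdout set.
    """
    ids = sorted(set(patient_ids))
    if n_train + n_val > len(ids):
        raise ValueError("requested more patients than are available")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "holdout": ids[n_train + n_val:],
    }


# Sizes taken from the abstract: 271 + 72 + 478 = 821 patients.
splits = split_patients(range(821), n_train=271, n_val=72)
```

Images would then be routed to a partition by looking up the patient they belong to, so that no patient contributes images to more than one set.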
The JD models were trained using the YOLOv8 architecture to automatically detect the hand and foot joints. Independently, DenseNet121 convolutional neural network models were trained to generate erosion and JSN scores for each joint (SC models). The respective models were selected and locked based on validation-set performance (lowest bounding-box regression and classification error for the JD models; lowest mean squared error (MSE) for the SC models) and then evaluated in the annotated holdout set. Performance of the JD models was assessed via average intersection over union (IoU) and the proportion of joints with IoU ≥ 0.5.
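The two joint-detection metrics can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and that each predicted box has already been matched to its annotated box; the matching procedure and the function names are assumptions, not details from the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def summarize_iou(matched_pairs, threshold=0.5):
    """Return (mean IoU, fraction of joints with IoU >= threshold).

    matched_pairs: list of (predicted_box, annotated_box) tuples.
    """
    ious = [iou(pred, truth) for pred, truth in matched_pairs]
    mean_iou = sum(ious) / len(ious)
    fraction = sum(v >= threshold for v in ious) / len(ious)
    return mean_iou, fraction
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, giving an IoU of 1/7, which would fall below the 0.5 threshold.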
Lastly, to test the performance of the overall pipeline (Figure 1A), the complete holdout set was passed through the locked JD models, and the detected joints were subsequently scored and aggregated to extract the final vdH-S scores per patient. Performance was evaluated at the joint, extremity (hands or feet), and patient (hands and feet) levels using the intraclass correlation coefficient (ICC) and MSE, with the panel score as the ground truth. To improve explainability, a heatmap was generated using Grad-CAM for visualization.
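The aggregation step can be sketched as below, assuming each scored joint is available as a record with hypothetical keys `patient_id`, `extremity`, `erosion`, and `jsn`. The sketch sums erosion and JSN scores per extremity and per patient; it does not cover any handling of undetected or unscorable joints, which the abstract does not describe.

```python
from collections import defaultdict


def aggregate_scores(joint_records):
    """Sum joint-level erosion and JSN scores to extremity and patient level.

    joint_records: iterable of dicts with the (hypothetical) keys
    'patient_id', 'extremity' ('hands' or 'feet'), 'erosion', and 'jsn'.
    Returns (extremity_totals, patient_totals) as plain dicts.
    """
    extremity_totals = defaultdict(float)
    patient_totals = defaultdict(float)
    for rec in joint_records:
        joint_total = rec["erosion"] + rec["jsn"]
        extremity_totals[(rec["patient_id"], rec["extremity"])] += joint_total
        patient_totals[rec["patient_id"]] += joint_total
    return dict(extremity_totals), dict(patient_totals)


# Illustrative records only; these are not data from the study.
records = [
    {"patient_id": "P1", "extremity": "hands", "erosion": 2, "jsn": 1},
    {"patient_id": "P1", "extremity": "feet", "erosion": 0, "jsn": 3},
    {"patient_id": "P2", "extremity": "hands", "erosion": 1, "jsn": 0},
]
by_extremity, by_patient = aggregate_scores(records)
```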
Results: In the annotated holdout set, the locked JD models demonstrated strong performance, with an average IoU of 0.88±0.02 (96.26% of joints with IoU ≥ 0.5) and 0.76±0.04 (96.25% of joints with IoU ≥ 0.5) for the foot and hand joints, respectively. Similarly, in the annotated holdout set, the locked erosion and JSN SC models demonstrated good agreement with ground truth at the joint level (ICC = 0.85 and ICC = 0.97, respectively) and were comparable to the agreement between human readers (ICC = 0.88 and ICC = 0.98, respectively).
In the complete holdout set, the overall pipeline generated scores that demonstrated strong agreement with the ground truth at the joint (Erosion ICC = 0.85 and JSN ICC = 0.96), extremity (Hands ICC = 0.93 and Feet ICC = 0.88), and patient (ICC = 0.92) levels. Among joints identified by the model and scored by both the model and human readers, this agreement was comparable to the agreement between human readers (Figure 1 and Figure 2).
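For readers unfamiliar with the agreement statistics, the following is a minimal sketch of one common ICC form and of MSE. The abstract does not state which ICC variant was used; the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here purely for illustration, and the function names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: rows are subjects, columns are raters.

    Assumed variant: two-way random effects, absolute agreement, single
    measure. Requires at least two subjects and two raters.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def mean_squared_error(predicted, reference):
    """Mean of squared differences between paired scores."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
```

Under this assumed variant, a constant offset between raters lowers the ICC even when the ranking of subjects is identical, because absolute agreement rather than consistency is measured.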
Conclusion: The joint detection and scoring outputs of the developed pipeline demonstrated state-of-the-art performance and show promise for automating the assessment of radiographic structural damage. Future work will include developing a more quantitative measure of structural damage and incorporating more radiographs with severe joint damage into the training set to further improve joint detection and scoring accuracy.
Figure 1. Overall pipeline. (A) Schematic diagram with heatmaps and (B) comparison of AI-generated and human reader scores.
Figure 2. Quantitative comparison of AI-generated and human reader scores. ICC = intraclass correlation coefficient; MSE = mean squared error.
Acknowledgements: NIL.
Disclosure of Interests: Robert Janiczek may own stock or stock options in J&J, employee of J&J; Darshana Govind may own stock or stock options in J&J, employee of J&J; Kenneth Broos may own stock or stock options in J&J, employee of J&J; Nicholas Fountoulakis may own stock or stock options in J&J, employee of J&J; Lenore Noonan may own stock or stock options in J&J, employee of J&J; Shinobu Yamamoto may own stock or stock options in J&J, employee of J&J; Natalia Zemlianskaia may own stock or stock options in J&J, employee of J&J; Craig S. Meyer may own stock or stock options in J&J, employee of J&J; Emily Scherer may own stock or stock options in J&J, employee of J&J; Michael Deman may own stock or stock options in J&J, employee of J&J; Pablo Damasceno may own stock or stock options in J&J, employee of J&J; Philip S. Murphy may own stock or stock options in J&J, employee of J&J; Terence Rooney may own stock or stock options in J&J, employee of J&J; Elizabeth C. Hsia may own stock or stock options in J&J, employee of J&J; Anna Beutler may own stock or stock options in J&J, employee of J&J, previously paid consultant for Kiniksa Pharmaceuticals and Boston Pharmaceuticals; Kristopher Standish may own stock or stock options in J&J, employee of J&J; Stephen S.F. Yip may own stock or stock options in J&J, employee of J&J.