Background: Large language models (LLMs) demonstrate potential for automating clinical tasks in rheumatology. While initial applications show promise in disease activity scoring and computer-assisted diagnosis, their utility in structured clinical decision support for osteoporosis management remains inadequately evaluated. Current challenges in osteoporosis care include efficient risk stratification and appropriate specialist referral timing, areas where automated support could enhance care delivery.
Objectives: To evaluate the effectiveness of a large language model (Claude, Anthropic), combined with prompt engineering techniques, in delivering osteoporosis care guidance through clinical case-based scenarios. Specifically, we assessed the model’s capability in risk stratification, treatment recommendations, and referral decisions in accordance with the National Osteoporosis Guideline Group (NOGG) UK guidelines, and examined the validity of these AI-generated recommendations through comparative analysis with clinical decisions made by osteoporosis care specialists.
Methods: We evaluated the performance of Claude (Anthropic) in analysing 50 clinician-developed fictional osteoporosis cases. The LLM was configured with the National Osteoporosis Guideline Group (NOGG) framework through prompt engineering for risk stratification and treatment recommendations. Cases were independently assessed by two osteoporosis specialists (>5 years’ experience). Primary outcomes were accuracy of risk stratification (low, high, and very high risk), treatment concordance, and referral decision concordance. For analysis, risk categories were grouped as low/high risk (manageable in primary care with lifestyle modifications or oral bisphosphonates) and very high risk (requiring specialist assessment). Performance metrics included precision, recall, and time efficiency measurements. We additionally evaluated the system’s performance across varying case complexities, including challenging scenarios such as breast cancer patients on hormonal therapy.
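The accuracy, precision, and recall named above can be computed from paired specialist and LLM labels. The following is a minimal sketch, assuming the specialist assessment is treated as the reference standard; the function name, the label strings, and the five toy cases are hypothetical illustrations and are not study data.

```python
from collections import Counter


def per_class_metrics(reference, predicted):
    """Return (accuracy, {label: {"precision", "recall"}}) for paired labels.

    `reference` holds the specialist labels (assumed reference standard),
    `predicted` holds the LLM labels for the same cases, in the same order.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")

    # Count every (reference, predicted) pair: a sparse confusion matrix.
    pairs = Counter(zip(reference, predicted))
    labels = sorted(set(reference) | set(predicted))

    correct = sum(pairs[(label, label)] for label in labels)
    metrics = {}
    for label in labels:
        true_pos = pairs[(label, label)]
        predicted_as = sum(pairs[(ref, label)] for ref in labels)
        actually = sum(pairs[(label, pred)] for pred in labels)
        metrics[label] = {
            # Precision: of the cases the LLM put in this class, how many were right.
            "precision": true_pos / predicted_as if predicted_as else 0.0,
            # Recall: of the cases specialists put in this class, how many the LLM found.
            "recall": true_pos / actually if actually else 0.0,
        }
    return correct / len(reference), metrics


# Hypothetical toy labels, NOT data from the study.
specialist = ["low_high", "low_high", "very_high", "very_high", "very_high"]
llm = ["low_high", "very_high", "very_high", "very_high", "low_high"]
accuracy, by_class = per_class_metrics(specialist, llm)
```

Reporting precision and recall per risk group, rather than accuracy alone, shows whether errors fall mostly on missed specialist referrals (low recall for very high risk) or on unnecessary ones (low precision for very high risk).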
Results: The LLM demonstrated 90% accuracy in risk stratification compared to specialist assessment, with balanced precision and recall across risk categories (low/high risk: precision 0.92, recall 0.88; very high risk: precision 0.89, recall 0.91). Treatment recommendation concordance was 44% overall, improving to 60% for primary care cases, reflecting better alignment in less complex scenarios. Referral decision concordance reached 90%, suggesting reliable triage capabilities. Time efficiency analysis showed marked differences: specialists required 10-12 minutes per case (approximately 24 hours total including administrative tasks), while the LLM completed all assessments in 7 minutes. Detailed analysis showed that risk stratification performance was consistent across varying case complexities, with particularly strong agreement in identifying high-risk cases requiring immediate intervention. However, performance varied in scenarios lacking specific NOGG guidelines, such as breast cancer patients on hormonal therapy, where treatment decisions rely heavily on clinical judgment. Key findings were:
High accuracy and reliability in risk stratification across complexity levels.
Strong correlation with specialist decisions in urgent intervention cases.
Reduced concordance in therapeutic decisions for complex cases.
Significant time efficiency improvements in initial assessment.
Consistent application of NOGG guidelines where available, with areas requiring refinement in nuanced therapeutic decision-making.
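The time figures reported above can be put on a common scale with simple arithmetic. The sketch below uses only the numbers stated in the Results (50 cases, 10-12 minutes per case for specialists, 7 minutes in total for the LLM); it assumes the per-case time applies to a single reviewer, and it does not attempt to reconstruct the administrative time behind the reported total of approximately 24 hours, which the abstract does not break down. The constant and function names are illustrative.

```python
CASES = 50                         # number of fictional cases (from the Methods)
SPECIALIST_MIN_PER_CASE = (10, 12)  # reported range, minutes per case
LLM_TOTAL_MIN = 7                  # reported total for all cases, minutes


def specialist_review_hours(cases, minutes_per_case):
    """Direct review time in hours (low, high), excluding administrative tasks."""
    low, high = minutes_per_case
    return cases * low / 60, cases * high / 60


def llm_seconds_per_case(cases, total_minutes):
    """Average LLM time per case in seconds."""
    return total_minutes * 60 / cases


low_hours, high_hours = specialist_review_hours(CASES, SPECIALIST_MIN_PER_CASE)
llm_seconds = llm_seconds_per_case(CASES, LLM_TOTAL_MIN)
print(f"Specialist direct review: {low_hours:.1f}-{high_hours:.1f} h for {CASES} cases")
print(f"LLM: {llm_seconds:.1f} s per case on average")
```

Under these assumptions, direct case review alone accounts for roughly 8 to 10 hours for one reviewer, so the remainder of the reported total reflects work that the abstract attributes to administrative tasks.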
Conclusion: LLMs demonstrate promising utility for osteoporosis risk stratification and referral triage, potentially reducing administrative burden while maintaining clinical accuracy. However, lower concordance in treatment recommendations highlights the continuing necessity of clinical expertise for therapeutic decision-making. Further validation studies are needed to evaluate real-world implementation, workflow integration, and cost-effectiveness.
REFERENCES: NIL.
Acknowledgements: NIL.
Disclosure of Interests: None declared.
© The Authors 2025. This abstract is an open access article published in Annals of Rheumatic Diseases under the CC BY-NC-ND license (