EULAR Abstract Archive

AB0989 (2024)

CHATGPT4’S PROFICIENCY IN ADDRESSING PATIENTS’ QUESTIONS ON SYSTEMIC LUPUS ERYTHEMATOSUS: A BLINDED COMPARATIVE STUDY WITH SPECIALISTS

Keywords: Artificial Intelligence, Patient information and education

R. Mu¹, C. Xu², D. Xu¹, J. Zhao¹, R. Liu¹, Y. Dai³, K. Sun⁴, P. Wong⁵, S. S. M. Lee², L. W. Koh², J. Wang⁶, S. Xie⁶, L. Zeng⁷

¹Peking University Third Hospital, Department of Rheumatology and Immunology, Beijing, China
²Tan Tock Seng Hospital, Department of Rheumatology, Allergy and Immunology, Singapore, Singapore
³Fujian Provincial Hospital, Department of Rheumatology and Immunology, Fuzhou, China
⁴Duke University, Department of Medicine, Division of Rheumatology and Immunology, North Carolina, United States of America
⁵The Chinese University of Hong Kong, Division of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong, China
⁶Beijing Kidney Health Technology Co., Ltd, Beijing, China
⁷Peking University Third Hospital, Research Center of Clinical Epidemiology, Beijing, China

Background: Artificial intelligence (AI)-driven chatbots like ChatGPT are gaining recognition for their healthcare applications, yet the efficacy of ChatGPT4 in specialized medical consultations remains underexplored, particularly in rheumatology.

Objectives: This study compares the proficiency of ChatGPT4’s responses with that of practicing rheumatologists’ responses to inquiries from patients with systemic lupus erythematosus (SLE).

Methods: In this cross-sectional study, 95 frequently asked questions (FAQs) were curated from the Kidney Online platform (www.shensx.com). Responses to the FAQs were generated by ChatGPT4 and by 5 rheumatologists, and were then scored separately by a panel of rheumatologists and a group of patients with SLE. Responses were scored across 6 domains (scientific validity, logical consistency, comprehensibility, completeness, satisfaction level, and empathy) on a 0-10 scale (a score of 0 indicates an entirely incorrect response, while 10 indicates an accurate and comprehensive answer).

Results: Rheumatologists’ scoring revealed that ChatGPT4-generated responses outperformed those from rheumatologists in satisfaction level and empathy, with mean differences of 0.542 (95% CI, 0.258 to 0.827; P < 0.01) and 0.453 (95% CI, 0.221 to 0.685; P < 0.01), respectively (Figure 1). No significant differences were observed in scientific validity, logical consistency, comprehensibility or completeness (Figure 1). From the SLE patients’ perspective, ChatGPT4-generated responses were comparable with the rheumatologist-provided answers in all 6 domains (Figure 2).
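As a consistency check on the reported intervals, the sketch below back-calculates an approximate standard error and two-sided P value from each mean difference and its 95% CI. It assumes a symmetric, normal-approximation (Wald-type) interval; the abstract does not state which statistical test or model was used, so this is an illustration rather than the authors' analysis. The function name `implied_p_value` and the dictionary `reported` are hypothetical names introduced here.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Critical value for a two-sided 95% interval under a normal approximation.
# Assumption: the reported CIs are symmetric Wald-type intervals.
Z_CRIT = NormalDist().inv_cdf(0.975)


def implied_p_value(mean_diff, ci_low, ci_high):
    """Back-calculate SE, z statistic and two-sided P value from a 95% CI."""
    se = (ci_high - ci_low) / (2 * Z_CRIT)  # half-width divided by critical value
    z = mean_diff / se
    p = erfc(abs(z) / sqrt(2))  # two-sided tail probability of a standard normal
    return se, z, p


# Mean differences and 95% CIs as reported in the Results (rheumatologists' scoring).
reported = {
    "satisfaction level": (0.542, 0.258, 0.827),
    "empathy": (0.453, 0.221, 0.685),
}

for domain, (diff, low, high) in reported.items():
    se, z, p = implied_p_value(diff, low, high)
    print(f"{domain}: SE ~ {se:.3f}, z ~ {z:.2f}, two-sided P ~ {p:.5f}")
```

Under these assumptions, both implied P values fall below the reported threshold of 0.01, in agreement with the stated results.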

Conclusion: In addressing FAQs from patients with SLE, ChatGPT4 demonstrated proficiency comparable to, and in certain domains possibly better than, that of specialists. This study shows the potential of applying ChatGPT4 to improve consultation in rheumatology.

Figure 1.

Rheumatologists’ evaluation revealed that ChatGPT4-generated responses notably outperformed those from rheumatologists in terms of satisfaction level and empathy.

Figure 2.

Patients’ evaluation revealed that ChatGPT4-generated responses were comparable with rheumatologist-provided answers in terms of scientific validity, logical consistency, comprehensibility, completeness, satisfaction level and empathy.

REFERENCES: NIL.

Acknowledgements: NIL.

Disclosure of Interests: None declared.

DOI: 10.1136/annrheumdis-2024-eular.935

Citation: , volume 83, supplement 1, year 2024, page 1811

Session: Systemic lupus erythematosus (Publication Only)
