German Congress of Orthopaedics and Traumatology (DKOU 2025)
Can AI replace doctors in patient consultations? Analyzing ChatGPT’s responses to common questions from patients with tibial plateau fractures
2OrthoCenter München, Munich, Germany
3Department of Trauma Surgery, Trauma Center Murnau, Murnau, Germany
4Department of Orthopaedics and Trauma Surgery, BG Klinikum Duisburg, Duisburg, Germany
Objectives and questions: Conversational artificial intelligence (AI) systems such as ChatGPT have gained increasing prominence as accessible tools for disseminating information across many fields, including healthcare. These systems hold promise for enhancing patient education and improving general understanding of medical conditions. However, little research has evaluated the reliability and clinical relevance of AI-generated responses for specific medical conditions. Tibial plateau fractures (TPF) are complex injuries that require detailed patient education to ensure informed decision-making and optimal outcomes. This study assesses the accuracy, comprehensiveness, and clinical utility of ChatGPT’s responses to common patient questions about TPF.
Material and methods: We identified the ten most frequently asked patient questions about TPF by prompting ChatGPT itself to list them. These ten questions were then posed to ChatGPT, and each generated response was assigned to one of four categories: “excellent, requiring no clarification”, “satisfactory, requiring minimal clarification”, “satisfactory, requiring moderate clarification”, or “unsatisfactory, requiring substantial clarification”. The responses were evaluated independently by a panel of experienced trauma surgeons.
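For readers who want to follow the aggregation, the sketch below (Python) shows one plausible way to score and summarize such panel grades. The 1–4 ordinal mapping, the placeholder consensus grades, and the use of a simple mean/SD are illustrative assumptions; the abstract does not specify how the published summary statistics were computed.

```python
# Minimal sketch, assuming the four grading categories map to a
# 1-4 ordinal scale (1 = excellent ... 4 = unsatisfactory). The
# mapping and aggregation method are assumptions, not stated in
# the abstract.
from statistics import mean, stdev

SCALE = {
    "excellent, no clarification": 1,
    "satisfactory, minimal clarification": 2,
    "satisfactory, moderate clarification": 3,
    "unsatisfactory, substantial clarification": 4,
}

# Placeholder per-question consensus grades matching the reported
# distribution (5 minimal, 3 moderate, 2 substantial). The published
# mean of 2.9 (SD 0.88) was presumably computed over individual
# rater scores, so this simplified consensus-level version will
# not reproduce it exactly.
grades = (
    ["satisfactory, minimal clarification"] * 5
    + ["satisfactory, moderate clarification"] * 3
    + ["unsatisfactory, substantial clarification"] * 2
)

scores = [SCALE[g] for g in grades]
print(f"mean rating: {mean(scores):.1f} (SD {stdev(scores):.2f})")
```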
Results: Of the ten evaluated responses, 50% were rated satisfactory, requiring minimal clarification; 30% satisfactory, requiring moderate clarification; and 20% unsatisfactory, requiring substantial clarification. The mean rating across all questions was 2.9 (SD 0.88), indicating moderate overall response quality. Questions on treatment options, such as operative versus conservative management, received higher ratings, whereas questions on prognosis and long-term functional outcome were rated less favorably.
Discussion and conclusions: ChatGPT provided generally satisfactory answers to common patient questions about TPF. However, most responses required some degree of clarification, particularly those addressing complex or prognostic issues. While ChatGPT shows promise for delivering general medical information, it is not yet a substitute for personalized consultation with a healthcare provider.



