German Congress of Orthopaedics and Traumatology (DKOU 2025)
Deutscher Kongress für Orthopädie und Unfallchirurgie 2025 (DKOU 2025)
Improving patient education on tibial osteotomy for knee osteoarthritis management with a customized ChatGPT: A readability and quality evaluation
Objectives and questions: Knee osteoarthritis (OA) substantially impairs patients’ quality of life and often leads to surgical intervention. While total knee arthroplasty (TKA) is a common solution, it may not be ideal for younger patients with unicompartmental OA, who could benefit more from high tibial osteotomy (HTO). Effective patient education is essential for informed decision-making, yet most online health information is too complex for the average person to comprehend. AI tools like ChatGPT offer a potential solution, but their responses are often written above the general public's reading level. This study evaluated whether a customized ChatGPT model could enhance readability and source accuracy in patient education on knee OA and tibial osteotomy.
Material and methods: Frequently asked questions about HTO were collected using Google’s “People Also Ask” feature and rewritten at an 8th-grade reading level. Two versions of ChatGPT-4 were compared: the standard model and a fine-tuned version, “The Knee Guide”, optimized for readability and source citation using Instruction-Based Fine-Tuning (IBFT) and Reinforcement Learning from Human Feedback (RLHF). Response quality was assessed with the DISCERN criteria, and readability with the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
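For readers unfamiliar with the two readability metrics, the following is a minimal Python sketch of the standard FRES and FKGL formulas. It is not the tool used in the study, which the abstract does not name. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group,
    minus one for a silent trailing 'e'. Always returns at least 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for an English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard Flesch Reading Ease: higher means easier to read.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Standard Flesch-Kincaid Grade Level: approximate US school grade.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    fres, fkgl = readability("The cat sat on the mat.")
    print(f"FRES = {fres:.1f}, FKGL = {fkgl:.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, so shorter sentences and simpler words raise the FRES and lower the FKGL.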
Results: The standard ChatGPT-4 model had a mean DISCERN score of 38.41 (range: 25–46), indicating poor quality, while “The Knee Guide” achieved a mean score of 45.9 (range: 33–66), reflecting moderate quality. Interrater reliability was strong, with a Cronbach’s alpha of 0.86. Readability was better with “The Knee Guide”, which had a mean FKGL of 8.2 (±1.42; range: 5–10.7) and a mean FRES of 60 (±7.83; range: 47–76), compared to the standard model’s mean FKGL of 13.9 (±1.39; range: 11–16) and mean FRES of 32 (±8.3; range: 14–47). These differences were statistically significant (p < 0.001).
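As an illustration of the reliability statistic reported above, the sketch below computes Cronbach's alpha from a matrix of rater scores, treating each rater as an "item". The function name `cronbach_alpha` and the toy scores are hypothetical and are not study data. The abstract does not state how many raters took part or which software was used.

```python
from statistics import variance


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """ratings[i][j] is the score rater j gave to response i.

    alpha = k / (k - 1) * (1 - sum(per-rater variances) / variance(row totals))
    """
    n_responses = len(ratings)
    k = len(ratings[0])
    if k < 2 or n_responses < 2:
        raise ValueError("need at least two raters and two responses")

    # Sample variance of each rater's scores across all responses.
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the summed score per response.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical DISCERN totals from two raters for three responses.
    toy_scores = [[40, 42], [35, 33], [50, 51]]
    print(f"alpha = {cronbach_alpha(toy_scores):.2f}")
```

Values closer to 1 indicate that the raters rank the responses consistently.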
Discussion and conclusions: Fine-tuning ChatGPT significantly enhanced the readability and quality of HTO-related patient education materials. “The Knee Guide” demonstrates the potential of customized AI models to make complex medical information more accessible to patients.



