
German Congress of Orthopaedics and Traumatology (DKOU 2025)

Deutsche Gesellschaft für Orthopädie und Unfallchirurgie (DGOU), Deutsche Gesellschaft für Orthopädie und Orthopädische Chirurgie (DGOOC), Deutsche Gesellschaft für Unfallchirurgie (DGU), Berufsverband für Orthopädie und Unfallchirurgie (BVOU)
28.-31.10.2025
Berlin


Meeting Abstract

Deep learning for medical image synthesis: Improving AI-based knee osteoarthritis diagnosis

Ricardo Smits Serena 1,2
Christina Valle 1
Igor Lazic 1
Jan Neumann 3
Alexander Marka 3
Rüdiger von Eisenhart-Rothe 1
Daniel Rueckert 2
Florian Hinterwimmer 1,2
1 Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
2 Institute for AI and Informatics in Medicine, Technical University of Munich, Munich, Germany
3 Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany

Text

Objectives and questions: Deep learning (DL) is increasingly used in medical diagnosis and prognosis. However, medical datasets often suffer from class imbalance and limited availability, owing to privacy concerns, regulatory constraints, data heterogeneity, annotation challenges, bias, and legal risks, among other factors. Generative models, such as Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs), can address these challenges by synthesizing realistic medical images. In knee osteoarthritis (KOA) research, such models can help balance datasets, potentially improving DL diagnostic accuracy. GANs are computationally efficient, whereas diffusion models often generate higher-quality images but require significant computational resources and fine-tuning. Direct comparisons such as the one in this study are crucial for determining how effective each approach is in medical imaging.

Material and methods: Using the Osteoarthritis Initiative (OAI) dataset, we employed a DDPM to generate synthetic knee X-ray images across all OA severity grades. The dataset includes 8,259 knee radiographs with varying Kellgren-Lawrence (KL) grades. We compared DDPM-generated images to GAN-generated ones, evaluating their realism through a blinded review by two radiologists and an orthopedic surgeon. The impact of synthetic images on classification performance was assessed using a deep learning model trained on real and augmented datasets.
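As an illustration of how synthetic images might be used to balance KL grades, the following minimal Python sketch computes how many generated images each grade would need in order to match the largest class. The abstract does not state how many synthetic images were added or which balancing rule was used, so the function name `synthetic_needed`, the top-up rule, and the toy label list are assumptions made purely for illustration.

```python
from collections import Counter


def synthetic_needed(labels, target=None):
    """Return how many synthetic images each class needs to reach `target`.

    `labels` holds one KL grade (0-4) per real image. If `target` is None,
    every class is topped up to the size of the largest class.
    This top-up rule is an assumption, not the method reported in the abstract.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {grade: max(0, target - n) for grade, n in sorted(counts.items())}


# Hypothetical toy labels; this is NOT the OAI grade distribution.
toy_labels = [0] * 5 + [1] * 3 + [2] * 4 + [3] * 2 + [4] * 1
print(synthetic_needed(toy_labels))  # {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}
```

With a conditional generator, the resulting per-grade counts could serve as the number of samples to request for each KL grade.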

Results: DDPM-generated images achieved superior quality, as confirmed by lower Fréchet Inception Distance (FID) scores and expert evaluations. The three experts struggled to distinguish synthetic from real images: they detected only 16.7% of the images generated by the unconditional DDPM as synthetic, compared to 62.5% of the GAN-generated images. Incorporating DDPM-generated images improved classification accuracy from 81.3% to 83.0%, a statistically significant gain (p < 0.01), whereas the improvement from GAN-generated images was not significant (p ≈ 0.15). While precision remained stable, recall for OA-positive classes increased, indicating fewer false negatives. These findings suggest that higher-quality synthetic images lead to better DL model performance.
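For readers unfamiliar with the metric, FID compares the statistics of Inception-network features extracted from real and generated images, with each feature set modelled as a multivariate Gaussian; lower values indicate that the two distributions are closer. The standard definition is reproduced below. The symbols are notation introduced here for illustration (mean and covariance of the real features, \mu_r and \Sigma_r, and of the generated features, \mu_g and \Sigma_g); the abstract does not specify which feature extractor or implementation was used.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The first term penalizes a shift in the average feature vector, and the trace term penalizes a mismatch in feature spread and correlation.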

Discussion and conclusions: This study shows that DDPMs can generate high-fidelity synthetic knee osteoarthritis X-rays, outperforming GANs in realism and utility for DL training. Future research should explore higher-resolution or 3D imaging, alternative conditioning methods, and safeguards against memorization. Generative models could also aid clinical decision-making by simulating disease progression and visualizing treatment effects. DDPMs offer a promising approach to medical image synthesis, addressing data limitations and enhancing DL diagnostics, provided that their integration is ethically and practically sound.

Figure 1 [Fig. 1]

Figure 1: Comparison of images generated by the different models with real images. (WGAN: Wasserstein Generative Adversarial Network, Diff-Unco: Unconditional Diffusion Model, Diff-Cond: Conditional Diffusion Model)