70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Ophthalmological image synthesis with consideration of clinical correlations: A descriptive study
Introduction: The need for medical (image) data in ophthalmological education and research poses challenges regarding data privacy and availability. This ongoing research project aims to develop data-driven methods for synthesising ophthalmological data based on real patient data and to evaluate their ability to represent clinical relations. Conditional diffusion-based machine learning and a privacy risk estimator are employed to generate synthetic datasets that closely resemble the original distributions while preserving privacy.
Methods: In the initial phase of the study, data from 31 international volunteers were collected. For each participant, imaging techniques were used to obtain data on the external structures of both eyes (including pupil, iris, eyelids), corneal topographies, and fundus scans. Furthermore, information was documented concerning each subject’s sex, age, regular use of vision aids, and the presence of diabetes or high blood pressure. In the subsequent phase, we fine-tuned three synthesis models based on SDXL 1.0 [1], with each model corresponding to a distinct class of images. In every instance, the eye side, the sex, and the presence of high blood pressure were incorporated as a conditioning context during training on the collected data, executed in a manner akin to the DreamBooth approach [2].
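The conditioning context described above can be illustrated with a short sketch. The helper below is hypothetical: the exact prompt wording and tokens used in the study are not published in this abstract; the sketch merely shows how eye side, sex, and hypertension status could be encoded as a DreamBooth-style text condition per image class.

```python
def conditioning_prompt(image_class: str, eye_side: str,
                        sex: str, hypertension: bool) -> str:
    """Build a hypothetical DreamBooth-style conditioning prompt.

    image_class: one of the three image classes, e.g. "fundus",
                 "corneal topography", or "external eye" (assumed names).
    eye_side, sex, hypertension: per-subject metadata collected in the study.
    """
    bp = ("with high blood pressure" if hypertension
          else "without high blood pressure")
    return (f"a {image_class} image of the {eye_side} eye "
            f"of a {sex} subject {bp}")
```

One fixed prompt template per training image keeps the conditioning signal consistent across the three fine-tuned models.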
During inference, every generated image is evaluated for the privacy risk its potential publication would pose. The assessment is based on low-dimensional semantic embeddings of all training images and of the synthesised image, together with the densities in the embedding space; the embeddings are computed with a pre-trained vision transformer [3]. If a synthetic image's embedding lies in a region densely populated with training embeddings, the nearest-neighbour distance required for it to be considered non-privacy-violating may be smaller than in a sparse region. Images that pass this check can safely be published.
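The density-adaptive check can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' actual estimator: the local density is approximated here by the mean distance from the nearest training embedding to its k nearest training neighbours, and the admissible nearest-neighbour distance is a fraction `alpha` of that local scale (both `k` and `alpha` are hypothetical parameters).

```python
import numpy as np

def privacy_check(train_emb: np.ndarray, synth_emb: np.ndarray,
                  k: int = 5, alpha: float = 0.5) -> bool:
    """Density-adaptive privacy check (illustrative sketch).

    train_emb: (N, D) semantic embeddings of the training images
               (e.g. produced by a pre-trained vision transformer).
    synth_emb: (D,) embedding of one synthetic image.
    Returns True if the image is deemed safe to publish: its nearest
    training embedding must be farther away than alpha times the local
    k-NN scale, so in dense regions the admissible distance is smaller
    than in sparse regions.
    """
    d = np.linalg.norm(train_emb - synth_emb, axis=1)
    nn = int(np.argmin(d))
    # Local density scale: mean distance from the nearest training
    # embedding to its k nearest training neighbours (self excluded).
    d_local = np.linalg.norm(train_emb - train_emb[nn], axis=1)
    local_scale = float(np.mean(np.sort(d_local)[1:k + 1]))
    return d[nn] >= alpha * local_scale
```

A synthetic image whose embedding nearly coincides with a training embedding is rejected, while one that keeps a distance proportional to the local density passes.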
Results: In preliminary experiments, low guidance scales (≤3.0) used when querying the synthesis models produced the most realistic-looking samples, while the rejection rate due to suspected privacy violations remained moderate (<70%). 75% of these privacy-protecting synthetic images of external ocular structures were assessed as suitable for teaching by Ophthalmotechnology students.
However, the human evaluation of the synthesised images in terms of realism and educational usability remains an ongoing process. One challenge is that while images of outer eye structures can be evaluated by non-specialists, the synthetic corneal topographies and fundus scans can only be evaluated by ophthalmological experts. The same is true for the evaluation of the correct representation of pathologies correlated to diabetes and hypertension in the synthetic images. A corresponding survey is currently in preparation.
Discussion and conclusion: The substantial progress in the domain of image synthesis has the potential to enhance medical education and research. In this context, it is imperative to ensure that clinically relevant correlations and details present in real-world data are preserved in synthetic images, while also safeguarding the privacy of patients. The framework under discussion, comprising a conditional image synthesiser and a semantic privacy risk estimator, has been developed to address these challenges.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
[1] Podell D, English Z, Lacey K, Blattmann A, Dockhorn T, Müller J, et al. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. In: Proceedings of the Twelfth International Conference on Learning Representations (ICLR); 2024 May. Available from: https://openreview.net/forum?id=di52zR8xgf
[2] Ruiz N, Li Y, Jampani V, Pritch Y, Rubinstein M, Aberman K. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. In: Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 18-22; Vancouver, BC, Canada. Los Alamitos: IEEE; 2023. p. 22500-22510. DOI: 10.1109/CVPR52729.2023.02155
[3] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: Proceedings of the Ninth International Conference on Learning Representations (ICLR); 2021 May. Available from: https://openreview.net/forum?id=YicbFdNTTy



