70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Evaluating synthetic eye images for ophthalmological assessments: Developing a systematic survey
Text
Introduction: The creation of digital avatars to anonymize personal health data is essential for enabling privacy-preserving use of such data [1]. This survey focuses on assessing the clinical relevance of synthetically generated medical images, based on evaluations by multiple participants. By verifying the realism and utility of these AI-generated images, the study aims to establish their potential for meaningful integration into clinical practice.
State of the art: Generative AI (Gen-AI) technologies such as transformer-based architectures have revolutionized medical image synthesis by providing realistic representations. Prior studies have applied generative models to retinal image generation [2], while [3] conducted AI robustness testing to verify the synthesis process. Although some studies exist, proper validation of generated synthetic images in ophthalmology applications is often overlooked. Additionally, AI-based realism scoring models have been used for general image evaluation, but their effectiveness in clinical assessments remains underexplored [4]. This study attempts to address these limitations by gathering human evaluations of AI-generated synthetic images.
Concept: A systematic, structured and interactive online survey was created on an open-source platform, targeting ophthalmologists, medical imaging specialists and AI researchers. Involving human evaluators is intended to capture the perceived realism of AI-generated synthetic images more precisely than AI-based evaluation alone. The survey was designed to facilitate comprehensive assessments by allowing participants to evaluate generated eye images through grading, differentiation, ranking and identification of synthetic and real images.
Evaluation methods were selected according to how well they allow individual participants to provide insight into image realism and diagnostic potential. Validation mechanisms (e.g. sentinel questions, randomized question order) are built in to safeguard survey quality and thereby improve the reliability of the findings.
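The two validation mechanisms named above can be illustrated with a minimal sketch. It is not the LimeSurvey configuration used in the study; the function names, image identifiers and the per-participant seed are hypothetical and serve only to show the idea of a randomized question order with embedded sentinel questions.

```python
import random

def build_question_order(image_ids, sentinel_ids, participant_seed):
    """Mix sentinel questions into the image questions and shuffle them.

    The seed is an assumption: it makes each participant's order reproducible.
    """
    rng = random.Random(participant_seed)
    order = list(image_ids) + list(sentinel_ids)
    rng.shuffle(order)
    return order

def passes_sentinels(responses, sentinel_answers):
    """Return True only if every sentinel question received the expected answer."""
    return all(responses.get(q) == a for q, a in sentinel_answers.items())

# Hypothetical identifiers, for illustration only.
order = build_question_order(["img_01", "img_02", "img_03"], ["sentinel_A"], participant_seed=7)
attentive = passes_sentinels({"img_01": "synthetic", "sentinel_A": "real"}, {"sentinel_A": "real"})
```

Responses from participants who fail the sentinel check could then be flagged for review before analysis; whether they are excluded is a design decision the abstract does not specify.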
Implementation: We chose LimeSurvey hosted at FSU Jena for this purpose due to its support for complex branching logic, randomized question ordering and integration of validation mechanisms. The designed survey allows quantitative analysis across participants while maintaining flexibility for future extension. The survey comprises three modules:
- Bulbar injection rating – Synthetic images are graded using a clinical reference scale (Jenvis Grading Score for bulbar injections)
- Real vs. Synthetic Identification – Participants classify images as real or AI-generated
- Image similarity matching – Participants select the real image that most closely corresponds to a synthetic image, using a guidance score
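As an illustration of the kind of quantitative analysis the second module permits, the following sketch scores one participant's real vs. synthetic classifications. The metric names (in particular "fooling_rate", the share of synthetic images judged to be real) and the example data are assumptions made for this sketch, not measures or results reported by the study.

```python
def identification_scores(truth, answers):
    """Score real-vs-synthetic answers; both dicts map image id -> "real" or "synthetic"."""
    rated = [i for i in truth if i in answers]          # ignore unanswered images
    correct = sum(truth[i] == answers[i] for i in rated)
    synthetic = [i for i in rated if truth[i] == "synthetic"]
    fooled = sum(answers[i] == "real" for i in synthetic)
    return {
        "accuracy": correct / len(rated) if rated else None,
        # Share of synthetic images the participant took for real ones.
        "fooling_rate": fooled / len(synthetic) if synthetic else None,
    }

# Hypothetical ground truth and answers, for illustration only.
truth = {"a": "real", "b": "synthetic", "c": "synthetic", "d": "real"}
answers = {"a": "real", "b": "real", "c": "synthetic", "d": "synthetic"}
scores = identification_scores(truth, answers)
```

An accuracy close to chance level together with a high fooling rate would suggest that participants find the synthetic images hard to tell apart from real ones.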
AI-driven realism assessments (ViT, DINO) are planned so that they can be compared with the human evaluations. Different guidance scores are considered when evaluating the images for the different participant groups.
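One possible form of such an AI-driven comparison is sketched below: given feature vectors for a synthetic image and for candidate real images, the closest real image is selected by cosine similarity. The abstract does not state which similarity measure is planned, so cosine similarity is an assumption, and the short vectors are hypothetical stand-ins for embeddings that would in practice come from a ViT or DINO model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_real_image(synthetic_vec, real_vecs):
    """Return the name of the real image whose embedding is most similar."""
    return max(real_vecs, key=lambda name: cosine_similarity(synthetic_vec, real_vecs[name]))

# Toy 3-dimensional vectors; real embeddings would have hundreds of dimensions.
real_vecs = {"real_1": [1.0, 0.0, 0.0], "real_2": [0.0, 1.0, 0.2]}
match = closest_real_image([0.1, 0.9, 0.1], real_vecs)
```

The model's choice could then be set against the image that participants selected in the similarity matching module to see how often the two agree.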
Lessons learned: Developing a systematic and structured survey for the validation of synthetic images is complex. It requires appropriate questions that capture the participants' experience, and it needs to account for cognitive load as well as bias and framing effects on participants. We consider the choice of image scoring an important factor in the validation of images, as it involves tradeoffs between granularity and interpretability. We collected initial results within the working group (medical informatics background) and integrated the feedback into the final version of the survey. The survey is now ready for validation, and the results will be published accordingly.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Susser D, Schiff DS, Gerke S, Cabrera LY, Cohen IG, Doerr M, Harrod J, Kostick-Quenet K, McNealy J, Meyer MN, Price WN 2nd, Wagner JK. Synthetic Health Data: Real Ethical Promise and Peril. Hastings Cent Rep. 2024 Sep;54(5):8-13. DOI: 10.1002/hast.4911
[2] Wang Z, Lim G, Ng WY, Tan TE, Lim J, Lim SH, Foo V, Lim J, Sinisterra LG, Zheng F, Liu N. Synthetic artificial intelligence using generative adversarial network for retinal imaging in detection of age-related macular degeneration. Frontiers in Medicine. 2023 Jun 22;10:1184892.
[3] Coyner AS, Chen JS, Chang K, Singh P, Ostmo S, Chan RP, Chiang MF, Kalpathy-Cramer J, Campbell JP, Imaging and Informatics in Retinopathy of Prematurity Consortium. Synthetic medical images for robust, privacy-preserving training of artificial intelligence: application to retinopathy of prematurity diagnosis. Ophthalmology Science. 2022 Jun 1;2(2):100126.
[4] Caron M, Touvron H, Misra I, Jégou H, Mairal J, Bojanowski P, Joulin A. Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision 2021. p. 9650-9660.



