70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Validation of Healthcare-Integrated Biobanking Algorithms: A Novel Approach for Chronic Kidney Disease Identification
2Institut für Klinische Chemie und Laboratoriumsdiagnostik, Universitätsklinikum Jena, Jena, Germany
3Institut für Medizinische Statistik, Informatik und Datenwissenschaften, Universitätsklinikum Jena, Jena, Germany
Introduction: Healthcare-Integrated Biobanking (HIB) collects and stores patients' biological samples during routine clinical care, creating resources for research and precision medicine. However, identifying patients with specific conditions such as chronic kidney disease (CKD) from heterogeneous real-world data remains challenging. Most existing models are designed for standardized epidemiological datasets, which limits their applicability in clinical biobanking, and few validation studies address their real-world performance, leaving their clinical value uncertain [1]. Because prediction models are context-specific and require adaptation to new environments, algorithms developed outside the HIB framework need thorough validation within actual biobanking systems to ensure reliability [2]. For the use case diabetes mellitus, we have already shown that a specifically developed algorithm can achieve good performance for HIB purposes [3]. This study aimed to develop and validate two HIB-specific algorithms for automated CKD identification from real-world electronic health records (EHR), thereby enabling targeted sample collection and retrospective cohort assembly in biobanking.
Methods: Two HIB-specific logistic regression-based algorithms (i.e., based on data available at admission) were developed in an existing training cohort (n=785) with a high prevalence of CKD (48%), as previously described [3], [4]: the admissionHIB algorithm, which uses data from the index hospital admission (creatinine, age, gender), and the historyHIB algorithm, which additionally incorporates creatinine values from previous hospitalizations. The validation cohort included patients at Jena University Hospital who had given the Medical Informatics Initiative (MII) Broad Consent and were admitted between January 2018 and April 2020. Data quality was systematically assessed for completeness of the required features. The validation cohort was divided into a small gold-standard validation cohort (n=162) and a large silver-standard validation cohort (n=1,075). The CKD status of the gold-standard cohort was determined by physicians through detailed review of the EHRs, whereas the CKD status of the silver-standard cohort was assigned by a validated algorithm from prior studies [3].
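The two model variants can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain logistic regression fitted by batch gradient descent on standardized features, hypothetical feature names (`creatinine`, `age`, `male`, and `prior_creatinine_mean` as one possible summary of creatinine values from previous hospitalizations), a simple complete-case filter standing in for the data-quality check, and a purely synthetic cohort instead of study data.

```python
import math
import random

# Hypothetical encodings: the abstract names the inputs (creatinine, age, gender,
# and for historyHIB prior creatinine values) but not how they are aggregated.
ADMISSION_FEATURES = ["creatinine", "age", "male"]
HISTORY_FEATURES = ADMISSION_FEATURES + ["prior_creatinine_mean"]


def complete_cases(records, features):
    """Data-quality filter: keep records in which every required feature is present."""
    return [r for r in records if all(r.get(f) is not None for f in features)]


def standardize(rows):
    """Return z-scored rows plus the (means, sds) needed to transform new data."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    return scaled, (means, sds)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict(model, x, threshold=0.5):
    b, w = model
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold)


def synthetic_cohort(n, prevalence, seed=0):
    """Purely synthetic patients (NOT study data) with a CKD label and lab features."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        ckd = rng.random() < prevalence
        crea = max(rng.gauss(1.8, 0.5) if ckd else rng.gauss(0.9, 0.2), 0.3)
        records.append({
            "ckd": int(ckd),
            "creatinine": crea,
            "age": rng.gauss(70, 10) if ckd else rng.gauss(55, 15),
            "male": int(rng.random() < 0.5),
            # About 30% of patients have no previous hospitalization -> missing history.
            "prior_creatinine_mean": None if rng.random() < 0.3
            else max(crea + rng.gauss(0, 0.2), 0.3),
        })
    return records


def train(records, features):
    """Fit one model variant on complete cases; report cases used and training accuracy."""
    usable = complete_cases(records, features)
    X, scaling = standardize([[r[f] for f in features] for r in usable])
    y = [r["ckd"] for r in usable]
    model = fit_logistic(X, y)
    acc = sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(y)
    return model, scaling, len(usable), acc


if __name__ == "__main__":
    cohort = synthetic_cohort(400, prevalence=0.48)
    for name, feats in [("admissionHIB-like", ADMISSION_FEATURES),
                        ("historyHIB-like", HISTORY_FEATURES)]:
        _, _, n_used, acc = train(cohort, feats)
        print(f"{name}: {n_used} complete cases, training accuracy {acc:.2f}")
```

Restricting each variant to complete cases means that, in this sketch, a historyHIB-style model can only be applied to patients with at least one previous hospitalization; how the actual algorithms handle missing history is not stated in the abstract.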
Results: Preliminary results demonstrated robust performance. In the training cohort, the admissionHIB and historyHIB algorithms achieved F1-scores of 86% and 91%, respectively. The validation cohort had a lower CKD prevalence (approximately 12%) than the training cohort (48%). The HIB algorithms achieved F1-scores of 80% (admissionHIB) and 78% (historyHIB) in the silver-standard cohort, and 83% and 80%, respectively, in the gold-standard cohort.
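Because the F1-score is the harmonic mean of precision and sensitivity, and precision depends on prevalence, F1-scores from the training cohort (48% CKD) and the validation cohorts (about 12% CKD) are not directly comparable. The sketch below makes this explicit; the sensitivity and specificity of 0.9 are hypothetical values chosen for illustration, not results of this study.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_rates(sensitivity, specificity, prevalence):
    """F1 of a classifier with fixed sensitivity/specificity applied at a given prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction of all patients
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction of all patients
    precision = tp / (tp + fp)                 # positive predictive value
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    # Same hypothetical classifier, two prevalences: F1 is roughly 0.90 vs roughly 0.68.
    for prev in (0.48, 0.12):
        print(f"prevalence {prev:.2f}: F1 = {f1_from_rates(0.9, 0.9, prev):.3f}")
```

Under these assumptions, holding F1 near 80% at 12% prevalence requires a markedly higher specificity than at 48% prevalence, which is why a comparable F1 in the lower-prevalence validation cohorts is a demanding result.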
Conclusion: Our study presents a practical and robust framework for developing and validating HIB algorithms using real-world EHR data. By evaluating algorithm performance in both validation cohorts, we demonstrate consistent and reliable identification of CKD, even in a setting with lower prevalence. These findings underscore the feasibility of implementing automated identification strategies in biobanking to support sample collection. Furthermore, the approach can be adapted to use cases beyond CKD.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
[1] Mansmann U, On BI. The validation of prediction models deserves more recognition. BMC Med. 2025;23(1):166.
[2] Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no such thing as a validated prediction model. BMC Med. 2023;21(1):70.
[3] Stolp J, Weber C, Ammon D, Scherag A, Fischer C, Kloos C, et al. Automated sample annotation for diabetes mellitus in healthcare integrated biobanking. Comput Struct Biotechnol J. 2024;24:724-33.
[4] Weber C, Roschke L, Modersohn L, Lohr C, Kolditz T, Hahn U, et al. Optimized Identification of Advanced Chronic Kidney Disease and Absence of Kidney Disease by Combining Different Electronic Health Data Resources and by Applying Machine Learning Strategies. J Clin Med. 2020;9(9).



