70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
7–11 September 2025
Jena

Meeting Abstract

A Federated Artificial Intelligence Framework for Optimizing Pancreatic Cancer Treatment – a Technical Case Report

Anne-Christin Hauschild - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; Institute for Predictive Deep Learning in Medicine and Healthcare, Justus-Liebig University, Gießen, Germany
Amirreza Aleyasin - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany
Nils Beyer - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany
Lisa Fricke - Translational Pancreatic Cancer Research Center, TUM School of Medicine and Health, Department of Clinical Medicine – Clinical Department for Internal Medicine II, TUM University Hospital, Technical University of Munich, Munich, Germany
Jonas Hügel - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany
Maryam Moradpour - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; Institute for Predictive Deep Learning in Medicine and Healthcare, Justus-Liebig University, Gießen, Germany
Anh Tien Nguyen - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; Institute for Predictive Deep Learning in Medicine and Healthcare, Justus-Liebig University, Gießen, Germany
Youngjun Park - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; Max Planck Institute for Biology of Ageing, Cologne, Germany
Sophia Rheinländer - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany
Tim Beißbarth - University Medical Center Göttingen, Department of Medical Bioinformatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany
Elisabeth Hessmann - University Medical Center Göttingen, Clinic for Gastroenterology, Gastrointestinal Oncology and Endocrinology, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany
Matthias Lauth - Philipps-University Marburg, Clinic for Gastroenterology, Endocrinology and Metabolism, Marburg, Germany
Martin Middeke - Philipps-University Marburg, Comprehensive Cancer Center, Marburg, Germany
Max Reichert - Translational Pancreatic Cancer Research Center, TUM School of Medicine and Health, Department of Clinical Medicine – Clinical Department for Internal Medicine II, TUM University Hospital, Technical University of Munich, Munich, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany; Center for Protein Assemblies (CPA), Technical University of Munich, Garching, Germany; Center for Organoid Systems (COS), Technical University of Munich (TUM), Garching, Germany; German Cancer Consortium (DKTK), partner site Munich, a partnership between DKFZ and University Hospital Klinikum rechts der Isar, Munich, Germany; Bavarian Cancer Research Center (BZKF), Munich, Germany
Ulrich Sax - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), Section of Medical Data Science, Göttingen, Germany; University Medical Center Göttingen, Clinical Research Group 5002 (CRU5002), Göttingen, Germany

Introduction: Ideally, centrally collecting and analyzing patient data with appropriate consent would provide optimal data quality and predictive performance; in practice, however, this approach is often not feasible. Federated Learning (FL) or, more broadly, Federated Artificial Intelligence architectures have proven to be a promising approach for accessing and using distributed disease-related resources within the boundaries of the GDPR. This technical case report from the FAIrPaCT project describes the setup and implementation of a federated AI network infrastructure designed specifically for collaborative research on pancreatic cancer. We characterize the preconditions at the participating sites in Göttingen, Marburg, and Munich, as well as the administrative and process-related steps needed to prepare data, people, and infrastructure, with the aim of improving subtype identification and, subsequently, treatment options. We share both our positive and our sobering experiences in tackling these challenges and present preliminary results of the federated learning pipelines in our pancreatic cancer projects.

Methods: At each participating site, the process begins with identifying and annotating the relevant data. After extraction and transformation, these data become accessible within a local FL hub; in our setup, this hub is implemented as a Docker container that is developed centrally and deployed at each site. The container includes the FL scripts that train the local models. We apply the newly developed Fed4POD algorithm, which incorporates all local features, including partially overlapping features that are available only at some of the sites.
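The abstract does not describe the internals of Fed4POD. Purely to illustrate the general idea of a federated tree ensemble over partially overlapping features (this is not the Fed4POD algorithm), the following Python sketch assumes that each site's hub fits depth-one trees (decision stumps) on its own features only, that only the fitted stumps leave the site, and that a coordinator pools them into one ensemble. At prediction time, only stumps whose feature is present in a patient record vote. All function names and the stump simplification are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """A depth-one tree: one feature, one threshold, one label per side."""
    feature: str
    threshold: float
    left_label: int   # label for rows with value <= threshold
    right_label: int  # label for rows with value > threshold

    def predict(self, row: dict) -> int:
        return self.left_label if row[self.feature] <= self.threshold else self.right_label


def majority(labels, default=0):
    """Most frequent label, or `default` if there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, feature):
    """Pick the threshold on `feature` that minimizes training errors."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        left_label = majority(left)
        right_label = majority(right, default=left_label)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, Stump(feature, t, left_label, right_label))
    return best[1]


def train_local(rows, labels, features, n_stumps=25, seed=0):
    """Runs inside one site's hub; only the returned stumps leave the site."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.choice(sorted(features))          # random local feature per stump
        stumps.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return stumps


def aggregate(local_models):
    """Coordinator step: pool the stumps from all sites into one ensemble."""
    return [stump for model in local_models for stump in model]


def predict(global_model, row):
    """Majority vote over stumps whose feature is available in this row."""
    votes = [s.predict(row) for s in global_model if s.feature in row]
    return majority(votes)
```

For example, a coordinator could pool `train_local(rows_a, y_a, {"age", "ca19_9"})` from one site and `train_local(rows_b, y_b, {"ca19_9", "tumor_size"})` from another (hypothetical feature names) and then score a record that lacks `tumor_size`; stumps on that feature simply abstain. Practical federated forests use full trees and more careful aggregation; stumps only keep the sketch short.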

Results: Clinical data in cancer settings can be exported from the source systems in the standardized German oncology core data set (oBDS) format. In theory, this enables seamless downstream processing and integration into FL workflows. A pilot evaluation of the Fed4POD algorithm on multiple public data sets, including pancreatic ductal adenocarcinoma (PDAC) data, showed that it handles partially overlapping features robustly. However, challenges arise from local cohort and distribution biases in the data.
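To make the transformation step more concrete, the following Python sketch flattens an oBDS-style export into one feature row per patient and reports which features each site can supply. It assumes the export arrives as XML; the element names (`Patient`, `Patient_ID`, `Geburtsjahr`, `Diagnose/ICD_Code`, `Diagnose/TNM_T`, `Labor/CA19_9`) and the mapping `FEATURE_PATHS` are simplified, hypothetical placeholders, not the official oBDS schema. Absent elements are left out rather than imputed, which is how partially overlapping feature sets across sites arise.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from feature name to XML path relative to <Patient>.
# These paths are placeholders, NOT the official oBDS schema.
FEATURE_PATHS = {
    "birth_year": "Geburtsjahr",
    "icd10": "Diagnose/ICD_Code",
    "t_stage": "Diagnose/TNM_T",
    "ca19_9": "Labor/CA19_9",
}


def flatten(xml_text: str) -> list[dict]:
    """Turn every <Patient> element into a flat dict of string values."""
    root = ET.fromstring(xml_text)
    rows = []
    for patient in root.iter("Patient"):
        row = {"id": patient.findtext("Patient_ID")}
        for name, path in FEATURE_PATHS.items():
            value = patient.findtext(path)
            if value is not None:  # missing elements stay missing, not imputed
                row[name] = value
        rows.append(row)
    return rows


def feature_overlap(site_rows: dict[str, list[dict]]) -> dict[str, set[str]]:
    """Report the set of features each site can supply (excluding the ID)."""
    return {
        site: set().union(*(r.keys() for r in rows)) - {"id"}
        for site, rows in site_rows.items()
    }
```

Values are kept as strings here; type conversion, value-set mapping, and pseudonymization would be separate steps in a real pipeline.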

Major roadblocks have been addressed, including aligning the operational data security concepts for the infrastructures, obtaining ethics approval for such novel architectures, and providing support at every site. Scaling up this approach to include broader multimodal data sets at the collaborating sites appears feasible, although challenges remain for the envisioned large-scale deployment across many additional, diverse healthcare institutions.

Conclusion: Federated architectures offer a promising approach for collaborative research and potential clinical use across institutions with heterogeneous data availability. The experience and knowledge gained throughout the two FAIrPaCT projects have the potential to lower the barrier for federated endeavors, paving the way for future large-scale deployment across many additional sites, which still poses considerable challenges.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.

