70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Working in a Trusted Research Environment – Reporting First Hand Experience
2University Medical Center Göttingen, Department of Medical Bioinformatics, Göttingen, Germany
3Institute for Predictive Deep Learning in Medicine and Healthcare, Justus-Liebig University, Gießen, Germany
4University of Göttingen, Campus Institute Data Science, Section of Medical Data Science, Göttingen, Germany
Text
Introduction: Sharing health research data provides significant opportunities to generate further insight into complex diseases. Since the adoption of the General Data Protection Regulation (GDPR), sharing data has become more restrictive and challenging.
Consequently, making the data directly available to researchers introduces additional requirements regarding transparency of its use and data security. The UK Data Service's Five Safes framework addresses these challenges [1].
Data providers such as UK Biobank (UKB) grant access to their data only within a trusted research environment (TRE) [2], [3]. Similarly, the European Health Data Space (EHDS) is to be implemented using TREs [4]. In this abstract, we report our first-hand experience of working in a TRE with two cohorts, one for pancreatic cancer using sequencing data and one for psychiatry using medication records.
State of the art: In TREs, data is hosted in large data centers, e.g., on AWS for UKB, where it can be analyzed but not downloaded. This approach resembles data enclaves such as the N3C enclave in the US, which was introduced to analyze COVID-19 data collaboratively [5].
Concept: Research projects must apply for access to UKB data, and individual researchers must complete training courses on data protection and the applicable rules. Afterward, they may use the data in the Research Analysis Platform (RAP). The RAP provides a cohort explorer that lets researchers explore and visualize data from more than 500,000 participants and apply filters to build a smaller cohort for subsequent analysis.
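The filtering step can be illustrated with a minimal sketch. It does not use the RAP cohort explorer or any UKB interface; the record layout, the field names, and the `build_cohort` helper are hypothetical, and the participants are invented. The ICD-10 prefixes (C25 for malignant neoplasm of the pancreas, F32 for depressive episode) serve only as example filter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical record layout, not the UKB schema.
    participant_id: str
    age_at_recruitment: int
    diagnosis_codes: frozenset
    has_sequencing: bool


def build_cohort(participants, code_prefix, min_age=None, require_sequencing=False):
    """Return participants that pass all requested filters, in input order."""
    cohort = []
    for p in participants:
        # Keep only participants with at least one matching diagnosis code.
        if not any(code.startswith(code_prefix) for code in p.diagnosis_codes):
            continue
        if min_age is not None and p.age_at_recruitment < min_age:
            continue
        if require_sequencing and not p.has_sequencing:
            continue
        cohort.append(p)
    return cohort


# Invented records for illustration only.
SYNTHETIC = [
    Participant("P001", 61, frozenset({"C25.0"}), True),
    Participant("P002", 48, frozenset({"F32.1"}), False),
    Participant("P003", 70, frozenset({"C25.9", "E11"}), False),
    Participant("P004", 55, frozenset({"I10"}), True),
]

pancreatic = build_cohort(SYNTHETIC, "C25", require_sequencing=True)
print([p.participant_id for p in pancreatic])  # ['P001']
```

Stacking filters in this way narrows the full population step by step to the subset used in later analysis.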
Researchers can find extensive documentation and aggregate statistics in the “Showcase”, which contains the database schema as well as metadata and descriptions for each field. Researchers must report their findings every year and, specifically, two weeks before the end of a project; UK Biobank must approve all data exports from the RAP before publication.
Implementation: The built-in cohort explorer and synthetic datasets derived from the actual data are used to understand the schema and to gather descriptive statistics. We employ a selection of programs from the built-in tool pool in addition to our own scripts.
Lessons learned: Algorithms should be developed and tested locally before being executed in the RAP to enhance developer experience and cut costs. This is especially important if the algorithms have not been explicitly developed for UKB data and additional data transformation steps are required.
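As an illustration of this workflow, the following sketch tests a data transformation step locally on a small hand-made fixture before any compute time is spent in the TRE. The column naming scheme (`medication_0`, `medication_1`, ...), the `pid` key, and both helper functions are assumptions made for this example and do not describe the actual UKB export format.

```python
def instance_columns(row, prefix):
    """Return the keys named '<prefix>_<n>', ordered by the integer n."""
    found = []
    for key in row:
        head, _, tail = key.rpartition("_")
        if head == prefix and tail.isdigit():
            found.append((int(tail), key))
    return [key for _, key in sorted(found)]


def wide_to_long(rows, id_key, prefix):
    """Flatten repeated-instance columns into (id, value) pairs, skipping empty cells."""
    pairs = []
    for row in rows:
        for col in instance_columns(row, prefix):
            value = row[col]
            if value not in (None, ""):
                pairs.append((row[id_key], value))
    return pairs


# Hand-made fixture; values are invented placeholders, not real records.
SYNTHETIC_ROWS = [
    {"pid": "S1", "medication_0": "drug_a", "medication_1": "",
     "medication_2": "drug_b", "medication_10": "drug_c"},
    {"pid": "S2", "medication_0": None, "medication_1": "drug_d"},
]

expected = [("S1", "drug_a"), ("S1", "drug_b"), ("S1", "drug_c"), ("S2", "drug_d")]
assert wide_to_long(SYNTHETIC_ROWS, "pid", "medication") == expected
print("local transformation test passed")
```

Once such a test passes locally, the same function can be run unchanged inside the TRE, so that paid compute time is not spent on debugging the transformation itself.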
The synthetic datasets help to understand the structure of the actual data, but their usefulness for testing is limited because some files contain placeholders instead of values derived from the actual data.
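One simple way to spot such files before relying on them for tests is to flag columns that hold a single value throughout. The sketch below is a heuristic under stated assumptions: the CSV layout, the column names, and the `PLACEHOLDER` token are invented for this example, and a constant column is only a hint that a field may not carry derived values.

```python
import csv
import io


def constant_columns(csv_text):
    """Return the names of columns whose cells all share one value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        for column, value in row.items():
            seen.setdefault(column, set()).add(value)
    return sorted(column for column, values in seen.items() if len(values) <= 1)


# Invented sample; the real synthetic files may look different.
SAMPLE = """pid,age,genotype
S1,61,PLACEHOLDER
S2,48,PLACEHOLDER
S3,70,PLACEHOLDER
"""

print(constant_columns(SAMPLE))  # ['genotype']
```

Columns flagged this way can be excluded from local tests or replaced with hand-made values that exercise the code path in question.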
The dataset comprises a large number of data types and the RAP offers a wide selection of tools and features. Therefore, researchers must spend a significant amount of time working through documentation both for the dataset and the tools before they are able to work with them effectively. This is not a problem if a researcher works only in the UKB-RAP, but would amount to a significant overhead if they were to use multiple TREs. Adopting TREs addresses several GDPR requirements, but also underscores the usefulness of transferable training and tools.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Kavianpour S, Sutherland J, Mansouri-Benssassi E, Coull N, Jefferson E. Next-Generation Capabilities in Trusted Research Environments: Interview Study. J Med Internet Res. 2022 Sep 20;24(9):e33720.
[2] Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Medicine. 2015 Mar 31;12(3):e1001779.
[3] Khan MYI, Dillman A, Sanchez-Perez M, Hibino M, Aune D. Tobacco smoking and the risk of aortic dissection in the UK Biobank and a meta-analysis of prospective studies. Sci Rep. 2025 Apr 9;15(1):12083.
[4] Rodríguez-Mejías S, Degli-Esposti S, González-García S, Parra-Calderón CL. Toward the European Health Data Space: The IMPaCT-Data secure infrastructure for EHR-based precision medicine research. J Biomed Inform. 2024 Aug;156:104670.
[5] Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, et al. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. Journal of the American Medical Informatics Association. 2021 Mar 1;28(3):427–43.



