
70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
07.-11.09.2025
Jena


Meeting Abstract

Novel Visualization Techniques for Key Metadata of Biobank Samples

Til Karsten Spreuer 1,2
Katharina Wendlandt 1,2
Clemens Ruppert 1,2
Andreas Günther 1,2
Raphael W. Majeed 1,2
1Department of Internal Medicine, Universities of Giessen and Marburg Lung Center (UGMLC), Member of the German Center for Lung Research (DZL), Gießen, Germany
2Institute for Lung Health (ILH), Cardio-Pulmonary Institute (CPI), Gießen, Germany

Text

Introduction: Biobanks are important institutions for processing and storing human liquid and tissue samples. They are a crucial building block of medical research.

While visualization techniques for biomedical data have become more accessible and advanced in recent years [1], visualization of biobank metadata is usually limited to the basic options provided by the corresponding laboratory information management system (LIMS). Our goal is to combine formal definitions of metadata quality [2] with modern visualization techniques in order to summarize the content of a biobank and to detect strengths and weaknesses in our workflows.

Methods: We applied our methods to ~10000 samples of ~1500 patients from 6 different cohorts. For ~200 patients we combined the biobank data with the corresponding observational registry data based on common pseudonyms to allow for a deeper analysis.
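The linkage step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the record layout, field names and pseudonyms are hypothetical and not taken from our LIMS or registry.

```python
# Hypothetical records: all field names and values are illustrative assumptions.
biobank_samples = [
    {"pseudonym": "P001", "sample_id": "S-1", "material": "serum"},
    {"pseudonym": "P001", "sample_id": "S-2", "material": "plasma"},
    {"pseudonym": "P002", "sample_id": "S-3", "material": "serum"},
    {"pseudonym": "P004", "sample_id": "S-4", "material": "tissue"},
]
registry_patients = [
    {"pseudonym": "P001", "cohort": "A", "birth_year": 1956},
    {"pseudonym": "P002", "cohort": "B", "birth_year": 1987},
    {"pseudonym": "P003", "cohort": "A", "birth_year": 1949},
]


def link_by_pseudonym(samples, patients):
    """Join samples to registry patients on the shared pseudonym.

    Returns the linked records and the pseudonyms found in only one source.
    """
    patients_by_id = {p["pseudonym"]: p for p in patients}
    linked = []
    for sample in samples:
        patient = patients_by_id.get(sample["pseudonym"])
        if patient is not None:
            # Combine registry and biobank fields into one record.
            linked.append({**patient, **sample})
    sample_pseudonyms = {s["pseudonym"] for s in samples}
    registry_pseudonyms = set(patients_by_id)
    only_biobank = sorted(sample_pseudonyms - registry_pseudonyms)
    only_registry = sorted(registry_pseudonyms - sample_pseudonyms)
    return linked, only_biobank, only_registry


linked, only_biobank, only_registry = link_by_pseudonym(
    biobank_samples, registry_patients
)
```

Keeping the unmatched pseudonyms from both sides, rather than discarding them, is what later makes deviations between the two sources visible.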

For plots and visualizations, we used Python with Seaborn and Matplotlib to plot sample attributes such as time-to-freeze, and to visualize clustering and deviations from the corresponding registry data.
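As an illustration of how an attribute such as time-to-freeze may be derived before plotting, consider the following sketch. It assumes that each sample carries ISO-formatted timestamps for collection, centrifugation and freezing, and that both durations are measured from the time of collection; the field names are hypothetical.

```python
from datetime import datetime


def minutes_between(start_iso, end_iso):
    """Return the elapsed time between two ISO 8601 timestamps in minutes."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 60.0


# Hypothetical sample record; the field names are assumptions for illustration.
sample = {
    "collected_at": "2024-03-05T08:15:00",
    "centrifuged_at": "2024-03-05T08:55:00",
    "frozen_at": "2024-03-05T10:00:00",
}

# Assumption: both durations are measured from the time of collection.
time_to_centrifugation = minutes_between(
    sample["collected_at"], sample["centrifuged_at"]
)
time_to_freeze = minutes_between(sample["collected_at"], sample["frozen_at"])
```

For this record the sketch yields 40 minutes to centrifugation and 105 minutes to freezing. A list of such values can then be passed to a plotting function, for example Seaborn's `histplot`.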

Results: For the visualization of combined biobank and registry data, we plotted patients (Y-axis) against visit dates (X-axis). Combining 'o' markers with the fill styles 'left' and 'right' produced crescent-shaped symbols which, together with single dots, allowed a clear presentation of deviations between biobank samples, registry data and planned visits. If the 'moon' is complete and filled, the sample was captured as intended. With this deviation plot, we found that some samples were stored in the biobank but not recorded in the corresponding registry, and vice versa.
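A deviation plot of this kind can be sketched with Matplotlib's `fillstyle` marker property. The example below is an illustration under assumptions, not our production code: the visit data are invented, and the mapping of the left half to the biobank and the right half to the registry is chosen for this sketch only.

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical visits: (pseudonym, visit date, sample in biobank?, entry in registry?)
VISITS = [
    ("P001", date(2024, 1, 10), True, True),
    ("P001", date(2024, 4, 12), True, False),
    ("P002", date(2024, 2, 3), False, True),
    ("P002", date(2024, 5, 6), True, True),
    ("P003", date(2024, 3, 20), False, False),  # planned visit only
]


def plot_deviation(visits):
    """Draw one row per patient with half-filled markers per data source."""
    patients = sorted({v[0] for v in visits})
    row = {p: i for i, p in enumerate(patients)}
    fig, ax = plt.subplots(figsize=(7, 3))

    def draw(selected, **style):
        xs = [v[1] for v in selected]
        ys = [row[v[0]] for v in selected]
        ax.plot(xs, ys, linestyle="none", **style)

    # Left half: sample present in the biobank (assumed mapping).
    draw([v for v in visits if v[2]], marker="o", markersize=14,
         fillstyle="left", color="tab:blue", markeredgewidth=0,
         label="biobank sample")
    # Right half: entry present in the registry (assumed mapping).
    draw([v for v in visits if v[3]], marker="o", markersize=14,
         fillstyle="right", color="tab:blue", markeredgewidth=0,
         label="registry entry")
    # Small dot on top: every planned visit, whether captured or not.
    draw(visits, marker=".", color="black", label="planned visit")

    ax.set_yticks(range(len(patients)))
    ax.set_yticklabels(patients)
    ax.set_ylim(-0.5, len(patients) - 0.5)
    ax.set_xlabel("visit date")
    ax.set_ylabel("patient")
    ax.legend(loc="upper left", bbox_to_anchor=(1.0, 1.0))
    fig.autofmt_xdate()
    fig.tight_layout()
    return fig, ax


fig, ax = plot_deviation(VISITS)
```

Where both halves coincide, they form a full circle; a lone half marks a record that exists in only one source, and a bare dot marks a planned visit without any record. The figure can be written to disk with `fig.savefig`.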

Further analysis showed that 469 samples are complete, whereas 34 and 18 samples, respectively, are not.

We applied UMAP plots [3], which are commonly used for reducing high-dimensional data to two dimensions, to sample attributes such as time-to-centrifugation, time-to-freeze, age of the patient (at the time of sampling) and age of the sample (now). This plot is supplemented by a violin plot showing the average age of the participants of each cohort.
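The dimensionality reduction can be sketched as follows. The column names, the standardization step and the UMAP parameter values are assumptions made for this illustration; the abstract does not specify the preprocessing or the settings we used. The embedding function relies on the `umap-learn` package, which is imported lazily so that the preprocessing can be used without it.

```python
import numpy as np

# Hypothetical column order of the sample attribute matrix.
FEATURES = [
    "time_to_centrifugation_min",
    "time_to_freeze_min",
    "patient_age_years",
    "sample_age_days",
]


def standardize(matrix):
    """Scale each column to zero mean and unit variance.

    Assumption of this sketch: attributes with different units should
    contribute equally to the distances that UMAP works with.
    """
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # constant columns become zero instead of dividing by zero
    return (matrix - mean) / std


def embed_umap(matrix, n_neighbors=15, min_dist=0.1, random_state=42):
    """Project standardized sample attributes to two dimensions with UMAP."""
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(
        n_components=2,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        random_state=random_state,
    )
    return reducer.fit_transform(standardize(matrix))
```

Calling `embed_umap` on a matrix with one row per sample returns an array with two columns, which can be drawn as a scatter plot colored by cohort, for example with Seaborn's `scatterplot`.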

The UMAP plot revealed one very noticeable cluster consisting solely of samples from a single cohort. The violin plot shows that the average patient age in this cohort is much lower than in any other cohort, because it is a healthy comparison cohort.

Discussion: Novel visualization methods that are commonly used in other fields are also suitable for visualizing biobank metadata. The presented results allowed us to highlight inconsistencies in the data and thus improve the overall quality of documentation. Additionally, these visualizations can help to monitor changes in the processes and the progress of a biobank. In the future, we plan to combine data from additional registries as well as clinical data with sample information.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.


References

[1] O'Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, et al. Visualization of Biomedical Data. Annu Rev Biomed Data Sci. 2018;1(1):275-304.
[2] Shekhovtsov VA, Eder J. Metadata Quality for Biobanks. Appl Sci. 2022;12:9578. DOI: 10.3390/app12199578
[3] McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction [Preprint]. arXiv. 2018. DOI: 10.48550/arXiv.1802.03426