70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Missing value imputation for single methylomes
2Institut für Medizinische Statistik, Informatik und Datenwissenschaften (IMSID), Universitätsklinikum Jena, Jena, Germany
3Institut für Medizinische Statistik, Informatik und Datenwissenschaften, Universitätsklinikum Jena, Jena, Germany
Introduction: Personalized medicine puts the patient's uniqueness at different omic layers at its center in order to select tailored disease prevention, diagnosis or treatment [1]. DNA methylation is a key epigenetic omic layer, particularly valued because of its reversible nature [2]. In practice, however, methylation datasets usually contain a considerable proportion of missing values [3]. Missing values mainly arise during data collection and are a ubiquitous problem for downstream data analysis. Several approaches from both statistics and computer science have been proposed to impute missing DNA methylation values. They have in common that they are in principle applicable to both DNA methylation microarray and sequencing data, but that they require information from at least two samples.
We propose a time- and cost-effective imputation method for replacing missing DNA methylation values in a single patient methylome, i.e. a method that follows the idea of personalized medicine.
Methods: Based on the observation that CpGs closer to each other are more likely to be methylated in a similar way, the method replaces a missing value with the available value of its nearest neighbouring CpG. Here, the distance between two CpGs is the linear distance along the DNA sequence, measured in base pairs, between two CpG sites on the same chromosome and strand. We compared the new method with two exemplary methods (impute.knn, methyLImp) using simulations.
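The nearest-neighbour rule described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name impute_nearest_cpg, the encoding of missing values as NaN, the requirement that positions be sorted in ascending order for a single chromosome and strand, and the tie-breaking rule (the upstream neighbour wins at equal distance) are all assumptions made for illustration.

```python
from bisect import bisect_left
import math


def impute_nearest_cpg(positions, betas):
    """Replace each missing beta value with that of the nearest observed CpG.

    positions: base-pair coordinates of CpGs on ONE chromosome and strand,
               sorted in ascending order (assumption of this sketch).
    betas:     beta values in [0, 1]; missing values are encoded as NaN
               (assumption of this sketch).
    """
    # Keep only the CpGs that actually have a measured value.
    obs_pos = [p for p, b in zip(positions, betas) if not math.isnan(b)]
    obs_val = [b for b in betas if not math.isnan(b)]
    if not obs_pos:
        return list(betas)  # nothing to borrow from

    imputed = []
    for p, b in zip(positions, betas):
        if not math.isnan(b):
            imputed.append(b)
            continue
        # Index of the first observed CpG located at or after position p.
        i = bisect_left(obs_pos, p)
        # Candidate neighbours: the observed CpG just before and just after p.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(obs_pos)]
        # min() keeps the first candidate on ties, i.e. the upstream neighbour.
        nearest = min(candidates, key=lambda j: abs(obs_pos[j] - p))
        imputed.append(obs_val[nearest])
    return imputed


nan = float("nan")
print(impute_nearest_cpg([100, 150, 400, 420, 900], [0.1, nan, 0.8, nan, nan]))
```

Because only one sorted pass over a single methylome is needed, the sketch requires no second sample, which is the property the method relies on.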
Results: Applied to single methylomes, the proposed method yielded an average root mean square error of RMSE = 0.27 in β-value units (95% CI: [0.26, 0.28]), based on a publicly available 450K BeadChip data set of 3,402 individuals (https://download.cncb.ac.cn/ewas/datahub/download/blood_methylation_v1.zip), with β-values ranging between 0 and 1. Taking into account whether CpGs belong to CpG islands when imputing missing methylation values improves the imputation accuracy. In addition, the imputation accuracy depends on the density of CpG sites on DNA methylation microarrays: the denser the CpG sites, the higher the accuracy.
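For readers unfamiliar with the error measure, the following sketch shows how an RMSE in β-value units can be computed for one methylome. The abstract does not describe the evaluation procedure in detail, so it is assumed here that known values are masked artificially, imputed, and then compared with their true values; the function name rmse and the example numbers are hypothetical.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean square error between true and imputed beta values.

    Both sequences refer to the same artificially masked CpGs, in the
    same order (assumption of this sketch).
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: two masked CpGs, both imputed as 0.5.
print(round(rmse([0.2, 0.9], [0.5, 0.5]), 4))
```

An average over individuals, as reported above, would then be the mean of such per-methylome values, again under the stated assumption.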
Conclusions: The proposed method efficiently imputes missing values from a single methylome with minimal computational cost and memory requirements, making it a valuable addition to the imputation toolbox for single-subject applications. Its imputation accuracy is inferior to that of approaches exploiting multiple methylation samples. An important factor here is the low density of current chips compared to the richness of the whole methylome. Looking forward, improved accuracy can be expected as chips become denser or as we move to whole-methylome sequencing.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Rasool M, Malik A, Naseer MI, Manan A, Ansari S, Begum I, et al. The role of epigenetics in personalized medicine: challenges and opportunities. BMC Med Genomics. 2015;8 Suppl 1(Suppl 1):S5.
[2] Gupta MK, Peng H, Li Y, Xu CJ. The role of DNA methylation in personalized medicine for immune-related diseases. Pharmacol Ther. 2023;250:108508.
[3] Di Lena P, Sala C, Prodi A, Nardini C. Missing value estimation methods for DNA methylation data. Bioinformatics. 2019;35(19):3786-93.



