70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Towards Uniformly Applicable Polygenic Risk Scores: Factors Influencing Cross-Ancestry Prediction Performance
Text
Introduction: Genome-wide association studies (GWAS) investigate associations between genetic variants and phenotypes. They are especially helpful for studying complex diseases, which have been shown to be highly polygenic. One application of GWAS is the calculation of polygenic risk scores (PRS). By enabling the prediction of an individual's disease risk, PRS hold high potential for future clinical applications in prevention, diagnostic accuracy, and treatment decisions. However, the majority of GWAS to date have been conducted on individuals of European genetic ancestry. This can exacerbate disparities between genetic ancestries, as evaluations have shown that prediction accuracy decreases when PRS are transferred to other genetic ancestries [1]. Recently, several methods have been published that aim to improve prediction accuracy across genetic ancestries by combining GWAS summary statistics from different populations [2].
Methods: To assess the strengths and limitations of selected multi-ancestry prediction methods, we evaluated their performance across a range of genetic architectures and complex conditions known to affect PRS transferability between populations. We simulated genotypes representing diverse genetic ancestries, alongside phenotypes varying in polygenicity and in genetic correlation between the populations. By varying the training sample sizes, we examined how study size influences the effectiveness of multi-ancestry prediction. As complex conditions, we introduced phenotypes with differing proportions of population-specific causal variants and evaluated predictive accuracy in an admixed target population. Each simulation setting was run multiple times to assess the stability of the methods.
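To make the two main simulation parameters concrete, the following is a minimal sketch of how effect sizes with a given polygenicity and cross-population genetic correlation could be drawn for two populations. It is not the simulation pipeline used in this study: the function names, the shared set of causal variants, the bivariate normal effect model, and the assumption of standardized, independent variants are all illustrative choices.

```python
import numpy as np


def simulate_effects(n_variants, polygenicity, h2, rho, rng):
    """Draw per-variant effects for two populations (illustrative model).

    polygenicity: fraction of variants that are causal (shared by both
                  populations in this sketch).
    h2:           heritability, spread evenly over the causal variants.
    rho:          cross-population correlation of the causal effects.
    """
    n_causal = max(1, int(round(polygenicity * n_variants)))
    causal = rng.choice(n_variants, size=n_causal, replace=False)
    per_variant_var = h2 / n_causal
    cov = per_variant_var * np.array([[1.0, rho], [rho, 1.0]])
    beta = np.zeros((n_variants, 2))
    beta[causal] = rng.multivariate_normal(np.zeros(2), cov, size=n_causal)
    return beta


def simulate_phenotype(genotypes, beta, h2, rng):
    """Build a phenotype with unit variance in expectation from genotypes."""
    g = genotypes - genotypes.mean(axis=0)
    sd = g.std(axis=0)
    sd[sd == 0] = 1.0  # leave monomorphic variants at zero
    genetic = (g / sd) @ beta
    # Rescale so the genetic component explains exactly h2 in this sample.
    genetic = genetic * np.sqrt(h2 / genetic.var())
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=genetic.shape[0])
    return genetic + noise
```

In such a setup, raising `polygenicity` spreads the same heritability over more variants with smaller individual effects, and lowering `rho` reduces how much information a GWAS in one population carries about effects in the other.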
Results: Among the modeled parameters, polygenicity had the strongest overall impact on prediction performance. With higher polygenicity, prediction performance generally decreased and the performance gap between populations became more pronounced. For phenotypes with higher genetic correlation between populations, multi-ancestry prediction methods achieved greater relative improvement, making genetic correlation an important determinant of the effectiveness of multi-ancestry approaches. For the larger training sample size, the methods offered only a minor advantage in absolute prediction performance compared to a simple linear combination of PRS derived from single-ancestry prediction models. Despite relying on different strategies for effect size estimation, all evaluated methods responded similarly to the tested conditions. In direct comparison, the multi-ancestry prediction method MUSSEL [3] achieved the highest prediction performance under most evaluated conditions and reduced the relative differences in prediction performance between ancestries under certain conditions, indicating improved cross-population generalizability.
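The baseline mentioned above, a linear combination of single-ancestry PRS, can be sketched as follows. This is an assumption about one common way to build such a baseline (ordinary least squares weights fitted on a tuning set from the target population), not a description of the exact procedure used here; all function names are hypothetical.

```python
import numpy as np


def fit_prs_weights(prs_tune, y_tune):
    """Fit intercept and one weight per single-ancestry PRS by least squares.

    prs_tune: array of shape (n_individuals, n_scores), one column per PRS.
    y_tune:   phenotype values of the tuning individuals.
    """
    design = np.column_stack([np.ones(len(y_tune)), prs_tune])
    coef, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return coef


def combine_prs(prs, coef):
    """Apply the fitted weights to obtain the combined score."""
    return coef[0] + prs @ coef[1:]


def r_squared(y, y_hat):
    """Squared Pearson correlation between phenotype and score."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2
```

In practice the weights would be fitted on a tuning subset of the target population and `r_squared` reported on a separate test subset, so that the combined score is not evaluated on the data used to choose its weights.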
Conclusion: For conditions that exacerbate ancestry-related disparities, such as limited sample sizes or highly polygenic architectures, multi-ancestry prediction methods can improve prediction performance by leveraging data across populations. A remaining challenge lies in the varying local linkage disequilibrium (LD) structures found in admixed individuals, which require methods to account for mismatches in ancestry-specific LD and for biased effect size estimation. However, our findings reveal that while current approaches can mitigate some disparities, they remain limited in their ability to close the performance gap between populations. This highlights the need to consider population-specific genetic features in multi-ancestry study designs.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584–591. DOI: 10.1038/s41588-019-0379-x
[2] Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, et al. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet. 2024;25(1):8–25. DOI: 10.1038/s41576-023-00637-2
[3] Jin J, Zhan J, Zhang J, Zhao R, O’Connell J, Jiang Y, et al. MUSSEL: Enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups. Cell Genomics. 2024;4(4):100539. DOI: 10.1016/j.xgen.2024.100539



