70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Losses, Splits, Sampling and Gradients. Does IPCW Simply Work for any Machine Learning Algorithm?
Text
Introduction: In 2016, the central German organ transplantation registry (TxReg) was established [1] to enhance research in organ transplantation. Despite its potential, the data are rarely used for analyses. One possible reason is data quality [2]. For survival analysis, the main issues are the short maximum follow-up time and the annual reporting schedule with only occasional reports in between, both of which limit the ability to produce long-term survival predictions.
Possibly motivated by similar issues, predictions based on transplantation registries from other countries are often derived for specific time points [3], using classifiers to predict probabilities. When adopting this approach, censoring must be addressed. This can be done with inverse probability of censoring weighting (IPCW) [4]. With IPCW, sample weights are used in the fitting process to achieve unbiased predictions.
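A minimal sketch of how such weights could be computed for a single prediction time point, assuming the general scheme described in [4]: subjects censored before the time point get weight zero, and all others are weighted by the inverse of the estimated probability of still being uncensored. The function names (`censoring_survival`, `evaluate`, `ipcw_weights`), the toy data and the Kaplan-Meier estimator for the censoring distribution are illustrative assumptions, not the authors' implementation.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'.

    Returns the step function as a list of (time, G(time)) pairs.
    Assumes there are no ties between event and censoring times.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    steps, surv = [], 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank              # subjects still under observation
        if events[i] == 0:              # a censoring counts as an "event" for G
            surv *= 1.0 - 1.0 / at_risk
        steps.append((times[i], surv))
    return steps


def evaluate(steps, t):
    """Value of the right-continuous step function at time t."""
    value = 1.0
    for time, surv in steps:
        if time > t:
            break
        value = surv
    return value


def ipcw_weights(times, events, tau):
    """Sample weights and binary labels (1 = survived beyond tau)."""
    steps = censoring_survival(times, events)
    weights, labels = [], []
    for t, d in zip(times, events):
        known = t > tau or d == 1       # status at tau is observed
        weights.append(1.0 / evaluate(steps, min(t, tau)) if known else 0.0)
        labels.append(1 if t > tau else 0)  # ignored when the weight is 0
    return weights, labels


# Toy data (hypothetical): 1 = event observed, 0 = censored
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 0]
weights, labels = ipcw_weights(times, events, tau=3.5)
print(weights, labels)
```

The resulting `weights` would then be passed as sample weights to a classifier trained on `labels`. Whether that removes the censoring bias depends on how the classifier uses them.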
Many machine learning libraries, such as scikit-learn, accept sample weights. However, the documentation of the sample weight arguments usually omits implementation details: where and how exactly the weights are used during training and, therefore, whether the implementation is compatible with the IPCW approach.
For example, in random forests, weights can be applied during fitting either in the bootstrap sampling or in the split criteria within the trees. In algorithms based on gradient descent, weighted loss functions are used. The weights then also affect the Hessian matrix and thereby influence the optimization process.
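To make the distinction concrete, the following sketch shows two textbook ways in which sample weights can enter a fit: a weighted Gini impurity, as a tree split criterion might use, and the weighted gradient and Hessian of the logistic loss, as a second-order boosting step might use. Both functions are generic illustrations with hypothetical names. They are not taken from any particular library, and actual implementations may differ in detail.

```python
import math


def weighted_gini(labels, weights):
    """Binary Gini impurity with class proportions computed from weight sums."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    p1 = sum(w for y, w in zip(labels, weights) if y == 1) / total
    return 2.0 * p1 * (1.0 - p1)


def weighted_logloss_grad_hess(labels, raw_scores, weights):
    """Per-sample gradient and Hessian of the weighted logistic loss.

    Both derivatives are taken with respect to the raw score (log-odds),
    so each weight scales the gradient and the Hessian entry alike.
    """
    grads, hess = [], []
    for y, f, w in zip(labels, raw_scores, weights):
        p = 1.0 / (1.0 + math.exp(-f))  # predicted probability
        grads.append(w * (p - y))
        hess.append(w * p * (1.0 - p))
    return grads, hess


labels = [0, 1, 1]
weights = [1.0, 1.0, 2.0]               # hypothetical IPCW weights
gini = weighted_gini(labels, weights)
grads, hess = weighted_logloss_grad_hess(labels, [0.0, 0.0, 0.0], weights)
print(gini, grads, hess)
```

In the first function a zero weight removes a subject from the class proportions. In the second it removes the subject's contribution to both the gradient and the curvature of the loss.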
Methods: We examined the implementations of commonly used classification methods, among them the Random Forest Classifier and the Gradient Boosting Classifier, to assess whether their handling of sample weights can be used to address censoring by IPCW. To support our findings, we conducted a simulation study following the ADEMP [5] principles. Our goal was to identify implementations that produce unbiased predictions when IPCW is applied under increasing censoring rates. Data for the simulation were generated using a Weibull model with varying censoring rates and normally distributed covariates. The objective was to predict survival at a specific time point, using linear models, dense neural networks, tree-based methods, gradient boosting approaches and a model-independent weighted bootstrap approach. For the evaluation, the bias on a separate test dataset was calculated. We also compared model performance on the TxReg data for transplantations from deceased donors.
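A simulation of this kind could be sketched as follows. All parameter values (Weibull shape and scale, covariate effects, exponential censoring, the time point `tau`) are assumptions chosen for illustration and are not the settings of the study. For brevity the weights use the true censoring distribution of the simulation ("oracle" weights) instead of an estimate, and the weighted bootstrap is shown as plain resampling with probabilities proportional to the weights, which is one possible reading of a model-independent approach.

```python
import math
import random


def simulate(n, tau, shape=1.5, scale=10.0, censor_mean=15.0, seed=1):
    """Weibull event times with two normal covariates and exponential censoring."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rate = math.exp(0.5 * x1 - 0.5 * x2)    # covariate effect on the hazard
        # Weibull proportional-hazards time via inverse transform sampling
        event_time = scale * (rng.expovariate(1.0) / rate) ** (1.0 / shape)
        censor_time = rng.expovariate(1.0 / censor_mean)
        observed = min(event_time, censor_time)
        died = event_time <= censor_time
        known = observed > tau or died           # status at tau is observed
        # Oracle IPCW weight: 1 / G(min(observed, tau)), G(t) = exp(-t / censor_mean)
        weight = math.exp(min(observed, tau) / censor_mean) if known else 0.0
        rows.append({
            "x": (x1, x2),
            "survived": observed > tau,          # observable label
            "truth": event_time > tau,           # latent label, simulation only
            "known": known,
            "weight": weight,
        })
    return rows


rows = simulate(n=20000, tau=8.0)
truth = sum(r["truth"] for r in rows) / len(rows)

# Naive estimate: drop subjects censored before tau, no weighting
known_rows = [r for r in rows if r["known"]]
naive = sum(r["survived"] for r in known_rows) / len(known_rows)

# IPCW estimate: weighted proportion of survivors
total_weight = sum(r["weight"] for r in rows)
ipcw = sum(r["weight"] * r["survived"] for r in rows) / total_weight

# Weighted bootstrap: resample proportionally to the weights, then use no weights
boot_rng = random.Random(2)
boot = boot_rng.choices(rows, weights=[r["weight"] for r in rows], k=len(rows))
boot_est = sum(r["survived"] for r in boot) / len(boot)

print("bias naive:", naive - truth)
print("bias IPCW:", ipcw - truth)
print("bias weighted bootstrap:", boot_est - truth)
```

Here the estimators are simple proportions. In a study like the one described, a classifier would be fitted to the covariates `x` instead, either with the weights passed to the fitting routine or on the resampled data without weights, and the bias of its predicted probabilities would be evaluated on a separate test set.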
Results: We were able to show that methods that use the weights in their splitting rules achieve better bias correction, even at higher censoring rates, than loss-based methods such as gradient boosting or neural networks.
Conclusion: We provide an overview of different sample weighting implementations in current libraries and demonstrate that there are situations in which some implementations prevent the IPCW approach from correcting the bias caused by right censoring. We hope that better awareness of how IPCW interacts with machine learning methods will reduce the use of potentially biased classifiers in the literature.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
The contribution has already been presented at SAFJR 2025 (https://safjr2025.uni-bonn.de/).
References
[1] Nagel R. Über das Transplantationsregister. transplantations-register.de; 2022. Available from: https://transplantations-register.de/ueber-das-transplantationsregister
[2] Otto G, Budde K, Bara C, Gottlieb J. Das Deutsche Transplantationsregister – eine Analyse der Altdaten 2006–2016. Gesundheitswesen. 2024;86(10):633–9. DOI: 10.1055/a-2251-5627
[3] Bhat V, Tazari M, Watt KD, Bhat M. New-Onset Diabetes and Preexisting Diabetes Are Associated With Comparable Reduction in Long-Term Survival After Liver Transplant: A Machine Learning Approach. Mayo Clin Proc. 2018;93(12):1794–802. DOI: 10.1016/j.mayocp.2018.06.020
[4] Vock DM, Wolfson J, Bandyopadhyay S, Adomavicius G, Johnson PE, Vazquez-Benitez G, et al. Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting. J Biomed Inform. 2016;61:119–31. DOI: 10.1016/j.jbi.2016.03.009
[5] Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102. DOI: 10.1002/sim.8086



