70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
7-11 September 2025
Jena


Meeting Abstract

Combining machine learning methods for subgroup identification in time-to-event data with approximate Bayesian computation for bias correction

Henrik Stahl 1
Lukas Klein 1
Gunter Grieser 1
Antje Jahn 1
Heiko Götte 2
1 University of Applied Sciences Darmstadt (h_da), Darmstadt, Germany
2 Merck Healthcare KGaA, Darmstadt, Germany

Text

Personalized medicine is crucial for finding effective treatments for patients. In clinical development, it is essential to identify subgroups of patients who exhibit a beneficial treatment effect, ideally before moving to confirmatory trials. The identified subgroups can be defined by predictive biomarkers with corresponding cut-off values. However, once biomarkers or their cut-offs are selected in a data-driven manner, a selection bias is introduced, i.e. the treatment effect within the selected subgroup is overestimated.

In previous work, the approximate Bayesian computation (ABC) algorithm was used to correct for this selection bias [1]. That approach covers only a limited range of potential subgroups, namely those defined by cut-off values. Machine learning (ML)-based subgroup identification methods can cover many more potential subgroups, at the cost of even greater bias and less interpretable subgroup definitions. Our goal is to extend the ABC algorithm so that it also corrects for selection bias in these situations. Since our research is motivated by clinical trials in oncology, we focus on time-to-event data such as overall survival or progression-free survival time.

ABC is a simulation-based approach. It retains those simulation runs in which a chosen summary statistic, calculated from simulated data with known true treatment effects, is close to the same statistic calculated from the trial data at hand. The true treatment effects from the retained runs then approximate the posterior distribution of the treatment effect, which is used for bias correction. Compared to [1], ML methods raise additional questions that make an extension less than straightforward: the more complex the ML approach, the less comparable the subgroup definitions are between simulation runs. Therefore, in addition to bias correction, “overlap with true subgroup”, “rate of correct biomarker inclusion” and “similarity in subgroup size” have to be assessed. Depending on the underlying goal of the ML algorithm, its inherent tendency for bias is higher or lower, and a method's potential for correcting that bias needs to be traded off against its potential to identify the “right” patients.
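The ABC idea described above can be illustrated with a minimal rejection-sampling sketch. It is a deliberately simplified illustration, not the implementation from [1] or from this work. All of the following are assumptions made for the example: uncensored exponential survival times, a homogeneous true log hazard ratio, three hypothetical candidate cut-offs, a uniform prior, the selected subgroup's estimated log hazard ratio as the only summary statistic, and the tolerance `eps`.

```python
import numpy as np

CUTOFFS = (0.25, 0.5, 0.75)  # hypothetical candidate biomarker cut-offs


def simulate_trial(theta, n, rng):
    """Simulate a 1:1 randomised trial with uncensored exponential survival
    times and a homogeneous true log hazard ratio theta (assumption)."""
    x = rng.uniform(size=n)            # biomarker values
    trt = rng.integers(0, 2, size=n)   # treatment indicator
    hazard = np.exp(theta * trt)       # control hazard fixed at 1
    time = rng.exponential(1.0 / hazard)
    return x, trt, time


def log_hr(trt, time):
    """Log hazard ratio under an exponential model (events / total time)."""
    rate_trt = trt.sum() / time[trt == 1].sum()
    rate_ctl = (1 - trt).sum() / time[trt == 0].sum()
    return np.log(rate_trt / rate_ctl)


def select_subgroup(x, trt, time):
    """Data-driven selection: the cut-off c whose subgroup x > c shows the
    most favourable (smallest) estimated log hazard ratio."""
    estimates = [log_hr(trt[x > c], time[x > c]) for c in CUTOFFS]
    best = int(np.argmin(estimates))
    return CUTOFFS[best], estimates[best]


def abc_posterior(obs_stat, n, n_sims=5000, eps=0.05, seed=0):
    """ABC rejection sampling: keep the true effects of those simulation runs
    whose selected-subgroup estimate is within eps of the observed one."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-1.5, 0.5)  # assumed prior on the true log HR
        _, sim_stat = select_subgroup(*simulate_trial(theta, n, rng))
        if abs(sim_stat - obs_stat) < eps:
            accepted.append(theta)
    return np.array(accepted)


# Hypothetical observed result: selected-subgroup log HR of -0.5 with n = 200.
posterior = abc_posterior(obs_stat=-0.5, n=200)
print("naive estimate:", -0.5)
print("ABC-corrected estimate (posterior mean):", posterior.mean())
```

Because every simulated trial passes through the same selection step as the observed trial, the retained true effects account for the selection. In this toy setting the posterior mean is therefore expected to lie closer to zero than the naive estimate.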

All these aspects are investigated in simulation studies planned according to the ADEMP framework (aims, data-generating mechanisms, estimands, methods, performance measures). We start with two approaches: model-based recursive partitioning (MOB) [2], [3], [4] as the ML approach and LASSO regression with treatment interactions [5] as a comparator. For both approaches, we investigate ABC for correcting the selection bias.
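As a rough illustration of how the assessment criteria named earlier might be computed for a single simulation run, consider the sketch below. The operationalisations are assumptions made for this example and are not taken from the abstract: the Jaccard index for “overlap with true subgroup”, the ratio of subgroup sizes for “similarity in subgroup size”, and a per-run indicator whose average over runs would give the “rate of correct biomarker inclusion”. The function name and argument names are hypothetical.

```python
import numpy as np


def subgroup_metrics(selected, truth, used_biomarkers, true_biomarkers):
    """Compare an identified subgroup with the true one for a single run.

    selected, truth: boolean membership indicators per patient.
    used_biomarkers, true_biomarkers: collections of biomarker names.
    """
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(selected, truth).sum()
    both = np.logical_and(selected, truth).sum()
    return {
        # Jaccard index: shared patients relative to all patients in either group
        "overlap": both / union if union else 1.0,
        # a ratio of 1 means the identified subgroup has the true size
        "size_ratio": selected.sum() / truth.sum() if truth.sum() else float("nan"),
        # True if every truly predictive biomarker appears in the definition
        "correct_inclusion": set(true_biomarkers) <= set(used_biomarkers),
    }


# Toy run with six patients and hypothetical biomarker names x1 and x3.
metrics = subgroup_metrics(
    selected=[1, 1, 1, 0, 0, 0],
    truth=[0, 1, 1, 1, 0, 0],
    used_biomarkers={"x1", "x3"},
    true_biomarkers={"x1"},
)
print(metrics)
```

Averaging these quantities over all simulation runs would yield performance measures in the sense of the ADEMP framework.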

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

[1] Götte H, Kirchner M, Kieser M. Adjustment for exploratory cut-off selection in randomized clinical trials with survival endpoint. Biom J. 2020 May;62(3):627-642. DOI: 10.1002/bimj.201800302
[2] Zeileis A, Hothorn T, Hornik K. Model-based recursive partitioning. Journal of Computational and Graphical Statistics. 2008;17(2):492-514. DOI: 10.1198/106186008X319331
[3] Seibold H, Zeileis A, Hothorn T. Model-Based Recursive Partitioning for Subgroup Analyses. Int J Biostat. 2016 May 1;12(1):45-63. DOI: 10.1515/ijb-2015-0032
[4] Sun S, Sechidis K, Chen Y, Lu J, Ma C, Mirshani A, Ohlssen D, Vandemeulebroecke M, Bornkamp B. Comparing algorithms for characterizing treatment effect heterogeneity in randomized trials. Biom J. 2024 Jan;66(1):e2100337. DOI: 10.1002/bimj.202100337
[5] Lipkovich I, Dmitrienko A, D'Agostino RB Sr. Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Stat Med. 2017 Jan 15;36(1):136-196. DOI: 10.1002/sim.7064