70th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V., GMDS)
Evaluation of surrogate validation approaches accepted in German health technology assessment
Introduction: Surrogate variables are crucial in drug development because they substitute for endpoints that are difficult or time-consuming to collect. A frequent example is progression-free survival (PFS) as a surrogate for overall survival (OS). Among Health Technology Assessment (HTA) bodies in Europe, only the Institute for Quality and Efficiency in Health Care (IQWiG) formulates quantitative thresholds for validating a surrogate [1], which highlights the pioneering role of its standards for the EU HTA Regulation (HTAR). However, previous research has been pessimistic about whether these conservative surrogate validation standards can realistically be achieved [2].
Methods: IQWiG's rapid report on validating surrogate endpoints considers two approaches: one based on the correlation between the treatment effect on the surrogate and the treatment effect on the patient-relevant outcome, and another based on a surrogate threshold effect (STE) determined via meta-regression. A simulation study was conducted to determine the power of both approaches, varying the number of studies, sample sizes, and effect sizes for OS and PFS (event rates > 70%), while maintaining a high correlation (0.85) between surrogate and patient-relevant outcomes. Unlike previous studies [2], the power for the STE approach was determined by calculating prediction intervals using the Knapp-Hartung model with ad hoc variance correction. In addition, the power was assessed for a 95% and an 80% prediction interval, which are required for a proof and a hint, respectively, according to IQWiG's rapid report.
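To make the STE idea concrete, the following is a minimal sketch, not the simulation code used in this study. It assumes an ordinary least-squares regression of the log HR for OS on the log HR for PFS across trials, which is a simplification of the Knapp-Hartung meta-regression named above (no within-trial variances, no ad hoc variance correction). The STE is taken as the surrogate effect at which the upper prediction limit for the OS effect crosses zero. The trial-level values and all function names are hypothetical and chosen only for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, n=1000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (n must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / n
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Quantile for p in (0.5, 1), found by bisection on the CDF."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def surrogate_threshold_effect(x, y, level=0.95):
    """Return (STE, upper_limit_function) from trial-level log HRs.

    x: log HRs on the surrogate, y: log HRs on the patient-relevant outcome.
    Assumption: unweighted OLS, two-sided prediction interval at `level`.
    STE is None if the prediction band never excludes zero.
    """
    k = len(x)
    xbar, ybar = sum(x) / k, sum(y) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = ybar - b * xbar                # intercept
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (k - 2)
    t = t_quantile(1 - (1 - level) / 2, k - 2)
    q = t * t * s2

    def upper(x0):
        """Upper prediction limit for the outcome effect in a new trial."""
        return a + b * x0 + math.sqrt(q * (1 + 1 / k + (x0 - xbar) ** 2 / sxx))

    # upper(x0) = 0 is a quadratic in x0; the smaller root is the STE.
    qa = b * b - q / sxx
    if b <= 0 or qa <= 0:
        return None, upper
    qb = 2 * a * b + 2 * q * xbar / sxx
    qc = a * a - q * (1 + 1 / k + xbar ** 2 / sxx)
    ste = (-qb - math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)
    return ste, upper

# Hypothetical trial-level effects from five trials (log hazard ratios).
log_hr_pfs = [-0.60, -0.50, -0.40, -0.30, -0.20]
log_hr_os = [-0.50, -0.38, -0.33, -0.22, -0.17]

for level, label in ((0.95, "proof"), (0.80, "hint")):
    ste, _ = surrogate_threshold_effect(log_hr_pfs, log_hr_os, level)
    if ste is None:
        print(f"{label}: no STE exists at the {level:.0%} level")
    else:
        print(f"{label}: STE as HR on PFS = {math.exp(ste):.3f}")
```

In this sketch, a new trial would support an OS benefit only if its PFS effect is more extreme than the STE. With few trials, the t quantile with k - 2 degrees of freedom is large, which pushes the 95% STE further from HR = 1 than the 80% STE.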
Results: In all simulated scenarios, the STE approach had higher power than the correlation approach. In scenarios reflecting early risk assessment with five studies, achieving 80% power for a proof to validate via the STE concept was unattainable even when studies included N=1000 patients and both HRs (treatment effect on the surrogate and treatment effect on the patient-relevant outcome) were 0.7. The corresponding power in this situation was approximately 40% with ad hoc variance correction and around 60% without it. Only when the HRs were 0.5 could a power exceeding 80% be observed, given five studies each with a sample size of N=1000. To obtain 80% power for a hint for the validation of a surrogate with five studies, both HRs needed to be 0.5, and each study had to have a sample size of N=500. If both HRs were 0.7, ten studies with a sample size of N=1000 were necessary to achieve 80% power. Without ad hoc variance correction, five studies of the same size just reached 80% power.
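One way to see why a correlation-based criterion is hard to meet with few studies is to look at the confidence interval of a trial-level correlation. The sketch below uses the standard Fisher z-transformation and ignores the estimation error of each trial's effects. The abstract does not state the decision rule, so the threshold of 0.85 on the lower confidence limit is an assumption made for illustration, as are the function name and the observed correlation of 0.95.

```python
import math
from statistics import NormalDist

def correlation_ci(r, k, level=0.95):
    """Fisher z confidence interval for a Pearson correlation from k studies."""
    if k <= 3:
        raise ValueError("Fisher z interval needs more than three studies")
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(1 - (1 - level) / 2) / math.sqrt(k - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

THRESHOLD = 0.85  # assumed criterion on the lower limit, for illustration only

for k in (5, 10, 20):
    lower, upper = correlation_ci(0.95, k)
    verdict = "meets" if lower >= THRESHOLD else "misses"
    print(f"k={k:2d}: 95% CI = ({lower:.2f}, {upper:.2f}), {verdict} threshold")
```

Under these assumptions, even an observed correlation well above the true value of 0.85 used in the simulation leaves a lower confidence limit far below the threshold when only five studies are available.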
Discussion: IQWiG’s standards for validating surrogates are conservative. In situations with up to five studies, low power makes validation unrealistic. However, scenarios with up to five studies are typical for early benefit-risk assessments, which raises the question of whether less conservative approaches would be helpful. Possible alternatives have already been suggested in the literature; for example, the STE approach could use the point estimate instead of the lower limit of the prediction interval. As IQWiG provides the only quantitative thresholds for surrogate validation among HTA bodies in Europe, an adaptation of its guideline could have a substantial influence on the new HTAR.
Gregor Buch and Anton Schönstein are employees of Boehringer Ingelheim.
The authors declare that an ethics committee vote is not required.
References
[1] Grigore B, Ciani O, Dams F, Federici C, de Groot S, Möllenkamp M, Rabbe S, et al. Surrogate Endpoints in Health Technology Assessment: An International Review of Methodological Guidelines. Pharmacoeconomics. 2020 Oct;38(10):1055-1070. DOI: 10.1007/s40273-020-00935-1
[2] Gillhaus J, Goertz R, Jeratsch U, Leverkurs F. Surrogatvalidierung durch Korrelation und Surrogate Threshold Effect – Ergebnisse von Simulationsstudien. GMS Med Inform Biom Epidemiol. 2017;13(1):Doc01. DOI: 10.3205/mibe000168



