Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
07.-11.09.2025
Jena

Meeting Abstract

Comparison of methods with and without multiple thresholds for the meta-analysis of diagnostic test accuracy studies

Ferdinand Valentin Stoye - Biostatistik und Medizinische Biometrie, Medizinische Fakultät OWL, Universität Bielefeld, Bielefeld, Germany

Olaf Raths - Biostatistik und Medizinische Biometrie, Medizinische Fakultät OWL, Universität Bielefeld, Bielefeld, Germany

Oliver Kuß - Deutsches Diabetes-Zentrum (DDZ), Leibniz-Zentrum für Diabetes-Forschung an der Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany

Annika Hoyer - Biostatistik und Medizinische Biometrie, Medizinische Fakultät OWL, Universität Bielefeld, Bielefeld, Germany

Text

Introduction: Meta-analysis of diagnostic test accuracy (DTA) studies has gained importance over the past decades due to a surge in quantitative medical research on diagnostic procedures. Methodologically, the resulting bivariate outcome of sensitivity and specificity is more challenging to meta-analyze than the univariate outcome typical of clinical trials. In medical practice, methods for the meta-analysis of DTA studies often do not distinguish between the diagnostic thresholds used in the primary studies and use information from only a single threshold per study. In contrast, more recent approaches incorporate all available information by considering all reported diagnostic thresholds and can express summary estimates as a function of the threshold. While systematic comparisons among methods of the second kind exist [1], it remains unclear how methods that incorporate all information from the primary studies compare quantitatively with methods that select a single threshold from each study.

Methods: As a first step toward answering this research question, we apply eleven methods (six that do and five that do not include information on multiple thresholds) to an existing meta-analysis dataset on the DTA of HbA1c for diagnosing type 2 diabetes [2]. The models that consider all information are estimated on the full dataset, which contains 38 primary studies that each report results for between one and 13 thresholds. The other models are fitted to a reduced version of the dataset with one threshold per study. We compute summary receiver operating characteristic (sROC) curves and summary statistics for sensitivity, specificity, and the area under the sROC curve (AUC), along with 95% non-parametric bootstrap confidence intervals.
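To illustrate the idea behind a non-parametric bootstrap confidence interval, the following is a minimal Python sketch and not the implementation used in this work. It assumes that whole studies are resampled with replacement and that a percentile interval is taken. The function names, the study counts, and the naive pooled sensitivity used as the statistic are hypothetical stand-ins; in an actual analysis the statistic would be one of the fitted meta-analysis models.

```python
import random


def pooled_sensitivity(studies):
    """Naive pooled sensitivity: a stand-in statistic, not one of the eleven models."""
    tp = sum(s["tp"] for s in studies)
    fn = sum(s["fn"] for s in studies)
    return tp / (tp + fn)


def bootstrap_ci(studies, statistic, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval, resampling whole studies with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(studies)
    estimates = sorted(
        statistic([studies[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled estimates.
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical true-positive / false-negative counts, for illustration only.
studies = [
    {"tp": 40, "fn": 20},
    {"tp": 70, "fn": 30},
    {"tp": 30, "fn": 20},
    {"tp": 90, "fn": 30},
    {"tp": 52, "fn": 28},
    {"tp": 36, "fn": 14},
]

point = pooled_sensitivity(studies)
lower, upper = bootstrap_ci(studies, pooled_sensitivity)
print(f"pooled sensitivity {point:.3f}, 95% bootstrap CI ({lower:.3f}, {upper:.3f})")
```

Resampling at the study level keeps all thresholds of a study together, which is one plausible choice when studies contribute different numbers of thresholds.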

Results: The estimated AUC ranges from 0.763 to 0.859, sensitivity from 0.646 to 0.774, and specificity from 0.761 to 0.915. The latter two statistics are evaluated at the optimum of the Youden index with equally weighted sensitivity and specificity [3]. Bootstrap confidence interval widths vary between the models but do not depend systematically on whether a model incorporates all available information. Five of the six models that incorporate all information can predict an optimal diagnostic threshold from the estimates, with predictions ranging between 5.5% and 6.0%. When the specificity weight is increased to 80%, which may be realistic for diagnostic applications, the predicted optimal threshold ranges from 5.6% to 6.5%, while sensitivity ranges from 0.378 to 0.530 and specificity from 0.922 to 0.981 across all eleven models.
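The effect of reweighting can be illustrated with a short Python sketch. It assumes the weighted Youden index takes the form 2(w_sens * Se + w_spec * Sp) - 1 with w_sens + w_spec = 1, which reduces to Se + Sp - 1 for equal weights. The function names and the threshold, sensitivity, and specificity values below are hypothetical and are not estimates from this analysis.

```python
def weighted_youden(sens, spec, w_spec=0.5):
    """Weighted Youden index; w_spec is the weight on specificity (assumed form)."""
    w_sens = 1.0 - w_spec
    return 2.0 * (w_sens * sens + w_spec * spec) - 1.0


def optimal_threshold(curve, w_spec=0.5):
    """Return the (threshold, sens, spec) triple that maximizes the weighted index."""
    return max(curve, key=lambda p: weighted_youden(p[1], p[2], w_spec))


# Hypothetical summary curve: (HbA1c threshold in %, sensitivity, specificity).
curve = [
    (5.0, 0.95, 0.30),
    (5.5, 0.85, 0.60),
    (6.0, 0.70, 0.85),
    (6.5, 0.45, 0.96),
    (7.0, 0.25, 0.99),
]

equal = optimal_threshold(curve)                     # equal weights
spec_heavy = optimal_threshold(curve, w_spec=0.8)    # 80% weight on specificity
print("equal weights:", equal)
print("80% specificity weight:", spec_heavy)
```

With these illustrative values, the selected threshold moves from 6.0 to 6.5 when specificity is weighted at 80%, trading sensitivity for specificity in the same direction as reported above.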

Discussion: Applying a selection of eleven models to the same dataset leads to distinctly varying results that could support different conclusions from the same data. To enable reliable and streamlined meta-analyses of DTA studies, additional work is needed that systematically compares the methods via simulation.

Conclusion: There is still a need for more quantitative research comparing methods for the meta-analysis of DTA studies before an informed decision on the best overall method for a given application can be made.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

References

[1] Zapf A, Frömke C, Hardt J, Rücker G, Voeltz D, Hoyer A. Meta-Analysis of Diagnostic Accuracy Studies With Multiple Thresholds: Comparison of Approaches in a Simulation Study. Biometrical Journal. 2024;66(7):e202300101. DOI: 10.1002/bimj.202300101
[2] Hoyer A, Hirt S, Kuss O. Meta-analysis of full ROC curves using bivariate time-to-event models for interval-censored data. Research Synthesis Methods. 2018;9(1):62-72. DOI: 10.1002/jrsm.1273
[3] Rücker G, Schumacher M. Summary ROC curve based on a weighted Youden index for selecting an optimal cutpoint in meta-analysis of diagnostic accuracy. Statistics in Medicine. 2010;29(30):3069-3078. DOI: 10.1002/sim.3937

Citation Note

Stoye FV, Raths O, Kuß O, Hoyer A. Comparison of methods with and without multiple thresholds for the meta-analysis of diagnostic test accuracy studies. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, editors. 70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Jena, 07.-11.09.2025. Düsseldorf: German Medical Science GMS Publishing House; 2025. DocAbstr. 26.

DOI: 10.3205/25gmds030

License

© Stoye et al.
This abstract is distributed under the terms of the Creative Commons Attribution 4.0 International License.

Published: 2025-11-03
