70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Challenges with Logistic Mixed Models for Repeated Measures in R – Simulation Results Based on the frühstArt study
University Medical Center Göttingen, Department of Medical Statistics, Göttingen, Germany
Text
Background: Childhood overweight and obesity are growing public health problems with long-term consequences. Addressing them early is vital to prevent childhood obesity from progressing into adolescence. The “frühstArt” study will investigate whether an early cross-sectoral, family-centred intervention programme can reduce overweight and obesity in children aged 3 to 6 years within 12 months, as measured by BMI standard deviation scores (BMI-SDS). Other components of a healthy lifestyle, such as sleep quality, physical activity, media consumption and dietary behaviour, are being investigated as secondary endpoints. Most of these outcomes are recorded in a binary yes/no format and are assessed at three time points (baseline, 6 months and 12 months after enrolment). Logistic mixed models for repeated measures are used to analyse these data. This approach poses particular computational challenges, and the methodological literature on it is limited compared with that on linear mixed models. By simulating data in R (version 4.4.0 or above) [1], we aim to identify potential pitfalls in implementing logistic mixed models and to provide insights into their application to binary outcomes in health research.
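For orientation, one common way to write a logistic mixed model for such repeated binary outcomes is the random-intercept form below. The abstract does not specify the random-effects structure, so this is an assumed formulation; the symbols (participant i, time point j, covariate vector x_i, coefficients β and γ, random intercept b_i) are introduced here for illustration only.

$$
\operatorname{logit} P(Y_{ij} = 1 \mid b_i) = \beta_0 + \beta_1\,\mathrm{group}_i + \beta_2\,t_{ij} + \beta_3\,\mathrm{group}_i\,t_{ij} + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2), \quad j = 1, 2, 3.
$$

In this form, the coefficients are interpreted conditionally on the participant-specific intercept b_i, which is one reason why reporting predicted probabilities requires some care.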
Methods: To explore potential pitfalls in fitting logistic mixed models in R, we will simulate longitudinal binary data based on the frühstArt scenario. Following the structure of the frühstArt dataset, the first simulated scenario comprises 1,000 participants with repeated measurements at three time points, stratified by gender (male, female; 3% more females than males) and language (German 80%, Turkish 20%). Deviations from these assumed proportions will also be considered. Age, gender, language, time and intervention group (intervention vs. control) are included as predictors, along with interactions of interest. The proportion of overweight children is assumed to increase with age, and the percentage of missing outcome values is assumed to increase over time. We will compare different estimation approaches, optimizer settings and model specifications with respect to bias (e.g. the average difference between the estimated and the true values), coverage and precision, as well as technical aspects such as runtime, convergence and collinearity [2]. The data are analysed using available R packages such as lme4, broom.mixed, emmeans and ggeffects.
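A data-generating step of this kind could be sketched in base R as follows. This is a minimal illustration, not the study's simulation code: only the sample size, the three time points and the 80%/20% language split come from the text above. All effect sizes, the random-intercept standard deviation, the missingness rates, the 1:1 allocation and the reading of "3% more females" as 51.5% vs. 48.5% are assumptions, and covariates are sampled independently rather than stratified. The helper `performance()` is a hypothetical name implementing the usual definitions of bias, empirical standard error and Wald-interval coverage.

```r
# Minimal sketch: longitudinal binary data with a participant-level random
# intercept. All numeric values below are illustrative assumptions.
set.seed(2024)

n_id  <- 1000            # participants (from the abstract)
times <- c(0, 6, 12)     # months since enrolment (from the abstract)

# One row per participant with baseline covariates
base <- data.frame(
  id      = seq_len(n_id),
  female  = rbinom(n_id, 1, 0.515),  # assumed: 51.5% female
  turkish = rbinom(n_id, 1, 0.20),   # 20% Turkish-speaking
  group   = rbinom(n_id, 1, 0.50),   # assumed 1:1 allocation, 1 = intervention
  age0    = runif(n_id, 3, 6),       # age in years at baseline
  b0      = rnorm(n_id, 0, 1)        # random intercept, SD = 1 assumed
)

# Expand to long format: three rows per participant
long <- base[rep(seq_len(n_id), each = length(times)), ]
long$time <- rep(times, times = n_id)
long$age  <- long$age0 + long$time / 12
rownames(long) <- NULL

# Assumed true coefficients on the log-odds scale
beta <- c(intercept = -3.0, age = 0.30, female = 0.10,
          turkish = 0.30, group_time = -0.04)

eta <- with(long,
  beta[["intercept"]] + beta[["age"]] * age + beta[["female"]] * female +
  beta[["turkish"]] * turkish + beta[["group_time"]] * group * time + b0)
long$y <- rbinom(nrow(long), 1, plogis(eta))

# Missing outcomes become more frequent over time (assumed 0%, 10%, 20%)
p_miss <- c(0, 0.10, 0.20)[match(long$time, times)]
long$y[runif(nrow(long)) < p_miss] <- NA

# Performance measures over simulation repetitions for one parameter:
# est and se are vectors of estimates and standard errors, truth the true value
performance <- function(est, se, truth, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(bias     = mean(est) - truth,
    emp_se   = sd(est),
    coverage = mean(est - z * se <= truth & truth <= est + z * se))
}
```

In a full simulation, the data-generating step would be wrapped in a function and repeated many times, with `performance()` applied to the collected estimates for each parameter and scenario.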
Discussion and outlook: The simulation study focuses on the practical implementation of logistic mixed models for binary outcomes in R. Using simulated longitudinal data, we aim to identify and better understand common challenges that arise in applied settings and how they can be addressed within the R environment. We also plan to explore R-based tools for presenting model results, including the visualization of predicted probabilities and interaction effects. Among other aspects, we expect to face challenges in scenarios with sparse categories within a predictor or the outcome variable. In these scenarios, stability and convergence may depend on the choice of optimizer. Overall, this work seeks to contribute to a better understanding of both the statistical and the practical implications of implementing logistic mixed models for repeated measures, and to provide guidance for researchers facing similar analytical challenges, especially in R.
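As a rough illustration of how optimizer choice and convergence could be inspected, the sketch below fits the same random-intercept model with two lme4 optimizers and collects their convergence messages. It assumes the lme4 package is installed and skips the fit otherwise. The small data set is simulated inline with hypothetical parameter values, and the model formula is an assumption rather than the study's specification.

```r
# Sketch: compare two optimizers for a random-intercept logistic model.
# Data-generating values and the model formula are illustrative assumptions.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  n_id <- 200
  d <- data.frame(
    id     = factor(rep(seq_len(n_id), each = 3)),
    time_y = rep(c(0, 0.5, 1), times = n_id),     # time in years
    group  = rep(rbinom(n_id, 1, 0.5), each = 3)  # 1 = intervention
  )
  b0  <- rep(rnorm(n_id, 0, 1), each = 3)         # random intercepts
  d$y <- rbinom(nrow(d), 1,
                plogis(-1.5 + 0.4 * d$time_y - 0.5 * d$group * d$time_y + b0))

  # Fit the same model with a given optimizer
  fit_with <- function(opt) {
    lme4::glmer(y ~ group * time_y + (1 | id), data = d, family = binomial,
                control = lme4::glmerControl(optimizer = opt))
  }
  fits <- lapply(c(bobyqa = "bobyqa", Nelder_Mead = "Nelder_Mead"), fit_with)

  # Fixed effects side by side, convergence messages and singularity checks
  print(sapply(fits, lme4::fixef))
  conv_msgs <- lapply(fits, function(f) f@optinfo$conv$lme4$messages)
  print(conv_msgs)
  print(sapply(fits, lme4::isSingular))

  # Predicted probabilities for a participant with random intercept zero
  # (conditional on b = 0, not population-averaged)
  grid <- expand.grid(group = 0:1, time_y = c(0, 0.5, 1))
  grid$p_hat <- predict(fits$bobyqa, newdata = grid,
                        re.form = NA, type = "response")
  print(grid)
}
```

If the optimizers agree closely on the fixed effects and report no messages, a convergence warning from a single optimizer is less likely to indicate a real problem. lme4 also provides `allFit()` to refit a model with all available optimizers in one call.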
Trial registration number (Date): DRKS00030749 (29-09-2023)
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
[1] R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2024.
[2] Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019 May 20;38(11):2074-2102. DOI: 10.1002/sim.8086



