
70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
07.-11.09.2025
Jena


Meeting Abstract

Filtering Synthetic Longitudinal Trajectories with Large Language Models as Synthetic Experts

Clemens Schächter 1
Astrid Pechmann 2
Janbernd Kirschner 2
Harald Binder 1
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
2Department of Neuropediatrics and Muscle Disorders, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany


Introduction: Generating synthetic longitudinal data that accurately captures correlation structures within multivariate measurements and over time remains a challenge. When deep learning models are used for synthetic data generation, they often fail to accurately capture clinical patterns. To distinguish plausible from implausible trajectories, clinicians can apply their domain knowledge about the patient condition or disease development and filter out unrealistic samples.

Methods: We recreate this selection process with synthetic experts in the form of large language models (LLMs). These models carry world knowledge about disease progression that is hard to formalize as explicit mathematical priors in the data generation process. We apply this approach to longitudinal motor function data from the SMArtCARE registry, a prospective cohort study of patients with spinal muscular atrophy (SMA) in which patients undergo a set of different motor function exams. The multidimensional data are subject to ceiling effects, external events such as treatment effects, and the use of different measurement instruments to evaluate disease progression. To generate trajectories, we employ variational autoencoders that embed motor function test results in a low-dimensional latent space, where we model disease progression with a longitudinal mixed-effects model. We generate synthetic patient trajectories in the latent space and decode them back to the data space, where the LLMs filter results based on clinical plausibility. This enables sampling both reconstructions of original trajectories and trajectories with controlled edits introduced through the mixed model (e.g., a medication switch).
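The generate, decode, and filter loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the latent space is one-dimensional, `decode` is a fixed clipped linear map standing in for a trained VAE decoder, the mixed-effects parameters are arbitrary, and `expert_accepts` is a simple rule that takes the place of the LLM judgment.

```python
import random

def sample_latent_trajectory(rng, times, fixed_intercept=0.0, fixed_slope=0.5,
                             sd_intercept=1.0, sd_slope=0.2):
    """Draw one latent trajectory from a random-intercept, random-slope model.

    All parameter values are illustrative assumptions, not fitted estimates.
    """
    b0 = rng.gauss(0.0, sd_intercept)  # patient-specific intercept deviation
    b1 = rng.gauss(0.0, sd_slope)      # patient-specific slope deviation
    return [(fixed_intercept + b0) + (fixed_slope + b1) * t for t in times]

def decode(z, max_score=40.0):
    """Hypothetical stand-in for a VAE decoder.

    Maps a latent value to a bounded motor score; the clipping mimics a
    ceiling (and floor) effect of the measurement instrument.
    """
    return min(max_score, max(0.0, 20.0 + 5.0 * z))

def expert_accepts(scores, max_jump=15.0):
    """Placeholder for the LLM plausibility judgment.

    Rejects a trajectory if the score changes by more than max_jump
    between two consecutive visits.
    """
    return all(abs(b - a) <= max_jump for a, b in zip(scores, scores[1:]))

def generate_filtered(n, times, seed=0):
    """Generate n synthetic trajectories and split them into kept and flagged."""
    rng = random.Random(seed)
    kept, flagged = [], []
    for _ in range(n):
        scores = [decode(z) for z in sample_latent_trajectory(rng, times)]
        (kept if expert_accepts(scores) else flagged).append(scores)
    return kept, flagged

if __name__ == "__main__":
    kept, flagged = generate_filtered(100, times=[0, 1, 2, 3])
    print(len(kept), "kept,", len(flagged), "flagged")
```

In a real pipeline, `expert_accepts` would be replaced by a call that presents the decoded trajectory to an LLM and parses its plausibility verdict; the surrounding loop would stay the same.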

Results: We demonstrate that LLM-based synthetic experts flag unrealistic generated patient trajectories, including those with implausibly rapid improvements, internally contradictory data points, or implausible treatment-switch artifacts.

When applied to motor function data in spinal muscular atrophy in pairwise comparisons of real and generated data, we find that our approach flags implausible synthetic samples and also acts as a diagnostic tool for evaluating the generative model fit. Here, the percentage of flagged synthetic samples serves as a proxy for the model’s fit to real-world data. To calibrate the lower bound of the LLM’s judgment, we assessed its ratings on pairs of real-world trajectories and randomly permuted versions with disrupted temporal coherence.
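As a rough sketch of how such a flag-rate diagnostic and permutation-based calibration could be computed, consider the following. The function names, the toy trajectories, and the rule-based `toy_is_flagged` judge are hypothetical; in practice the judge would be the LLM, and the abstract does not specify how the reference rates are combined.

```python
import random

def flag_rate(trajectories, is_flagged):
    """Fraction of trajectories that the judge flags as implausible."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    return sum(1 for t in trajectories if is_flagged(t)) / len(trajectories)

def permute_time_order(trajectory, rng):
    """Return a copy with visits shuffled, which disrupts temporal coherence."""
    shuffled = list(trajectory)
    rng.shuffle(shuffled)
    return shuffled

def toy_is_flagged(trajectory, max_jump=10.0):
    """Hypothetical stand-in for the LLM judge: flag large visit-to-visit jumps."""
    return any(abs(b - a) > max_jump for a, b in zip(trajectory, trajectory[1:]))

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy motor scores, invented purely for illustration.
    real = [[10, 12, 14, 16, 18], [20, 21, 23, 24, 26]]
    synthetic = [[10, 11, 13, 15, 16], [5, 30, 8, 33, 9]]
    permuted = [permute_time_order(t, rng) for t in real]

    print("real:", flag_rate(real, toy_is_flagged))
    print("permuted:", flag_rate(permuted, toy_is_flagged))
    print("synthetic:", flag_rate(synthetic, toy_is_flagged))
```

The flag rates on real and on permuted trajectories give two reference points for the judge, and the rate on synthetic data can then be read relative to them.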

Conclusion: Incorporating LLMs as synthetic experts offers a scalable mechanism to enforce domain-specific plausibility in synthetic longitudinal datasets. Combining probabilistic generative modeling with knowledge-driven filtering enhances the credibility of simulated patient trajectories and ensures authentic disease progression patterns.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.