70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Clustering neonatal and pediatric intensive care time series using k-means
Introduction: Continuous recordings from multiple sensors in clinical settings, such as neonatal and pediatric intensive care units (NICU/PICU), produce large volumes of unlabeled multivariate time series (MTS) data. Labeling these data is time-intensive, requires trained staff, and is error-prone. Unsupervised learning methods such as clustering could help to detect trends and anomalies in these signals and to identify relevant deviations from expected patterns. Such methods have already shown promising results for applications like ECG or respiratory signal clustering [1], [2]. A clustering-based filtering step could support decision-making in NICU/PICU environments, where continuous human monitoring is not possible and the data are more heterogeneous than in adult patients.
Methods: We propose an implementation of k-means clustering to identify recurring physiological patterns and evaluate the feasibility of unsupervised MTS clustering in neonatal and pediatric intensive care data. Clustering was performed on data from nine ventilated neonatal and pediatric patients, covering six clinically relevant parameters: airway flow, airway pressure, chest impedance, heart rate, respiratory rate, and fraction of inspired oxygen. Six scenarios were defined to explore different algorithm design choices, including multivariate and univariate input and variation in data resolution, input window length, number of clusters, and centroid initialization strategy.
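As an illustration of the design choices named above (number of clusters and centroid initialization), the following is a minimal sketch of Lloyd-style k-means on fixed-length windows. It is not the implementation used in the study: the function names `kmeans` and `euclidean`, the toy windows, and the handling of empty clusters (the old centroid is kept) are assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(windows, k, init=None, max_iter=100, seed=0):
    """Cluster fixed-length windows with Lloyd's algorithm.

    init: optional list of predefined centroids; if None, k windows
    are drawn at random as starting centroids (assumed strategy).
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    if init is not None:
        centroids = [list(c) for c in init]
    else:
        centroids = [list(w) for w in rng.sample(windows, k)]

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(w, centroids[j]))
            for w in windows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for j in range(len(centroids)):
            members = [w for w, lab in zip(windows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data (hypothetical): three rising and three falling windows.
windows = [
    [0, 1, 2, 3], [0, 1, 2, 4], [1, 1, 2, 3],
    [3, 2, 1, 0], [4, 2, 1, 0], [3, 2, 1, 1],
]

# Predefined centroids: one rising and one falling template.
labels, centroids = kmeans(windows, k=2, init=[[0, 1, 2, 3], [3, 2, 1, 0]])

# A third, far-away predefined centroid attracts no windows at all.
labels3, _ = kmeans(
    windows, k=3, init=[[0, 1, 2, 3], [3, 2, 1, 0], [100, 100, 100, 100]]
)
empty_clusters = 3 - len(set(labels3))
```

In this toy case the predefined templates separate rising from falling windows, and the badly placed third centroid yields one empty cluster. For multivariate input, one simple option (again an assumption) is to concatenate the per-parameter windows into a single vector before calling `kmeans`.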
Preprocessing included resampling parameters to common frequencies, linear interpolation of small sensor-related data gaps, splitting of segments at longer gaps, and normalization to avoid parameter bias [3]. Data were segmented into overlapping sliding windows of 10 s to 3,600 s with fixed strides (1/3 or 1/5 of the window size), chosen to align with the expected duration of physiological events. This approach is intended to capture temporal patterns adequately while accounting for misalignment. Euclidean distance (ED) was used as the distance metric for clustering due to its computational efficiency and suitability for large volumes of fixed-length time segments.
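The segmentation steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: the helper names (`split_at_gaps`, `z_normalize`, `sliding_windows`), the gap threshold, and the choice of z-normalization per segment are hypothetical, and resampling and interpolation of short gaps are omitted for brevity.

```python
import math


def split_at_gaps(samples, max_gap):
    """Split (time, value) samples wherever the time gap exceeds max_gap."""
    segments, current = [], []
    for t, v in samples:
        if current and t - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((t, v))
    if current:
        segments.append(current)
    return segments


def z_normalize(values):
    """Scale values to zero mean and unit variance (assumed normalization)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # a constant signal carries no shape
    return [(v - mean) / std for v in values]


def sliding_windows(values, window, stride_fraction=3):
    """Overlapping windows with a stride of window / stride_fraction samples."""
    stride = max(1, window // stride_fraction)
    return [
        values[i:i + window]
        for i in range(0, len(values) - window + 1, stride)
    ]


# Hypothetical 1 Hz signal with a 60 s recording gap in the middle.
samples = [(t, float(t % 10)) for t in range(0, 60)]
samples += [(t, float(t % 10)) for t in range(120, 180)]

segments = split_at_gaps(samples, max_gap=5)

all_windows = []
for segment in segments:
    values = z_normalize([v for _, v in segment])
    # 30 s windows with a stride of 1/3 of the window length (10 s).
    all_windows.extend(sliding_windows(values, window=30, stride_fraction=3))
```

Because windows never span a gap, every window in `all_windows` has the same length and can be compared directly with the Euclidean distance.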
Results: Clustering quality was assessed visually by the research team and clinical experts due to the lack of ground truth labels. While some trends (such as increases or decreases in heart rate and inspired oxygen) were recognized, we could not identify a generally meaningful clustering, especially in the multivariate settings. Larger window sizes often led to convergence into a single cluster, while smaller windows produced many empty clusters or lacked meaningful separation. Nevertheless, simpler patterns like heart rate minima were clustered successfully. Univariate clustering also managed to capture some parameter-specific patterns.
Discussion: Although ED does not account for misalignment between the compared segments, the use of overlapping sliding windows helped align periodic signals, such as airway flow. In contrast, clustering less repetitive signals like heart rate showed limited success. The lack of meaningful structure in the multivariate clustering likely results from mixing signals with differing temporal characteristics. Predefined centroids offered only minor improvements over random initialization, indicating that the data structure dominates the clustering outcomes.
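The interaction between ED and overlapping windows can be illustrated on a synthetic periodic signal. The sketch below uses a plain sine wave as a hypothetical stand-in for a periodic signal such as airway flow; the period, window length, and stride are assumptions chosen for the example, not values from the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


period = 20   # samples per cycle (assumed)
window = 20   # window length equal to one cycle (assumed)
signal = [math.sin(2 * math.pi * i / period) for i in range(200)]

reference = signal[0:window]

# Same waveform, but cut half a cycle later: ED treats it as very different.
half_cycle_shift = signal[10:10 + window]
d_shifted = euclidean(reference, half_cycle_shift)

# Same waveform cut exactly one cycle later: ED is (numerically) zero.
full_cycle_shift = signal[20:20 + window]
d_aligned = euclidean(reference, full_cycle_shift)

# Overlapping windows with a stride of 1/5 of the window length produce
# many candidate cuts, some of which line up with the reference.
stride = window // 5
starts = range(stride, len(signal) - window + 1, stride)
d_best = min(euclidean(reference, signal[s:s + window]) for s in starts)
```

Here `d_shifted` is large although both windows contain the same waveform, while `d_best` is close to zero because at least one overlapping window is in phase with the reference. For signals without a stable period, no such in-phase window is guaranteed to exist.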
Conclusion: Overall, the results imply that, while k-means provides a simple baseline for exploring patterns in unlabeled intensive care time series data, its interpretability and clinical utility are limited. Future work should explore time-series-specific distance metrics and alternative clustering algorithms such as k-Shape.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
[1] Nezamabadi K, Sardaripour N, Haghi B, Forouzanfar M. Unsupervised ECG Analysis: A Review. IEEE Reviews in Biomedical Engineering. 2022;16:208-224. DOI: 10.1109/RBME.2022.3154893
[2] Robles-Rubio CA, Kearney RE, Bertolizio G, Brown KA. Automatic unsupervised respiratory analysis of infant respiratory inductance plethysmography signals. PLoS ONE. 2020;15:e0238402. DOI: 10.1371/journal.pone.0238402
[3] Singh D, Singh B. Investigating the impact of data normalization on classification performance. Applied Soft Computing. 2020;97:105524. DOI: 10.1016/j.asoc.2019.105524



