70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
Using LLMs for the Annotation of German Clinical Forms with SNOMED CT and the MII Core Data Set
2 Friedrich-Alexander-Universität Erlangen-Nürnberg, Medical Informatics, Erlangen, Germany
3 Friedrich-Alexander-Universität Erlangen-Nürnberg, Machine Learning and Data Analytics Lab, Erlangen, Germany
4 Translational Digital Health Group, Institute of AI for Health, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Text
Introduction: The Medical Informatics Initiative (MII) aims to standardise routine care data for research based on the core data set (CDS) [1]. To improve semantic interoperability, the CDS utilises the world’s leading health terminology SNOMED CT (SCT) [2].
Medical forms are an inherent means of routinely capturing patient data through structured hospital documentation. Our project aims to enhance the reusability of clinical documentation forms through their semantic annotation, addressing a common issue of inconsistent standards across departments.
Currently, the field of German natural language processing in medicine is limited by the scarcity of publicly accessible, domain-specific Large Language Models (LLMs) and German-language ground truth (GT) corpora with semantic annotations [3], [4], [5].
Methods: We aim to accelerate the annotation of German medical forms by using LLMs and by prioritising SCT concepts that appear within the CDS, in order to support automated SCT coding. As the German National Edition is currently limited to specific use cases, we focused on annotations with the SCT International Edition (Version: 01-04-2025). We chose tumour board forms from the University Hospital Erlangen (UKER) as our use case, since the documentation of tumour board meetings is mandatory for hospitals certified by the German Cancer Society. Due to privacy concerns, we compared two locally hosted LLMs, unsloth/Meta-Llama-3.1-8B-Instruct and mistralai/Mistral-7B-Instruct-v0.3.
The form items were first preprocessed using unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit. Next, we employed Retrieval-Augmented Generation techniques. A list of candidate SCT codes was extracted from the CDS (version 2025) using three different embedding models (sentence-transformers/all-mpnet-base-v2, xlreator/biosyn-biobert-snomed, and abhinand/MedEmbed-base-v0.1). Finally, the two decoder models were tested for code suggestion using the SNOWSTORM server API, followed by a final selection of the k (k = 1, 3, 5) most relevant codes. The proposed automated approach was evaluated by comparing the suggested codes with a GT of 15 UKER tumour board forms, manually annotated by two local medical SCT experts.
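The retrieval step can be illustrated with a minimal sketch, assuming that candidate codes are ranked by cosine similarity between the embedding of a form item and the embeddings of the CDS concepts. The abstract does not state the similarity measure, so this is an assumption. The function names, the placeholder codes (CODE_A, CODE_B, CODE_C), and the three-dimensional toy vectors are hypothetical stand-ins for real SCT identifiers and real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(item_vec, concept_vecs, k):
    """Return the k concept codes whose vectors are most similar to item_vec."""
    scored = [(cosine(item_vec, vec), code) for code, vec in concept_vecs.items()]
    # Sort by descending similarity; break ties alphabetically by code.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]


# Hypothetical toy data: placeholder codes, not real SCT identifiers.
concept_vecs = {
    "CODE_A": [0.9, 0.1, 0.0],
    "CODE_B": [0.1, 0.9, 0.0],
    "CODE_C": [0.7, 0.3, 0.1],
}
item_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of one form item

for k in (1, 3):
    print(k, top_k_candidates(item_vec, concept_vecs, k))
```

In a real setting the vectors would come from one of the embedding models named above, and the ranked candidates would be passed to the decoder model for the final selection.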
Results: Our GT annotations showed that 48% of the tumour board forms could be represented by pre-coordinated SCT concepts (inter-annotator agreement: Cohen's kappa κ = 0.75 micro, 0.75 macro). Around 4.8% of the chosen SCT concepts are part of the current CDS. The best results were obtained with unsloth/Meta-Llama-3.1-8B-Instruct combined with the xlreator/biosyn-biobert-snomed embedding, which correctly detected 46.2% of GT codes when one SCT code was selected, and up to 57.8% when five SCT codes were selected.
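The two kinds of figures reported here can be sketched as follows, under two assumptions. First, a GT code is taken to count as detected if it appears among the first k suggestions for its form item. Second, agreement is computed as plain Cohen's kappa over paired labels. The micro and macro averaging behind the reported κ values is not reproduced, and all names and data below are hypothetical.

```python
from collections import Counter


def hit_rate_at_k(suggestions, gold, k):
    """Fraction of gold codes found among the first k suggestions per item."""
    hits = sum(
        1 for item, code in gold.items() if code in suggestions.get(item, [])[:k]
    )
    return hits / len(gold)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two annotators' marginal label frequencies.
    expected = sum(
        freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)
    ) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data with placeholder codes.
gold = {"item1": "A", "item2": "B", "item3": "C", "item4": "D"}
suggestions = {
    "item1": ["A", "X", "Y"],
    "item2": ["X", "B", "Y"],
    "item3": ["X", "Y", "Z"],
    "item4": ["D", "X", "Y"],
}
print(hit_rate_at_k(suggestions, gold, 1))  # 0.5
print(hit_rate_at_k(suggestions, gold, 3))  # 0.75
print(cohens_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"]))  # 0.5
```

By construction, the hit rate cannot decrease as k grows, which is consistent with the higher value reported for five selected codes than for one.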
Discussion: Our proposed pipeline is one of the first contributions to automated pre-annotation suggestions for SCT annotations of German medical forms, although manual annotation still outperforms automated approaches. The LLM-based annotation process was complicated by the need to translate between the German form content and the English-language international terminology SCT. Additional primary factors for missing mappings were non-mappable local peculiarities, non-relevant supporting protocol instructions (e.g., proper names), and outdated SCT concepts within the CDS.
Conclusion: Our pipeline will support the standardisation of German medical forms across different clinical MII sites. An analysis of linguistic, technical, and semantic aspects (e.g., prioritisation of specific semantic tags in SCT selection) provided insights for future research. Investigations into automated post-coordination are necessary to further reduce manual effort.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Semler SC, Wissing F, Heyder R. German medical informatics initiative. Methods of information in medicine. 2018 May;57(S 01):e50-6.
[2] Ingenerf J, Drenkhahn C. Referenzterminologie SNOMED CT: Interlingua zur Gewährleistung semantischer Interoperabilität in der Medizin. Springer-Verlag; 2024 Jan 18.
[3] Hahn U. Clinical Document Corpora -- Real Ones, Translated and Synthetic Substitutes, and Assorted Domain Proxies: A Survey of Diversity in Corpus Design, with Focus on German Text Data [Preprint]. arXiv. 2024. DOI: 10.48550/arXiv.2412.00230
[4] Borchert F, Lohr C, Modersohn L, Witt J, Langer T, Follmann M, Gietzelt M, Arnrich B, Hahn U, Schapranow MP. GGPONC 2.0 - the German clinical guideline corpus for oncology: Curation workflow, annotation policy, baseline NER taggers. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference; 2022 Jun. p. 3650-3660.
[5] Carlini N, Tramer F, Wallace E, Jagielski M, Herbert-Voss A, Lee K, Roberts A, Brown T, Song D, Erlingsson U, Oprea A. Extracting training data from large language models. In: 30th USENIX security symposium (USENIX Security 21) 2021. p. 2633-2650.



