Using LLMs for the Annotation of German Clinical Forms with SNOMED CT and the MII Core Data Set

Meeting Abstract
DOI: 10.3205/25gmds062
URN: urn:nbn:de:0183-25gmds0625

Authors:
Andrea Riedel (Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany; Friedrich-Alexander-Universität Erlangen-Nürnberg, Medical Informatics, Erlangen, Germany)
Michelle Kosminski (Friedrich-Alexander-Universität Erlangen-Nürnberg, Machine Learning and Data Analytics Lab, Erlangen, Germany; Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany)
Sandra Borst (Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany)
Emmanuelle Salin (Friedrich-Alexander-Universität Erlangen-Nürnberg, Machine Learning and Data Analytics Lab, Erlangen, Germany)
Arijana Bohr (Friedrich-Alexander-Universität Erlangen-Nürnberg, Machine Learning and Data Analytics Lab, Erlangen, Germany)
Bjoern Eskofier (Friedrich-Alexander-Universität Erlangen-Nürnberg, Machine Learning and Data Analytics Lab, Erlangen, Germany; Translational Digital Health Group, Institute of AI for Health, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany)
Thomas Ganslandt (Friedrich-Alexander-Universität Erlangen-Nürnberg, Medical Informatics, Erlangen, Germany)
Noemi Deppenwiese

Publisher: German Medical Science GMS Publishing House, Düsseldorf

Keywords: clinical coding, health information interoperability, large language models, systematized nomenclature of medicine, terminology as topic
Subject classification: 610
Published: 2025-11-03
Language: English
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.
Conference: 70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), the 70th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology, Jena, 7 to 11 September 2025
Session: V: Large language models & medical texts 2
Abstract number: 317

Introduction: The Medical Informatics Initiative (MII) aims to standardise routine care data for research based on the core data set (CDS) [1]. To improve semantic interoperability, the CDS utilises the world's leading health terminology, SNOMED CT (SCT) [2]. Medical forms are an inherent means of routinely capturing patient data through structured hospital documentation. Our project aims to enhance the reusability of clinical documentation forms through their semantic annotation, addressing the common problem of inconsistent standards across departments. Currently, German natural language processing in medicine is limited by the scarcity of publicly accessible, domain-specific Large Language Models (LLMs) and of German-language ground truth (GT) corpora with semantic annotations [3], [4], [5].

Methods: We aim to accelerate the annotation of German medical forms with the help of LLMs and by prioritising SCT concepts that appear within the CDS, in order to support automated SCT coding. As the German National Edition is currently limited to specific use cases, we focused on annotations with the SCT International Edition (Version: 01-04-2025).
We chose tumour board forms from the University Hospital Erlangen (UKER) as our use case, since the documentation of tumour board meetings is mandatory for hospitals certified by the German Cancer Society. Due to privacy concerns, we compared two locally hosted LLMs, unsloth/Meta-Llama-3.1-8B-Instruct and mistralai/Mistral-7B-Instruct-v0.3. The form items were first preprocessed using unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit. Next, we employed Retrieval Augmented Generation techniques. A list of candidate SCT codes was extracted from the CDS (version 2025) using three different embedding models (sentence-transformers/all-mpnet-base-v2, xlreator/biosyn-biobert-snomed, and abhinand/MedEmbed-base-v0.1). Finally, the two decoder models were tested for code suggestion using the Snowstorm server API, followed by a final selection of the k (k = 1, 3, 5) most relevant codes. The proposed automated approach was evaluated by comparing the suggested codes with a GT of 15 UKER tumour board forms, manually annotated by two local medical SCT experts.

Results: Our GT annotations showed that 48% of the tumour board forms could be represented by pre-coordinated SCT concepts (inter-annotator agreement: Cohen's κ = 0.75 micro, 0.75 macro). Around 4.8% of the chosen SCT concepts are part of the current CDS. The best results were obtained with unsloth/Meta-Llama-3.1-8B-Instruct combined with xlreator/biosyn-biobert-snomed embeddings, which correctly detected 46.2% of GT codes when one SCT code was selected (k = 1) and up to 57.8% when five SCT codes were selected (k = 5).

Discussion: Our proposed pipeline is one of the first contributions to automated pre-annotation of German medical forms with SCT. It provides suggestions only, since manual annotation still outperforms automated approaches. The LLM-based annotation process was complicated by the need to translate between the German form content and the English-language SCT International Edition.
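
As an illustration of the evaluation described above, the following minimal Python sketch computes the share of GT codes found among the top k suggestions, together with a plain Cohen's kappa for two annotators. It is not the authors' implementation: the function names, the assumption of one GT code per form item, and the placeholder item names and codes are all made up for this example, and the micro and macro variants of κ reported in the Results are not reproduced here.

```python
from collections import Counter


def hit_rate_at_k(suggestions, ground_truth, k):
    """Fraction of ground-truth codes found among the top-k suggestions.

    suggestions:  dict mapping a form item to a ranked list of suggested codes
    ground_truth: dict mapping the same form items to the expert-chosen code
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(
        1
        for item, gt_code in ground_truth.items()
        if gt_code in suggestions.get(item, [])[:k]
    )
    return hits / len(ground_truth)


def cohens_kappa(labels_a, labels_b):
    """Plain Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data: item names and codes are placeholders, not real SCT identifiers.
suggestions = {
    "item_1": ["C10", "C11", "C12"],
    "item_2": ["C21", "C20", "C22"],
    "item_3": ["C31", "C32", "C33"],
    "item_4": ["C41", "C42", "C40"],
}
ground_truth = {"item_1": "C10", "item_2": "C20", "item_3": "C30", "item_4": "C40"}

for k in (1, 3, 5):
    print(f"hit rate @ {k}: {hit_rate_at_k(suggestions, ground_truth, k):.2f}")

# Hypothetical labels: whether each annotator found a pre-coordinated concept.
annotator_a = ["pre", "pre", "none", "none"]
annotator_b = ["pre", "none", "none", "none"]
print(f"Cohen's kappa: {cohens_kappa(annotator_a, annotator_b):.2f}")
```

With this toy data the hit rate rises from k = 1 to k = 3 and stays flat at k = 5, which reflects that a larger k can only keep or increase the share of GT codes found.
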
Further major factors for missing mappings were non-mappable local peculiarities, non-relevant supporting protocol instructions (e.g., proper names), and outdated SCT concepts within the CDS.

Conclusion: Our pipeline is intended to support the standardisation of German medical forms across different clinical MII sites. An analysis of linguistic, technical, and semantic aspects (e.g., prioritisation of specific semantic tags in SCT selection) provided insights for future research. Further investigation of automated post-coordination is necessary to reduce manual effort further.

The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.

References:
[1] Semler SC, Wissing F, Heyder R. German medical informatics initiative. Methods of information in medicine. 2018 May;57(S 01):e50-6.
[2] Ingenerf J, Drenkhahn C. Referenzterminologie SNOMED CT: Interlingua zur Gewährleistung semantischer Interoperabilität in der Medizin. Springer-Verlag; 2024 Jan 18.
[3] Hahn U. Clinical Document Corpora -- Real Ones, Translated and Synthetic Substitutes, and Assorted Domain Proxies: A Survey of Diversity in Corpus Design, with Focus on German Text Data [Preprint]. arXiv. 2024. DOI: 10.48550/arXiv.2412.00230
[4] Borchert F, Lohr C, Modersohn L, Witt J, Langer T, Follmann M, Gietzelt M, Arnrich B, Hahn U, Schapranow MP. GGPONC 2.0 - the German clinical guideline corpus for oncology: Curation workflow, annotation policy, baseline NER taggers. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference; 2022 Jun. p. 3650-3660.
[5] Carlini N, Tramer F, Wallace E, Jagielski M, Herbert-Voss A, Lee K, Roberts A, Brown T, Song D, Erlingsson U, Oprea A. Extracting training data from large language models. In: 30th USENIX security symposium (USENIX Security 21) 2021. p. 2633-2650.