Logo

70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)
07.-11.09.2025
Jena
 
Weiter

Meeting Abstract

Prompt Engineering Strategies for Context-Aware Medical Text Anonymization Using LLMs: Insights from the GraSCCo Corpus

Markus Wolfien - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Center for Scalable Data Analytics and Artificial Intelligence, Dresden/Leipzig, Germany, Dresden, Germany
Florin Teschner - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Hung Manh Nguyen - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Martin Sedlmayr - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany

Text

Introduction: The anonymization of clinical texts remains an ongoing challenge for enabling secondary use of healthcare data. With the increasing capabilities of large language models (LLMs) like ChatGPT-4.0, new opportunities arise for automating de-identification tasks [1], [2]. However, performance is highly sensitive to prompt design and document formatting [3]. This study evaluates how different prompt engineering strategies and input structuring can impact anonymization quality on synthetic German discharge letters from the GraSCCo corpus [4], [5].

Methods: Three anonymization strategies were compared using ChatGPT-4.0: (i) a single static prompt applied in a continuous session, (ii) prompt renewal with isolated sessions per document, and (iii) structured input with semantically segmented sections combined with prompt renewal. All approaches used the GeMTeX anonymization guideline as a reference. Outputs were manually reviewed and evaluated using precision, recall, F1-score, and error rate.

Results: Anonymization performance remained constant across iterations, with F1-scores around 0.72 (static prompt) to 0.79 (structured input). The error rate dropped from 19.4% to 7.6%, demonstrating a slight benefit of both prompt renewal and document structuring. However, these improvements were accompanied by a notable increase in false positives, particularly in masking non-identifying medical terms, such as medications and lab values.

Discussion: Prompt engineering and input formatting can affect the reliability of LLM-based anonymization [6]. While structured prompting did not improve overall F1-score, it increases over-masking, emphasizing the need for careful balance between data utility and privacy. Future work should explore prompt fine-tuning, guided pre-processing based on segment length, and hybrid approaches combining LLMs with rule-based verification. Local deployment using models like Ollama or DeepSeek may support clinical integration under privacy-sensitive conditions.

Acknowledgements: This work was supported by the Federal Ministry of Research, Technology and Space (BMFTR) as part of the GeMTeX-Project (FKZ: 01ZZ2314F).

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


Literatur

[1] Patsakis C, Lykousas N. Man vs the machine in the struggle for effective text anonymisation in the age of large language models. Sci Rep. 2023 Sep 25;13(1):16026.
[2] Liu Z, Huang Y, Yu X, Zhang L, Wu Z, Cao C, Dai H, Zhao L, Li Y, Shu P, Zeng F, Sun L, Liu W, Shen D, Li Q, Liu T, Zhu D, Li X. DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [Preprint]. arXiv. 2023. DOI: 10.48550/arXiv.2303.11032
[3] Shusterman R, Waters AC, O’Neill S, Bangs M, Luu P, Tucker DM. An active inference strategy for prompting reliable responses from large language models in medical practice. Npj Digit Med. 2025 Feb 22;8(1):1–10.
[4] Modersohn L, Schulz S, Lohr C, Hahn U. GRASCCO – The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. In: German Medical Data Sciences 2022 – Future Medicine: More Precise, More Integrative, More Sustainable! IOS Press; 2022. p. 66–72. DOI: 10.3233/SHTI220805
[5] Lohr C, Matthies F, Faller J, Modersohn L, Riedel A, Hahn U, Kiser R, Boeker M, Meineke F. De-Identifying GRASCCO – A Pilot Study for the De-Identification of the German Medical Text Project (GeMTeX) Corpus. In: German Medical Data Sciences 2024. IOS Press; 2024. p. 171–9. DOI: 10.3233/SHTI240853
[6] Wiest IC, Leßmann ME, Wolf F, Ferber D, Treeck MV, Zhu J, Ebert MP, Westphalen CB, Wermke M, Kather JN. Deidentifying Medical Documents with Local, Privacy-Preserving Large Language Models: The LLM-Anonymizer. NEJM AI. 2025 Mar 27;2(4):AIdbp2400537.