
German Congress of Orthopaedics and Traumatology (DKOU 2025)

Deutsche Gesellschaft für Orthopädie und Unfallchirurgie (DGOU), Deutsche Gesellschaft für Orthopädie und Orthopädische Chirurgie (DGOOC), Deutsche Gesellschaft für Unfallchirurgie (DGU), Berufsverband für Orthopädie und Unfallchirurgie (BVOU)
28-31 October 2025
Berlin


Meeting Abstract

Interdisciplinary data annotation: Best practices for enhancing accuracy in medical large language models

Paulina Seidl 1
Florian Hinterwimmer 1
Sebastian Breden 1
Rüdiger von Eisenhart-Rothe 1
Daniel Rueckert 1
Carolin Mogler 1
Peter Schüffler 1
Igor Lazic 1
Márton Szép 1
1 Technical University of Munich (TUM), Klinikum rechts der Isar, Munich, Germany

Text

Objectives and questions: Reviewing data is an essential but time-consuming part of the clinical workflow. Using artificial intelligence to filter data can save considerable time. However, models without domain-specific training lack the accuracy required for evaluating medical data, so annotated data and specialized training are needed. This study aims to optimize annotation processes and strengthen collaboration between clinicians and computer scientists in order to develop large language models that align with clinical needs and improve workflow efficiency.

Material and methods: For this retrospective single-center study, 3,108 pathology reports of bone and soft tissue tumors from 2003 to 2020 were annotated. The reports were digitized by the pathology department and structured by computer scientists of our research group. An expert panel of clinicians and scientists defined the key parameters. A visual annotation tool was used for preprocessing and labeling, focusing on type of intervention, dignity (benign or malignant behavior), entity, sub-entity, localization, size, resection margin, grading, regression status, TNM classification, personal information, and report quality. Quality was classified as good, medium, or poor depending on how well the report content could be assigned to the key parameters. For each report, a clinician color-marked the parameters in the text and entered their values in predefined fields, creating a dataset for information extraction and retrieval.
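The per-report annotation described above can be pictured as one record with a field per key parameter. The following is a minimal sketch in Python, not the tool or schema used in the study: the class name, field names, the `spans` layout, and the `unassigned` helper are all illustrative assumptions. The quality label is left as a value entered by the annotator, because the abstract does not state the rule for good, medium, or poor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Key parameters named in the abstract; the identifiers are hypothetical.
PARAMETERS = (
    "intervention", "dignity", "entity", "sub_entity", "localization",
    "size", "resection_margin", "grading", "regression_status", "tnm",
    "personal_information",
)


@dataclass
class ReportAnnotation:
    """One annotated pathology report (illustrative structure only)."""
    intervention: Optional[str] = None
    dignity: Optional[str] = None
    entity: Optional[str] = None
    sub_entity: Optional[str] = None
    localization: Optional[str] = None
    size: Optional[str] = None
    resection_margin: Optional[str] = None
    grading: Optional[str] = None
    regression_status: Optional[str] = None
    tnm: Optional[str] = None
    personal_information: Optional[str] = None
    # Entered by the clinician: "good", "medium", or "poor".
    quality: Optional[str] = None
    # Assumed layout: parameter name -> (start, end) character offsets
    # of the passages marked in the report text.
    spans: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)


def unassigned(annotation: ReportAnnotation) -> List[str]:
    """Return the key parameters that have no value in this annotation."""
    return [name for name in PARAMETERS if getattr(annotation, name) is None]
```

A record with only the entity filled in, for example `ReportAnnotation(entity="chondroblastoma")`, would report the other ten parameters as unassigned, which is the kind of information a quality rating could be based on.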

Results: Of the 3,108 reports, 2,553 (82.1%) were of good or medium quality and were considered for further analysis. The remaining 555 reports (17.9%) were of poor quality. Among these, 193 did not pertain to the subject of interest, 97 could not be assigned to an entity, and 61 could not be classified by dignity. A further 198 reports could not be assigned to a type of intervention. Two main entities were found in 80 reports. Notably, aneurysmal bone cyst co-occurred with giant-cell tumor in 26 cases and with chondroblastoma in 10 cases.
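The reported proportions can be checked with a few lines of arithmetic. This sketch uses only the counts stated in the abstract; the variable names are illustrative.

```python
# Counts taken from the abstract.
total_reports = 3108
good_or_medium = 2553

# Poor-quality reports are the remainder.
poor = total_reports - good_or_medium

# Shares of all reports, rounded to one decimal place.
good_or_medium_pct = round(100 * good_or_medium / total_reports, 1)
poor_pct = round(100 * poor / total_reports, 1)

print(poor, good_or_medium_pct, poor_pct)  # 555 82.1 17.9
```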

Discussion and conclusions: Medical data annotation requires a balance of detail, clarity, and feasibility. Although a highly detailed approach may seem advantageous, it is more important to prioritize essential information, standardized terminology, and precise labeling. Identifying the key text passages for each parameter and consolidating entities into broader categories with sub-entities improves consistency and efficiency. In complex reports, the primary procedure should take precedence so that labels remain consistent. Annotators must adhere strictly to what the report explicitly states, even when they have contextual knowledge from related cases. Establishing annotation guidelines through an interdisciplinary panel ensures high-quality, standardized data for medical artificial intelligence applications.

Table 1 [Tab. 1]

Figure 1 [Abb. 1]
