<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE GmsArticle SYSTEM "http://www.egms.de/dtd/2.0.34/GmsArticle.dtd">
<GmsArticle xmlns:xlink="http://www.w3.org/1999/xlink">
  <MetaData>
    <Identifier>25dkou468</Identifier>
    <IdentifierDoi>10.3205/25dkou468</IdentifierDoi>
    <IdentifierUrn>urn:nbn:de:0183-25dkou4685</IdentifierUrn>
    <ArticleType>Meeting Abstract</ArticleType>
    <TitleGroup>
      <Title language="en">Challenges and limitations of large language models in medical information extraction</Title>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Sz&#233;p</Lastname>
          <LastnameHeading>Sz&#233;p</LastnameHeading>
          <Firstname>M&#225;rton</Firstname>
          <Initials>M</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="yes">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Hinterwimmer</Lastname>
          <LastnameHeading>Hinterwimmer</LastnameHeading>
          <Firstname>Florian</Firstname>
          <Initials>F</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Breden</Lastname>
          <LastnameHeading>Breden</LastnameHeading>
          <Firstname>Sebastian</Firstname>
          <Initials>S</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>von Eisenhart-Rothe</Lastname>
          <LastnameHeading>von Eisenhart-Rothe</LastnameHeading>
          <Firstname>R&#252;diger</Firstname>
          <Initials>R</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Rueckert</Lastname>
          <LastnameHeading>Rueckert</LastnameHeading>
          <Firstname>Daniel</Firstname>
          <Initials>D</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Mogler</Lastname>
          <LastnameHeading>Mogler</LastnameHeading>
          <Firstname>Carolin</Firstname>
          <Initials>C</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Sch&#252;ffler</Lastname>
          <LastnameHeading>Sch&#252;ffler</LastnameHeading>
          <Firstname>Peter</Firstname>
          <Initials>P</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Lazic</Lastname>
          <LastnameHeading>Lazic</LastnameHeading>
          <Firstname>Igor</Firstname>
          <Initials>I</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Seidl</Lastname>
          <LastnameHeading>Seidl</LastnameHeading>
          <Firstname>Paulina</Firstname>
          <Initials>P</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Technical University of Munich, TUM University Hospital, Klinikum rechts der Isar, Munich, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science GMS Publishing House</Corporatename>
        </Corporation>
        <Address>D&#252;sseldorf</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
    </SubjectGroup>
    <DatePublishedList>
      <DatePublished>20251031</DatePublished>
    </DatePublishedList>
    <Language>engl</Language>
    <License license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      <AltText language="en">This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.</AltText>
      <AltText language="de">Dieser Artikel ist ein Open-Access-Artikel und steht unter den Lizenzbedingungen der Creative Commons Attribution 4.0 License (Namensnennung).</AltText>
    </License>
    <SourceGroup>
      <Meeting>
        <MeetingId>M0634</MeetingId>
        <MeetingSequence>468</MeetingSequence>
        <MeetingCorporation>Deutsche Gesellschaft f&#252;r Orthop&#228;die und Unfallchirurgie</MeetingCorporation>
        <MeetingCorporation>Deutsche Gesellschaft f&#252;r Orthop&#228;die und Orthop&#228;dische Chirurgie</MeetingCorporation>
        <MeetingCorporation>Deutsche Gesellschaft f&#252;r Unfallchirurgie</MeetingCorporation>
        <MeetingCorporation>Berufsverband f&#252;r Orthop&#228;die und Unfallchirurgie</MeetingCorporation>
        <MeetingName></MeetingName>
        <MeetingTitle>Deutscher Kongress f&#252;r Orthop&#228;die und Unfallchirurgie (DKOU 2025)</MeetingTitle>
        <MeetingSession>Abstracts &#124; Digitalisierung 2</MeetingSession>
        <MeetingCity>Berlin</MeetingCity>
        <MeetingDate>
          <DateFrom>20251028</DateFrom>
          <DateTo>20251031</DateTo>
        </MeetingDate>
      </Meeting>
    </SourceGroup>
    <ArticleNo>AB74-4136</ArticleNo>
  </MetaData>
  <OrigData>
    <TextBlock name="Text" linked="yes">
      <MainHeadline>Text</MainHeadline><Pgraph><Mark1>Objectives: </Mark1>Large Language Models (LLMs) have demonstrated remarkable capabilities in general Natural Language Processing (NLP) tasks, sparking interest in their healthcare applications. However, their effectiveness in extracting complex, domain-specific medical information remains uncertain, despite their enormous potential to accelerate clinical workflows (Figure 1 <ImgLink imgNo="1" imgType="figure" />). This study evaluates the ability of LLMs to extract key details from clinical records on bone and soft tissue tumors. Our primary research question is: To what extent can LLMs accurately perform medical information extraction tasks without additional training, and how do their limitations impact usability in real-world clinical settings&#63;</Pgraph><Pgraph><Mark1>Methods: </Mark1>We evaluated the performance of state-of-the-art open-weight LLMs on a retrospective, single-center dataset of annotated clinical reports. These reports contained essential tumor-related information, including entity, dignity, location, size, grading, and TNM classification. We created a detailed task description and constrained the models to produce structured outputs to improve relevance and measurability. We evaluated the LLMs in both zero-shot and few-shot settings, assessing the correctness and appropriateness of the extracted details to determine their reliability for clinical applications.</Pgraph><Pgraph><Mark1>Results: </Mark1>LLMs demonstrated reasonable performance in extracting general medical information, such as tumor dignity and location, with an F1-score of &#126;90&#37; (the harmonic mean of precision and sensitivity, see Table 1 <ImgLink imgNo="1" imgType="table" />). However, this is still inadequate for safe and effective use in clinical settings, especially since domain-specific critical parameters such as tumor grading, resection margins, size, and TNM classification achieved even lower scores. Combining the detailed task description with example-based (few-shot) prompts further improved performance. These findings indicate that despite the success of LLMs in general NLP tasks, their ability to process intricate medical details remains limited without domain-specific adaptation.</Pgraph><Pgraph><Mark1>Discussion and conclusions: </Mark1>Our study highlights a fundamental gap between the perceived capabilities of LLMs and their actual performance in medical information extraction. While these models can assist with structured data extraction, their reliability diminishes for nuanced, clinically significant details. Our findings suggest that a thorough description of the required information, combined with few-shot examples, is crucial for enhancing generalization across diverse scenarios. We underscore the necessity of close collaboration between computer scientists and clinicians to define task scopes and structure extraction requirements effectively. Future research should explore lightweight fine-tuning strategies tailored to specific medical subdomains to enhance LLM performance and ensure their practical utility in clinical workflows. For this, high-quality annotations provided by experienced clinicians are indispensable.</Pgraph></TextBlock>
    <Media>
      <Tables>
        <Table format="png">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption><Pgraph><Mark1>Table 1: Evaluation results for Llama 3.3 across different types of tumor-related information.</Mark1></Pgraph></Caption>
        </Table>
        <NoOfTables>1</NoOfTables>
      </Tables>
      <Figures>
        <Figure width="797" height="539" format="png">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption><Pgraph><Mark1>Figure 1</Mark1></Pgraph></Caption>
        </Figure>
        <NoOfPictures>1</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>