Accessible pipeline for intelligent search on in-house clinical guidelines

Meeting Abstract, doi:10.3205/25gmds179, urn:nbn:de:0183-25gmds1797

Sören Spiegel

Institute for Applied Medical Informatics (IAM), Center for Experimental Medicine, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany

Louis Bellmann

Institute for Applied Medical Informatics (IAM), Center for Experimental Medicine, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany

Aliona Solomonova

UKE Hamburg-Eppendorf, Hamburg, Germany

Philipp Breitfeld

UKE Hamburg-Eppendorf, Hamburg, Germany

Published by German Medical Science GMS Publishing House, Düsseldorf

Keywords: large language model (LLM), retrieval-augmented generation (RAG)

Published: 2025-11-03; Language: English

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.

70th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), Jena, 7 to 11 September 2025. Session PS 11: Knowledge and Process Management. Abstract 214.

Introduction: Premedication in anesthesia requires clinicians to follow an evolving set of clinical guidelines. These guidelines are often lengthy and spread across multiple documents, which makes quick access to relevant information during clinical workflows challenging. Retrieval-augmented generation (RAG) systems built on large language models provide a promising foundation for intelligent document querying [1]. However, standard methods for splitting documents into retrieval units (chunks) may not suit the structure and complexity of clinical guidelines. While small chunks can improve retrieval precision, they often lack the broader context required for coherent and reliable answers [2]. To address these limitations, we introduce NicerSlicer [3], a tool that slices large documents into semantically coherent sections. We also present MedQueryGuide [4], an interactive frontend for intelligent clinical guideline search.

Methods: NicerSlicer generates section splits, which users can refine by discarding, joining, splitting, or adjusting sections. The resulting sections can be downloaded and integrated into RAG pipelines.
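To make the four refinement operations concrete, here is a minimal sketch of how user edits on a list of sections could be modeled. It assumes that a PDF has already been extracted into a list of text lines and that a section is a title plus a half-open line range; the names `Section`, `discard`, `join`, `split`, `adjust`, and `export` are hypothetical and are not taken from the NicerSlicer repository.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Section:
    """Hypothetical section: a title plus a half-open range of extracted lines."""
    title: str
    start: int  # index of the first line (inclusive)
    end: int    # index one past the last line (exclusive)


def discard(sections: List[Section], i: int) -> List[Section]:
    """Drop section i, e.g. a cover page or table of contents."""
    return sections[:i] + sections[i + 1:]


def join(sections: List[Section], i: int) -> List[Section]:
    """Merge section i with section i + 1; the merged section keeps the first title."""
    a, b = sections[i], sections[i + 1]
    if a.end != b.start:
        raise ValueError("only contiguous sections can be joined")
    return sections[:i] + [Section(a.title, a.start, b.end)] + sections[i + 2:]


def split(sections: List[Section], i: int, at: int, new_title: str) -> List[Section]:
    """Split section i at line index `at`; the second part gets `new_title`."""
    s = sections[i]
    if not s.start < at < s.end:
        raise ValueError("split point must lie strictly inside the section")
    parts = [Section(s.title, s.start, at), Section(new_title, at, s.end)]
    return sections[:i] + parts + sections[i + 1:]


def adjust(sections: List[Section], i: int, start: int, end: int) -> List[Section]:
    """Move the boundaries of section i, e.g. to cut off a trailing footer line."""
    if not start < end:
        raise ValueError("a section must contain at least one line")
    return sections[:i] + [replace(sections[i], start=start, end=end)] + sections[i + 1:]


def export(sections: List[Section], lines: List[str]) -> List[Dict[str, str]]:
    """Return title/text records that a RAG pipeline could ingest as documents."""
    return [{"title": s.title, "text": "\n".join(lines[s.start:s.end])} for s in sections]


# Toy example with placeholder text (not taken from a real guideline).
lines = ["Cover page", "1 Scope", "Scope text", "2 Recommendations",
         "Recommendation A", "Recommendation B", "Page footer"]
sections = [Section("Cover", 0, 1), Section("1 Scope", 1, 3),
            Section("2 Recommendations", 3, 7)]

sections = discard(sections, 0)                                  # drop the cover page
sections = adjust(sections, 1, 3, 6)                            # cut the footer line
sections = split(sections, 1, 5, "2b Recommendations (cont.)")   # split a long section
records = export(sections, lines)
```

Returning new lists instead of mutating in place is a design choice that makes an undo history in an interactive editor straightforward; a real tool might instead track page coordinates or character offsets rather than line indices.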
MedQueryGuide is an interactive frontend for RAG-based question answering. It features (1) a vector store with metadata filtering for targeted search across multiple documents; (2) a recursive retriever that returns small content units together with their broader context sections to support more informative answers; and (3) a user feedback function (“helpful” vs. “not helpful”) to iteratively improve search quality and content relevance.

We quantitatively evaluated the RAG pipeline on 100 automatically generated question-answer pairs derived from four clinical guidelines (127 pages in total). Of these, 70 questions required information from a single section (single-hop), while 30 required reasoning across two distinct sections (multi-hop). Retrieval was evaluated using hit rate and mean reciprocal rank (MRR), and answer quality was evaluated against reference answers using BERTScore [5]. Additionally, one anesthetist assessed 10 real-world questions in MedQueryGuide using its feedback function to explore whether the automated evaluation reflects real-world use.

Results: For single-hop questions, we achieved a hit rate of 0.87 and an MRR of 0.82. A BERTScore of 0.79 indicates a high degree of semantic similarity with the reference answers. For multi-hop questions, our approach achieved a hit rate of 0.92 when retrieving at least one of the two reference documents counted as a hit; when both reference documents had to be retrieved, the hit rate dropped to 0.54. The BERTScore also decreased, to 0.75. This decline is mirrored in the real-world evaluation: all five simple queries led to helpful documents and answers, whereas the five more complex or vague queries produced less relevant retrievals and unhelpful answers. Of the 20 documents retrieved for these five complex queries, only 10 were rated as helpful by the physician.

Discussion: A key challenge lies in working with PDFs: common extraction methods often fall short, and even state-of-the-art vision-language models can introduce subtle errors.
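The combination of metadata filtering and a recursive retriever in MedQueryGuide can be illustrated with a small, dependency-free sketch. It assumes that each small chunk stores a link to the larger section it was cut from and a `guideline` metadata field. A bag-of-words cosine similarity stands in for a real embedding model and vector store, and all names (`Chunk`, `embed`, `retrieve`, the section IDs) are hypothetical rather than MedQueryGuide's actual code.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str   # ID of the larger context section this chunk was cut from
    guideline: str   # metadata field used for filtering
    text: str


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[Chunk], sections: Dict[str, str],
             guideline: Optional[str] = None, k: int = 3) -> List[Dict[str, str]]:
    """Rank small chunks, then return each hit together with its parent section."""
    q = embed(query)
    # 1) Metadata filter: restrict the search to one guideline if requested.
    candidates = [c for c in chunks if guideline is None or c.guideline == guideline]
    # 2) Similarity search over the small chunks (precise matching).
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]
    # 3) Recursive step: follow each chunk's parent link to attach broader context.
    return [{"chunk": c.text, "context": sections[c.parent_id]} for c in ranked]


# Toy corpus with placeholder text.
sections = {
    "A1": "Guideline A, section 1: full text on premedication timing and related notes.",
    "B1": "Guideline B, section 1: full text on premedication timing in children.",
}
chunks = [
    Chunk("c1", "A1", "guideline_a", "premedication timing before surgery"),
    Chunk("c2", "A1", "guideline_a", "notes on documentation"),
    Chunk("c3", "B1", "guideline_b", "premedication timing in children"),
]

hits_a = retrieve("premedication timing", chunks, sections, guideline="guideline_a", k=1)
hits_b = retrieve("premedication timing", chunks, sections, guideline="guideline_b", k=1)
```

Matching on small chunks keeps retrieval precise, while handing the parent section to the language model supplies the broader context that small chunks lack. When several top-ranked chunks share a parent, a production pipeline would likely deduplicate the attached sections.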
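The retrieval metrics reported above can be sketched as follows. This assumes that each question yields a ranked list of retrieved section IDs, already truncated to some top-k (the abstract does not state k), and a set of reference section IDs. `hit_any` corresponds to the lenient multi-hop criterion (at least one reference document retrieved) and `hit_all` to the strict one (both retrieved). The function names and toy data are hypothetical and are not the study's evaluation code or data.

```python
from typing import List, Sequence, Set, Tuple

# Each run pairs the ranked IDs returned by the retriever (already cut to the
# top-k) with the set of reference section IDs for that question.
Run = Tuple[Sequence[str], Set[str]]


def hit_any(retrieved: Sequence[str], relevant: Set[str]) -> bool:
    """Hit if at least one reference section was retrieved."""
    return any(doc in relevant for doc in retrieved)


def hit_all(retrieved: Sequence[str], relevant: Set[str]) -> bool:
    """Stricter multi-hop criterion: every reference section was retrieved."""
    return relevant.issubset(retrieved)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Run]) -> dict:
    """Average each per-question score over all questions."""
    n = len(runs)
    return {
        "hit_rate_any": sum(hit_any(r, g) for r, g in runs) / n,
        "hit_rate_all": sum(hit_all(r, g) for r, g in runs) / n,
        "mrr": sum(reciprocal_rank(r, g) for r, g in runs) / n,
    }


# Hypothetical toy runs (not the study's data).
runs = [
    (["s3", "s1", "s7"], {"s1"}),           # single-hop, first hit at rank 2
    (["s2", "s5", "s9"], {"s2"}),           # single-hop, first hit at rank 1
    (["s4", "s8", "s6"], {"s8", "s10"}),    # multi-hop, only one of two found
    (["s11", "s12", "s13"], {"s5"}),        # miss
]
metrics = evaluate(runs)
# → {'hit_rate_any': 0.75, 'hit_rate_all': 0.5, 'mrr': 0.5}
```

The gap between `hit_rate_any` and `hit_rate_all` on multi-hop questions is the same kind of gap as between the reported 0.92 and 0.54: a question can count as a hit under the lenient criterion while still missing half of the evidence needed to answer it.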
The real-world evaluation, though limited in size, reflected the performance gap between simple and complex queries. As a next step, we plan to fine-tune retrieval using synthetic data to improve performance on complex queries.

Conclusion: Our results demonstrate the importance of context-aware retrieval and flexible document segmentation when building intelligent search systems for clinical guidelines. Tools like NicerSlicer and MedQueryGuide show that tailored solutions can meaningfully support clinical workflows.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

References

1. Ng KKY, Matsuba I, Zhang PC. RAG in Health Care: A Novel Framework for Improving Communication and Decision-Making by Addressing LLM Limitations. NEJM AI. 2025 Jan;2(1):AIra2400380.
2. Bhat SR, Rudat M, Spiekermann J, Flores-Herr N. Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis [Preprint]. arXiv. 2025. DOI: 10.48550/arXiv.2505.21700
3. UKEIAM/NicerSlicer: Repository to create nicely sliced PDFs for your RAG. GitHub; [cited 2025 Jun 20]. Available from: https://github.com/UKEIAM/NicerSlicer/tree/main
4. IAMspiegel/MedQueryGuide: RAG application for medical guidelines. GitHub; [cited 2025 Jun 20]. Available from: https://github.com/IAMspiegel/MedQueryGuide
5. Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y. BERTScore: Evaluating Text Generation with BERT. In: International Conference on Learning Representations (ICLR) 2020. Available from: https://openreview.net/forum?id=SkeHuCVFDr