70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V.
KiMED – a Generative Artificial Intelligence Tool to Support the Individualized Learning Process of Medical Students
2University Medical Center Hamburg-Eppendorf (UKE), Department of Biochemistry and Molecular Cell Biology, Hamburg, Germany
3University Medical Center Hamburg-Eppendorf (UKE), Dean’s Office for Education and Students’ Affairs, Hamburg, Germany
Introduction: Large Language Models (LLMs) are transforming medical education: among other uses, they influence teaching methodologies [1], assist medical writing [2], and personalize learning material [3], [4]. Techniques like Retrieval-Augmented Generation (RAG) have gained attention because they ground model responses in source material and make them explainable. While LLM-based platforms for education exist, there are no open-source, flexible, and scalable tools tailored to a German university curriculum.
Our RAG-powered tool, KiMED, will soon provide individualized learning material and progress reports, bridging knowledge gaps in prerequisite subjects relevant to medical studies. Students first assess their knowledge with a test on a selected subject. Topics with weaker performance are prioritized and guide the generation of learning plans. KiMED also supports exam preparation with a curated pool of multiple-choice questions (MCQs), including explanations of the answer alternatives. First evaluations of the prototype regarding KiMED's relevance and usability are presented.
Methods: Internally crafted slides, texts, and license-compliant textbook sections were incorporated into the pipeline. These were chunked by title and mapped to metadata, and a hybrid retrieval system (semantic matching combined with keyword search) was implemented. For data privacy reasons and for German fluency, a compact multilingual LLM running locally was selected. The RAGAS framework [5], prompt engineering, in-line source citation, metadata filtering, and human evaluation were used to assess and improve retrieval and generation quality.
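To illustrate the hybrid retrieval step, the sketch below merges a semantic (vector) ranking and a keyword ranking with reciprocal rank fusion (RRF). RRF is one common fusion choice, assumed here for illustration; the abstract does not state which fusion method KiMED actually uses, and the chunk IDs are invented.

```python
# Hypothetical sketch: fuse a semantic ranking and a keyword (e.g. BM25-style)
# ranking with reciprocal rank fusion (RRF). This is an illustrative stand-in,
# not KiMED's actual fusion method; chunk IDs are made up.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of chunk IDs into a single ranking."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); items ranked highly
            # in any list accumulate the largest fused scores.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["glycolysis_02", "tca_cycle_01", "urea_cycle_03"]     # vector search
keyword_hits = ["glycolysis_02", "tca_cycle_01", "ketone_bodies_05"]   # keyword search
print(rrf_fuse([semantic_hits, keyword_hits]))
# → ['glycolysis_02', 'tca_cycle_01', 'urea_cycle_03', 'ketone_bodies_05']
```

Documents found by both retrievers rise to the top, which is why such fusion schemes are popular for combining semantic and keyword search.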
The system scales to multiple simultaneous accesses: it is hosted on servers of the University of Hamburg, connected to local servers running the database and the LLM serving engine.
A prototype tool for biochemistry is currently under evaluation. Final improvements are being guided by a System Usability Scale (SUS) [6] along with qualitative feedback within the tool. Preliminary evaluation was performed by two biochemistry experts, three medical students, one natural sciences master’s student, and four AI specialists.
Results: The model Qwen2.5-7B-Instruct [7] was used for content generation. A multi-prompting approach coupled with agents and structured outputs ensured that pre-established quality criteria were met.
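As an illustration of how structured outputs can enforce quality criteria, the sketch below validates a JSON-formatted MCQ returned by a model before it enters the question pool. The field names and the five-option rule are assumptions for this example, not KiMED's actual schema.

```python
# Hypothetical sketch: validate a model's structured MCQ output against
# quality criteria before use. Schema fields and the five-option rule are
# illustrative assumptions, not KiMED's real contract.
import json

REQUIRED = {"question", "options", "correct_index", "explanations"}

def validate_mcq(raw: str) -> dict:
    """Parse and check one MCQ; raise ValueError if any criterion fails."""
    mcq = json.loads(raw)
    missing = REQUIRED - mcq.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if len(mcq["options"]) != 5:  # single-best-answer format with 5 options
        raise ValueError("expected exactly 5 answer options")
    if not 0 <= mcq["correct_index"] < len(mcq["options"]):
        raise ValueError("correct_index out of range")
    if len(mcq["explanations"]) != len(mcq["options"]):
        raise ValueError("need one explanation per option")
    return mcq
```

Outputs that fail validation can be routed back to the generating agent for another attempt, which is one way agents and structured outputs can be combined.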
Final SUS scores ranged from 72.5 to 87.5. All ten evaluators strongly agreed that they would use the tool frequently, and eight expressed strong confidence in using it. Three were neutral regarding how well the tool's functions were integrated. Qualitative feedback suggested improving accessibility for color-blind users as well as changes to navigation and button placement.
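For context, per-evaluator SUS scores such as the 72.5 to 87.5 range above follow Brooke's standard scoring scheme [6]; the sketch below implements that standard scheme and is not KiMED-specific code.

```python
# Standard SUS scoring (Brooke): ten items answered on a 1-5 Likert scale.
# Odd-numbered items are positively worded, even-numbered items negatively.

def sus_score(responses):
    """Score one SUS questionnaire; returns a value between 0 and 100."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for item, r in enumerate(responses, start=1):
        # Positive items contribute (r - 1), negative items (5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # → 100.0
```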
Discussion: Despite the small evaluator sample, our first results indicate that KiMED is intuitive and user-friendly and needs only minor adjustments. An assessment of the tool's usability by a broader cohort of medical students and experts will follow.
Ensuring content correctness is challenging in LLM-based systems. So far, a human-in-the-loop serves as the final quality assurance step, ensuring educational and legal compliance. To improve the quality of generated content before human review, we plan to apply more robust hallucination detection frameworks and a self-optimizing multi-agent approach for MCQ generation. To enhance learning outcomes, next steps include the incorporation of a chatbot and spaced repetition.
Conclusion: KiMED will be extended to additional medical topics at our University Medical Center and, as an open-source platform, can be adapted by other faculties, thus enhancing personalized learning at a larger scale and helping to reshape medical education.
Acknowledgements: AHG and LTR gratefully acknowledge funding by Claussen-Simon-Stiftung, Hamburg.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
[1] Grunhut J, Marques O, Wyatt A. Needs, Challenges, and Applications of Artificial Intelligence in Medical Education Curriculum. JMIR Med Educ. 2022;8(2):e35587. DOI: 10.2196/35587
[2] Ali SR, Dobbs TD, Hutchings HA, Whitaker IS. Using ChatGPT to write patient clinic letters. Lancet Digit Health. 2023;5(4):e179–81. DOI: 10.1016/S2589-7500(23)00048-1
[3] Abd-alrazaq A, AlSaad R, Alhuwail D, Ahmed A, Healy PM, Latifi S, et al. Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions. JMIR Med Educ. 2023;9:e48291. DOI: 10.2196/48291
[4] Anki - powerful, intelligent flashcards. [cited 2025 Apr 18]. Available from: https://apps.ankiweb.net/
[5] Es S, James J, Espinosa-Anke L, Schockaert S. RAGAS: Automated Evaluation of Retrieval Augmented Generation [Preprint]. arXiv. 2023. DOI: 10.48550/arXiv.2309.15217
[6] Brooke J. SUS: A “Quick and Dirty” Usability Scale. In: Usability Evaluation in Industry. London: CRC Press; 1996.
[7] Qwen, Yang A, Yang B, Zhang B, Hui B, Zheng B, et al. Qwen2.5 Technical Report [Preprint]. arXiv. 2025. DOI: 10.48550/arXiv.2412.15115



