Generating High-Quality Multiple-Choice Questions Using Small Language Models and Adaptive Agentic Infrastructure

Meeting Abstract

Michael Größler, Lilly Marie Düsterbeck, Graziella Credidio, Layla Tabea Riemann

University Medical Center Hamburg-Eppendorf, Institute for Applied Medical Informatics, Hamburg, Germany

Published by German Medical Science GMS Publishing House, Düsseldorf
DOI: 10.3205/25gmds020
URN: urn:nbn:de:0183-25gmds0203

Keywords: AI-based MCQ generation, multi-agent system, small language models, personalized learning platform, agent graph optimization
Published: 2025-11-03 · Language: English

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.

70th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), Jena, 7–11 September 2025. Session V: Education and science communication. Abstract 111.

Introduction: We introduce a novel approach for generating high-quality multiple-choice questions (MCQs) within KiMED, an AI-based learning platform designed to help medical students review lecture material and prepare for coursework. KiMED is a personalized learning web application that dynamically adjusts question difficulty, topical emphasis, and explanation depth based on individual student performance and curricular context. A key challenge in this domain lies in balancing strict legal and quality requirements against limited computational resources. Erroneous material could potentially lead to lawsuits, so question generation must meet high quality standards; at the same time, deploying language models locally imposes substantial computational demands. Scaling such services to many simultaneous users further intensifies these demands. Our approach addresses these challenges by using small language models within a highly adaptive agentic framework.
Focusing initially on biochemistry, our approach lays the groundwork for expanding to other subjects and for scaling personalized learning tools in resource-constrained educational environments.

Methods: Our system employs the GPTSwarm framework on top of a custom retrieval-augmented generation (RAG) pipeline for document retrieval. GPTSwarm models language agents as directed acyclic graphs of functional nodes. Each node handles a task such as concept extraction, distractor generation, or answer validation, while edges manage the flow of information within and between agents. Together, the agents form a swarm whose node prompts and inter-agent communication patterns are optimized automatically. Optimization occurs at two levels: node optimization refines prompt instructions, and edge optimization adjusts how information is shared between agents. Both processes are guided by reinforcement learning and task-specific feedback. The framework operates effectively with small language models, which suits our available resources. The source material comprises textbooks, lecture slides, transcripts of the lectures accompanying those slides, and course scripts. To assess our approach, we created a biochemistry MCQ dataset comprising three sets of 50 questions each: AI-generated using prompt engineering only (baseline), AI-generated with our framework, and human-authored. Two domain experts rated each question with binary scores on ten criteria, including clarity, relevance, grammatical correctness, and distractor quality. For each criterion, scores were averaged across experts, and the ten criterion averages were summed to yield a final score ranging from 0 to 10.

Results: The baseline AI-generated questions achieved a final score of 4.7, establishing a lower benchmark. Human-authored questions scored 8.9, and our enhanced AI pipeline attained 8.7. Compared with the human-authored questions, the pipeline's questions also aligned better with criteria such as topic centrality and relevance to learning objectives. However, generating suitable distractors remained more difficult for the AI system.
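To make the edge-optimization idea more concrete, the following toy Python sketch shows one way to realize RL-guided edge optimization; as we understand the GPTSwarm preprint, it learns a probability for each potential edge and updates those probabilities with REINFORCE. Everything here is an assumption for illustration: the node functions are hypothetical stubs standing in for LLM calls, and the reward function, edge cost, and hyperparameters are invented. Candidate edges only point forward in a fixed node order, so every sampled graph is a DAG. This is not the KiMED implementation, and node (prompt) optimization is omitted.

```python
import math
import random
from itertools import combinations

# Hypothetical stand-ins for LLM-backed functional nodes. Each node receives
# the outputs of its predecessors and returns a string.
def retrieve(inputs):
    return "doc:glycolysis"

def extract_concept(inputs):
    return "concept(" + ",".join(inputs) + ")"

def generate_distractors(inputs):
    return "distractors(" + ",".join(inputs) + ")"

def validate_answer(inputs):
    return "validated(" + ",".join(inputs) + ")"

NODES = [retrieve, extract_concept, generate_distractors, validate_answer]
# Candidate edges only point "forward" in NODES, so any sampled subset is a DAG.
CANDIDATE_EDGES = list(combinations(range(len(NODES)), 2))

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def run_graph(edges):
    """Execute nodes in list order (a valid topological order); return all outputs."""
    outputs = []
    for j, node in enumerate(NODES):
        preds = [outputs[i] for (i, k) in edges if k == j]
        outputs.append(node(preds))
    return outputs

def reward(edges):
    """Hypothetical task feedback: the validator's output must reflect both the
    extracted concept and the distractors; each edge carries a small cost."""
    final = run_graph(edges)[-1]
    ok = "concept" in final and "distractors" in final
    return (1.0 if ok else 0.0) - 0.05 * len(edges)

def optimize_edges(steps=500, lr=0.5, seed=0):
    """REINFORCE over independent Bernoulli edges with probabilities sigmoid(theta)."""
    rng = random.Random(seed)
    theta = {e: 0.0 for e in CANDIDATE_EDGES}  # sigmoid(0) = 0.5
    baseline = 0.0
    for _ in range(steps):
        # Sample a graph: keep each candidate edge independently.
        mask = {e: rng.random() < sigmoid(t) for e, t in theta.items()}
        r = reward([e for e, keep in mask.items() if keep])
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # moving average reduces variance
        for e in theta:
            # Gradient of log Bernoulli(mask[e]; sigmoid(theta[e])) w.r.t. theta[e].
            theta[e] += lr * advantage * (float(mask[e]) - sigmoid(theta[e]))
    return {e: sigmoid(t) for e, t in theta.items()}
```

Calling `optimize_edges()` returns a learned keep-probability per candidate edge. Under this invented reward, the edge from distractor generation to validation should end up with a high probability, while edges that only add cost should tend to be pruned. Independent Bernoulli edges keep the log-probability gradient simple (`mask - sigmoid(theta)`), which is why this parameterization pairs naturally with REINFORCE.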
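The scoring scheme from the evaluation can also be written out as a short Python sketch. It assumes that, for each criterion, the binary ratings are averaged over both experts and over all questions in a set before the ten per-criterion means are summed; the abstract does not state how questions are aggregated. The function name `final_score` is hypothetical, and the ratings below are toy values, not study data.

```python
def final_score(ratings):
    """Aggregate binary expert ratings into a score between 0 and n_criteria.

    ratings[expert][question][criterion] is 0 or 1. Each criterion is averaged
    over all experts and questions (an assumption), then the averages are summed.
    """
    n_criteria = len(ratings[0][0])
    total = 0.0
    for c in range(n_criteria):
        values = [q[c] for expert in ratings for q in expert]
        if any(v not in (0, 1) for v in values):
            raise ValueError("ratings must be binary")
        total += sum(values) / len(values)
    return total

# Toy data (not from the study): two experts, two questions, ten criteria.
expert_a = [[1] * 10, [1] * 8 + [0, 0]]
expert_b = [[1] * 9 + [0], [1] * 10]
print(final_score([expert_a, expert_b]))  # 9.25
```

Because the mean is linear, this equals the average per-question total (here (10 + 8 + 9 + 10) / 4), so with ten binary criteria the result always lies between 0 and 10.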
Overall, the results indicate that AI-generated MCQs in biochemistry can effectively support educators in developing high-quality questions.

Conclusion: This work shows the potential of a multi-agent, graph-optimized approach to automated MCQ generation in medical education. By using small language models within the GPTSwarm framework, we enable efficient and adaptive MCQ generation for personalized learning. We plan to release the code for the platform and for the agent-based optimization as open source to support expansion to other topics and medical faculties.

The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.

References

1. Ali F, Talat H. AI Integration in MCQ Development: Assessing Quality in Medical Education: A Systematic Review. L&S. 2024;5(3):14. DOI: 10.37185/LnS.1.1.643
2. Zhuge M, Wang W, Kirsch L, Faccio F, Khizbullin D, Schmidhuber J. GPTSwarm: Language Agents as Optimizable Graphs [Preprint]. arXiv. 2024. DOI: 10.48550/arXiv.2402.16823
3. Wu F, Li Z, Wei F, Li Y, Ding B, Gao J. Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering [Preprint]. arXiv. 2025. DOI: 10.48550/arXiv.2501.07813
4. Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, et al. Retrieval-Augmented Generation for Large Language Models: A Survey [Preprint]. arXiv. 2024. DOI: 10.48550/arXiv.2312.10997
5. Wang J, Xiao R, Tseng YJ. Generating AI Literacy MCQs: A Multi-Agent LLM Approach. In: Proceedings of the 56th ACM Technical Symposium on Computer Science Education V 2. 2025. p. 1651–2. DOI: 10.1145/3641555.3705189