<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE GmsArticle SYSTEM "http://www.egms.de/dtd/2.0.34/GmsArticle.dtd">
<GmsArticle xmlns:xlink="http://www.w3.org/1999/xlink">
  <MetaData>
    <Identifier>25gmds132</Identifier>
    <IdentifierDoi>10.3205/25gmds132</IdentifierDoi>
    <IdentifierUrn>urn:nbn:de:0183-25gmds1328</IdentifierUrn>
    <ArticleType>Meeting Abstract</ArticleType>
    <TitleGroup>
      <Title language="en">A Secure Interactive and High Performance Processing Environment for Collaborative Machine Learning Tasks on Large Data</Title>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Zaschke</Lastname>
          <LastnameHeading>Zaschke</LastnameHeading>
          <Firstname>Philip</Firstname>
          <Initials>P</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institut f&#252;r medizinische Informatik, Universit&#228;tsmedizin G&#246;ttingen, Georg August Universit&#228;t G&#246;ttingen, G&#246;ttingen, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Richter</Lastname>
          <LastnameHeading>Richter</LastnameHeading>
          <Firstname>Jendrik</Firstname>
          <Initials>J</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institut f&#252;r medizinische Informatik, Universit&#228;tsmedizin G&#246;ttingen, Georg August Universit&#228;t G&#246;ttingen, G&#246;ttingen, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Krefting</Lastname>
          <LastnameHeading>Krefting</LastnameHeading>
          <Firstname>Dagmar</Firstname>
          <Initials>D</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institut f&#252;r medizinische Informatik, Universit&#228;tsmedizin G&#246;ttingen, Georg August Universit&#228;t G&#246;ttingen, G&#246;ttingen, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science GMS Publishing House</Corporatename>
        </Corporation>
        <Address>D&#252;sseldorf</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
      <Keyword language="en">high performance computing</Keyword>
      <Keyword language="en">XNAT</Keyword>
      <Keyword language="en">secure data</Keyword>
      <Keyword language="en">Jupyter Notebook</Keyword>
      <Keyword language="en">biosignals</Keyword>
    </SubjectGroup>
    <DatePublishedList>
      <DatePublished>20251103</DatePublished>
    </DatePublishedList>
    <Language>engl</Language>
    <License license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      <AltText language="en">This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.</AltText>
      <AltText language="de">Dieser Artikel ist ein Open-Access-Artikel und steht unter den Lizenzbedingungen der Creative Commons Attribution 4.0 License (Namensnennung).</AltText>
    </License>
    <SourceGroup>
      <Meeting>
        <MeetingId>M0631</MeetingId>
        <MeetingSequence>132</MeetingSequence>
        <MeetingCorporation>Deutsche Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie</MeetingCorporation>
        <MeetingName>70. Jahrestagung der Deutschen Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)</MeetingName>
        <MeetingTitle></MeetingTitle>
        <MeetingSession>PS 5: IT-Infrastruktur 1</MeetingSession>
        <MeetingCity>Jena</MeetingCity>
        <MeetingDate>
          <DateFrom>20250907</DateFrom>
          <DateTo>20250911</DateTo>
        </MeetingDate>
      </Meeting>
    </SourceGroup>
    <ArticleNo>Abstr. 315</ArticleNo>
  </MetaData>
  <OrigData>
    <TextBlock name="Text" linked="yes">
      <MainHeadline>Text</MainHeadline><Pgraph><Mark1>Introduction:</Mark1> Research needs data, yet large amounts of data generated in the clinical environment remain unused, so their potential is not fully exploited. The Medical Informatics Initiative (MII) provides the governance structures for data access requests, as well as the technical infrastructure for providing data via the data sharing framework <TextLink reference="1"></TextLink>, <TextLink reference="2"></TextLink>. Once data has been provided to researchers, it is up to the user to ensure an appropriate processing environment. </Pgraph><Pgraph>In the multi-center project Somnolink, large sleep datasets are to be used collaboratively for the prediction of obstructive sleep apnea phenotypes, therapy options and compliance. A secure yet collaboratively usable data processing environment must be ensured, as we plan to train artificial intelligence models on data from patients who have given broad consent and on research data that may be used without consent under the German Gesundheitsdatennutzungsgesetz.</Pgraph><Pgraph><Mark1>State of the art:</Mark1> For machine learning tasks, the analysis pipeline typically consists of the steps (a) data extraction, (b) exploration&#47;curation, (c) preparation of training and (d) training. Step (b) is usually performed interactively (e.g. with Jupyter Notebooks), while step (d) can be handled automatically in large computing environments, for example on a high-performance computing (HPC) cluster.</Pgraph><Pgraph>Previously, we integrated HPC into the biomedical research data management system XNAT (Extensible Neuroimaging Archive Toolkit) <TextLink reference="3"></TextLink>, <TextLink reference="4"></TextLink>. 
This allowed batch processing of XNAT projects for machine learning tasks on a shared partition, but did not enable interactive and secure data analysis.</Pgraph><Pgraph>Therefore, we set up a collaborative processing environment that combines a research data management system with interactive data exploration&#47;curation and secure HPC analysis.</Pgraph><Pgraph><Mark1>Concept:</Mark1> We extended the previous XNAT-HPC environment by (i) a Jupyter server in the same network segment of the cloud of the university computing center (GWDG) and (ii) a secure HPC protocol <TextLink reference="5"></TextLink>, also provided by GWDG. The required components, a containerized, adapted Jupyter image and an XNAT plugin, are provided by the XNAT community. In XNAT, data can be protected by the access control feature and can be selected to spawn a Jupyter Notebook with Python data access. For training, we combined our XNAT-HPC pipeline with secure HPC, an encrypted processing approach.</Pgraph><Pgraph><Mark1>Implementation:</Mark1> We implemented and evaluated our workflow using an exemplary XNAT project filled with synthesized data. We accessed the files through the Jupyter Notebook and successfully transferred them into the secure HPC pipeline for automatic encryption and secure processing on the HPC system.</Pgraph><Pgraph><Mark1>Lessons learned:</Mark1> We established a secure processing workflow that consists of data exploration with Jupyter Notebooks in XNAT and secure processing, achieved by combining our XNAT-HPC pipeline with secure HPC.</Pgraph><Pgraph>This workflow provides additional security for our training data in the normally shared HPC environment. While XNAT is optimized for biosignal and image data, its open-source design allows it to be extended to support other data types as well. This infrastructure has limitations. Firstly, the secure HPC approach requires local temporary storage for creating the encrypted data container. 
Secondly, this workflow does not cover uploading data into XNAT after it has been provided to researchers by the MII data management office. Thirdly, while XNAT projects support collaborative access among permitted users via access control, each Jupyter Notebook session is limited to an individual user.</Pgraph><Pgraph>The authors declare that they have no competing interests.</Pgraph><Pgraph>The authors declare that an ethics committee vote is not required.</Pgraph></TextBlock>
    <References linked="yes">
      <Reference refNo="1">
        <RefAuthor>Semler SC</RefAuthor>
        <RefAuthor>Boeker M</RefAuthor>
        <RefAuthor>Eils R</RefAuthor>
        <RefAuthor>Krefting D</RefAuthor>
        <RefAuthor>Loeffler M</RefAuthor>
        <RefAuthor>Bussmann J</RefAuthor>
        <RefAuthor></RefAuthor>
        <RefTitle>Die Medizininformatik-Initiative im &#220;berblick &#8211; Aufbau einer Gesundheitsforschungsdateninfrastruktur in Deutschland</RefTitle>
        <RefYear>2024</RefYear>
        <RefJournal>Bundesgesundheitsbl</RefJournal>
        <RefPage>616&#8211;28</RefPage>
        <RefTotal>Semler SC, Boeker M, Eils R, Krefting D, Loeffler M, Bussmann J, et al. Die Medizininformatik-Initiative im &#220;berblick &#8211; Aufbau einer Gesundheitsforschungsdateninfrastruktur in Deutschland. Bundesgesundheitsbl. 2024 Jun 1;67(6):616&#8211;28.</RefTotal>
      </Reference>
      <Reference refNo="2">
        <RefAuthor>Hund H</RefAuthor>
        <RefAuthor>Wettstein R</RefAuthor>
        <RefAuthor>Heidt CM</RefAuthor>
        <RefAuthor>Fegeler C</RefAuthor>
        <RefTitle>Executing Distributed Healthcare and Research Processes &#8211; The HiGHmed Data Sharing Framework</RefTitle>
        <RefYear>2021</RefYear>
        <RefBookTitle>German Medical Data Sciences: Bringing Data to Life.</RefBookTitle>
        <RefPage>126&#8211;33</RefPage>
        <RefTotal>Hund H, Wettstein R, Heidt CM, Fegeler C. Executing Distributed Healthcare and Research Processes &#8211; The HiGHmed Data Sharing Framework. In: German Medical Data Sciences: Bringing Data to Life. IOS Press; 2021. (Studies in Health Technology and Informatics). p. 126&#8211;33.</RefTotal>
      </Reference>
      <Reference refNo="3">
        <RefAuthor>Marcus DS</RefAuthor>
        <RefAuthor>Olsen TR</RefAuthor>
        <RefAuthor>Ramaratnam M</RefAuthor>
        <RefAuthor>Buckner RL</RefAuthor>
        <RefTitle>The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data</RefTitle>
        <RefYear>2007</RefYear>
        <RefJournal>Neuroinformatics</RefJournal>
        <RefPage>11&#8211;34</RefPage>
        <RefTotal>Marcus DS, Olsen TR, Ramaratnam M, Buckner RL. The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics. 2007;5(1):11&#8211;34.</RefTotal>
      </Reference>
      <Reference refNo="4">
        <RefAuthor>Zaschke P</RefAuthor>
        <RefAuthor>Hempel P</RefAuthor>
        <RefAuthor>Bowden J</RefAuthor>
        <RefAuthor>Bender T</RefAuthor>
        <RefAuthor>Han&#223; S</RefAuthor>
        <RefAuthor>Spicher N</RefAuthor>
        <RefAuthor>Krefting D</RefAuthor>
        <RefTitle>Extending the Biosignal and Imaging Data Managing Platform XNAT by High Performance Computing for Reproducible Processing</RefTitle>
        <RefYear>2024</RefYear>
        <RefBookTitle>Gesundheit &#8211; gemeinsam. Kooperationstagung der Deutschen Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft f&#252;r Sozialmedizin und Pr&#228;vention (DGSMP), Deutschen Gesellschaft f&#252;r Epidemiologie (DGEpi), Deutschen Gesellschaft f&#252;r Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft f&#252;r Public Health (DGPH). Dresden, 08.-13.09.2024</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Zaschke P, Hempel P, Bowden J, Bender T, Han&#223; S, Spicher N, Krefting D. Extending the Biosignal and Imaging Data Managing Platform XNAT by High Performance Computing for Reproducible Processing. In: Gesundheit &#8211; gemeinsam. Kooperationstagung der Deutschen Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft f&#252;r Sozialmedizin und Pr&#228;vention (DGSMP), Deutschen Gesellschaft f&#252;r Epidemiologie (DGEpi), Deutschen Gesellschaft f&#252;r Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft f&#252;r Public Health (DGPH). Dresden, 08.-13.09.2024. D&#252;sseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 853. DOI: 10.3205&#47;24gmds113</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.3205&#47;24gmds113</RefLink>
      </Reference>
      <Reference refNo="5">
        <RefAuthor>Nolte H</RefAuthor>
        <RefAuthor>Spicher N</RefAuthor>
        <RefAuthor>Russel A</RefAuthor>
        <RefAuthor>Ehlers T</RefAuthor>
        <RefAuthor>Krey S</RefAuthor>
        <RefAuthor>Krefting D</RefAuthor>
        <RefAuthor></RefAuthor>
        <RefTitle>Secure HPC: A workflow providing a secure partition on an HPC system</RefTitle>
        <RefYear>2023</RefYear>
        <RefJournal>Future Generation Computer Systems</RefJournal>
        <RefPage>677&#8211;91</RefPage>
        <RefTotal>Nolte H, Spicher N, Russel A, Ehlers T, Krey S, Krefting D, et al. Secure HPC: A workflow providing a secure partition on an HPC system. Future Generation Computer Systems. 2023 Apr 1;141:677&#8211;91.</RefTotal>
      </Reference>
    </References>
    <Media>
      <Tables>
        <NoOfTables>0</NoOfTables>
      </Tables>
      <Figures>
        <NoOfPictures>0</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>