<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE GmsArticle SYSTEM "http://www.egms.de/dtd/2.0.34/GmsArticle.dtd">
<GmsArticle xmlns:xlink="http://www.w3.org/1999/xlink">
  <MetaData>
    <Identifier>25gmds068</Identifier>
    <IdentifierDoi>10.3205/25gmds068</IdentifierDoi>
    <IdentifierUrn>urn:nbn:de:0183-25gmds0689</IdentifierUrn>
    <ArticleType>Meeting Abstract</ArticleType>
    <TitleGroup>
      <Title language="en">Oblique splits in artificial representative trees for random forests</Title>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Laabs</Lastname>
          <LastnameHeading>Laabs</LastnameHeading>
          <Firstname>Bj&#246;rn-Hergen</Firstname>
          <Initials>BH</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institute for Medical Biometry and Statistics, University of L&#252;beck, L&#252;beck, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Bakare</Lastname>
          <LastnameHeading>Bakare</LastnameHeading>
          <Firstname>Janet</Firstname>
          <Initials>J</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institute for Medical Biometry and Statistics, University of L&#252;beck, L&#252;beck, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Szymczak</Lastname>
          <LastnameHeading>Szymczak</LastnameHeading>
          <Firstname>Silke</Firstname>
          <Initials>S</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Institute for Medical Biometry and Statistics, University of L&#252;beck, L&#252;beck, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science GMS Publishing House</Corporatename>
        </Corporation>
        <Address>D&#252;sseldorf</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
      <Keyword language="en">random forest</Keyword>
      <Keyword language="en">artificial representative trees</Keyword>
      <Keyword language="en">oblique random forest</Keyword>
      <Keyword language="en">explainable artificial intelligence</Keyword>
      <Keyword language="en">most representative trees</Keyword>
    </SubjectGroup>
    <DatePublishedList>
      <DatePublished>20251103</DatePublished>
    </DatePublishedList>
    <Language>engl</Language>
    <License license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      <AltText language="en">This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.</AltText>
      <AltText language="de">Dieser Artikel ist ein Open-Access-Artikel und steht unter den Lizenzbedingungen der Creative Commons Attribution 4.0 License (Namensnennung).</AltText>
    </License>
    <SourceGroup>
      <Meeting>
        <MeetingId>M0631</MeetingId>
        <MeetingSequence>068</MeetingSequence>
        <MeetingCorporation>Deutsche Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie</MeetingCorporation>
        <MeetingName>70. Jahrestagung der Deutschen Gesellschaft f&#252;r Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)</MeetingName>
        <MeetingTitle></MeetingTitle>
        <MeetingSession>V: Machine learning and AI applications 1</MeetingSession>
        <MeetingCity>Jena</MeetingCity>
        <MeetingDate>
          <DateFrom>20250907</DateFrom>
          <DateTo>20250911</DateTo>
        </MeetingDate>
      </Meeting>
    </SourceGroup>
    <ArticleNo>Abstr. 352</ArticleNo>
  </MetaData>
  <OrigData>
    <TextBlock name="Text" linked="yes">
      <MainHeadline>Text</MainHeadline><Pgraph><Mark1>Introduction:</Mark1> Artificial representative trees (ARTs) are an interpretation method for random forests in which a single binary decision tree is generated as a surrogate model to be interpreted instead of the whole ensemble <TextLink reference="1"></TextLink>. While random forests, as ensembles of binary decision trees, can approximate linear relationships quite well, ARTs face challenges because they require numerous binary splits, leading to excessive complexity and difficulty in interpretation. We therefore propose the alternative method of oblique ARTs (oARTs), which uses oblique splits (i.e., linear combinations of predictor variables) to approximate linear relationships more accurately, following the concept of oblique random forests <TextLink reference="2"></TextLink>.</Pgraph><Pgraph>Recent works have suggested the use of tree-based surrogates such as most representative trees (MRTs) <TextLink reference="3"></TextLink>, <TextLink reference="4"></TextLink> or small ensembles of trees <TextLink reference="5"></TextLink>, <TextLink reference="6"></TextLink>. Moreover, oblique surrogate trees have shown superior fidelity by using linear combinations at splits <TextLink reference="7"></TextLink>. With oARTs, we build on these advances by combining variables based on linear discriminant analysis (LDA) and constructing an interpretable oblique tree.</Pgraph><Pgraph><Mark1>Methods:</Mark1> In oARTs, we first use LDA to identify and prioritize linear combinations of variables that are important for the prediction task. Subsequently, we extend the training data set for the generation of an artificial representative tree by synthetic variables based on the LDA results. Finally, an ART is generated as previously described.</Pgraph><Pgraph>In an extensive simulation study, we generated separate training and testing data sets for three different scenarios covering linear and non-linear relationships with the outcome. 
We compared the new method of oARTs with classical ARTs and MRTs. Our main performance measures were the prediction accuracy on new data (accuracy), the similarity of predictions to those of the original forest (fidelity), the fraction of included effect and noise variables (coverage), run time, and the size of the resulting models. </Pgraph><Pgraph><Mark1>Results:</Mark1> With regard to fidelity, classical ARTs perform best in settings where only a few effect variables influence the outcome. In settings with more effect variables, oARTs achieve better fidelity. Concerning accuracy, oARTs perform best, coming very close to and in some cases even exceeding the performance of the original random forest. Given that oARTs can include multiple variables in each split, they also show the highest fraction of included effect variables while keeping the fraction of included noise variables to a minimum. Finally, they lead to the smallest models, with run times similar to those of classical ARTs.</Pgraph><Pgraph><Mark1>Discussion:</Mark1> Our new method of oARTs is superior to ARTs and MRTs in data sets with linear relationships and leads to comparable results in the absence of any linear dependencies between predictor variables. An implementation of oARTs is available in our R package timbR (<Hyperlink href="https:&#47;&#47;github.com&#47;imbs-hl&#47;timbR">https:&#47;&#47;github.com&#47;imbs-hl&#47;timbR</Hyperlink>).</Pgraph><Pgraph>The authors declare that they have no competing interests.</Pgraph><Pgraph>The authors declare that an ethics committee vote is not required.</Pgraph><Pgraph>Parts of this work have been presented before at the 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat 2025) in Berlin.</Pgraph></TextBlock>
    <References linked="yes">
      <Reference refNo="1">
        <RefAuthor>Laabs BH</RefAuthor>
        <RefAuthor>Kronziel LL</RefAuthor>
        <RefAuthor>K&#246;nig IR</RefAuthor>
        <RefAuthor>Szymczak S</RefAuthor>
        <RefTitle>Construction of Artificial Most Representative Trees by Minimizing Tree-Based Distance Measures</RefTitle>
        <RefYear>2024</RefYear>
        <RefJournal>Explainable Artificial Intelligence</RefJournal>
        <RefPage>290&#8211;310</RefPage>
        <RefTotal>Laabs BH, Kronziel LL, K&#246;nig IR, Szymczak S. Construction of Artificial Most Representative Trees by Minimizing Tree-Based Distance Measures. In: Longo L, Lapuschkin S, Seifert C, editors. Explainable Artificial Intelligence. Cham: Springer Nature Switzerland; 2024. p. 290&#8211;310. (Communications in Computer and Information Science; 2154). DOI: 10.1007&#47;978-3-031-63797-1&#95;15</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;978-3-031-63797-1&#95;15</RefLink>
      </Reference>
      <Reference refNo="2">
        <RefAuthor>Menze BH</RefAuthor>
        <RefAuthor>Kelm BM</RefAuthor>
        <RefAuthor>Splitthoff DN</RefAuthor>
        <RefAuthor>Koethe U</RefAuthor>
        <RefAuthor>Hamprecht FA</RefAuthor>
        <RefTitle>On Oblique Random Forests</RefTitle>
        <RefYear>2011</RefYear>
        <RefJournal>Machine Learning and Knowledge Discovery in Databases</RefJournal>
        <RefPage>453&#8211;69</RefPage>
        <RefTotal>Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. On Oblique Random Forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M, editors. Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011 p. 453&#8211;69. (Lecture Notes in Computer Science; 6912). DOI: 10.1007&#47;978-3-642-23783-6&#95;29</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;978-3-642-23783-6&#95;29</RefLink>
      </Reference>
      <Reference refNo="3">
        <RefAuthor>Banerjee M</RefAuthor>
        <RefAuthor>Ding Y</RefAuthor>
        <RefAuthor>Noone AM</RefAuthor>
        <RefTitle>Identifying representative trees from ensembles</RefTitle>
        <RefYear>2012</RefYear>
        <RefJournal>Statistics in Medicine</RefJournal>
        <RefPage>1601&#8211;16</RefPage>
        <RefTotal>Banerjee M, Ding Y, Noone AM. Identifying representative trees from ensembles. Statistics in Medicine. 2012 Jul 10;31(15):1601&#8211;16.</RefTotal>
      </Reference>
      <Reference refNo="4">
        <RefAuthor>Laabs BH</RefAuthor>
        <RefAuthor>Westenberger A</RefAuthor>
        <RefAuthor>K&#246;nig IR</RefAuthor>
        <RefTitle>Identification of representative trees in random forests based on a new tree-based distance measure</RefTitle>
        <RefYear>2023</RefYear>
        <RefJournal>Adv Data Anal Classif</RefJournal>
        <RefPage></RefPage>
        <RefTotal>Laabs BH, Westenberger A, K&#246;nig IR. Identification of representative trees in random forests based on a new tree-based distance measure. Adv Data Anal Classif. 2023. DOI: 10.1007&#47;s11634-023-00537-7</RefTotal>
        <RefLink>https:&#47;&#47;doi.org&#47;10.1007&#47;s11634-023-00537-7</RefLink>
      </Reference>
      <Reference refNo="5">
        <RefAuthor>Szepannek G</RefAuthor>
        <RefAuthor>von Holt BH</RefAuthor>
        <RefTitle>Can&#8217;t see the forest for the trees: Analyzing groves to explain random forests</RefTitle>
        <RefYear>2023</RefYear>
        <RefJournal>Behaviormetrika</RefJournal>
        <RefPage></RefPage>
        <RefTotal>Szepannek G, von Holt BH. Can&#8217;t see the forest for the trees: Analyzing groves to explain random forests. Behaviormetrika. 2023. DOI: 10.1007&#47;s41237-023-00205-2</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;s41237-023-00205-2</RefLink>
      </Reference>
      <Reference refNo="6">
        <RefAuthor>Sies A</RefAuthor>
        <RefAuthor>Van Mechelen I</RefAuthor>
        <RefTitle>C443: a Methodology to See a Forest for the Trees</RefTitle>
        <RefYear>2020</RefYear>
        <RefJournal>J Classif</RefJournal>
        <RefPage>730-753</RefPage>
        <RefTotal>Sies A, Van Mechelen I. C443: a Methodology to See a Forest for the Trees. J Classif. 2020;37(3):730-753. DOI: 10.1007&#47;s00357-019-09350-4</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;s00357-019-09350-4</RefLink>
      </Reference>
      <Reference refNo="7">
        <RefAuthor>Li H</RefAuthor>
        <RefAuthor>Xu J</RefAuthor>
        <RefAuthor>Armstrong WW</RefAuthor>
        <RefTitle>LHT: Statistically-Driven Oblique Decision Trees for Interpretable Classification &#91;Preprint&#93;</RefTitle>
        <RefYear>2025</RefYear>
        <RefJournal>arXiv</RefJournal>
        <RefPage></RefPage>
        <RefTotal>Li H, Xu J, Armstrong WW. LHT: Statistically-Driven Oblique Decision Trees for Interpretable Classification &#91;Preprint&#93;. arXiv. 2025. DOI: 10.48550&#47;ARXIV.2505.04139</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.48550&#47;ARXIV.2505.04139</RefLink>
      </Reference>
    </References>
    <Media>
      <Tables>
        <NoOfTables>0</NoOfTables>
      </Tables>
      <Figures>
        <NoOfPictures>0</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>