PanForest: predicting genes in genomes using random forests

Beavan, AJS, Domingo-Sananes, MR (ORCID: https://orcid.org/0000-0002-3339-8671) and McInerney, JO, 2026. PanForest: predicting genes in genomes using random forests. Bioinformatics. ISSN 1367-4803

Full text not available from this repository.

Abstract

Motivation: The presence or absence of some genes in a genome can influence whether other genes are likely to be present or absent. Understanding these gene co-occurrence and avoidance patterns reveals fundamental principles of genome organisation, with applications ranging from evolutionary reconstruction to rational design of synthetic genomes.

Implementation: PanForest, presented here, uses random forest classifiers to predict the presence or absence of genes in genomes from the set of other genes present. Performance statistics output by PanForest reveal how predictable each gene’s presence or absence is from the presence or absence of the other genes in the genome. PanForest also produces statistics indicating the importance of each gene in predicting the presence or absence of every other gene. The PanForest software can run serially or in parallel, which makes it practical to analyse pangenomes at Network of Life scale.
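The abstract does not describe PanForest's internals, so the following is only a minimal, standard-library Python sketch of the general technique it names, not the authors' code. It assumes a binary presence/absence matrix (rows are genomes, columns are genes) and trains one random forest per target gene on all other genes. Out-of-bag (OOB) accuracy stands in, as an assumption, for PanForest's performance statistics, and mean decrease in Gini impurity stands in for its importance statistics. The names `grow_tree`, `predict` and `forest_for_gene` are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def grow_tree(matrix, idx, target, features, max_features, depth, max_depth, rng, importance):
    """Grow one tree on the genomes in idx (hypothetical helper).

    A leaf is a 0/1 prediction; an internal node is (gene, absent_branch, present_branch).
    """
    y = [matrix[i][target] for i in idx]
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    order = features[:]
    rng.shuffle(order)  # random feature subset, as in a random forest
    best, tried = None, 0
    parent = gini(y) * len(y)
    for g in order:
        if tried >= max_features:
            break
        absent = [i for i in idx if matrix[i][g] == 0]
        present = [i for i in idx if matrix[i][g] == 1]
        if not absent or not present:
            continue  # gene g is constant in this node, so it cannot split it
        tried += 1
        child = (gini([matrix[i][target] for i in absent]) * len(absent)
                 + gini([matrix[i][target] for i in present]) * len(present))
        if best is None or parent - child > best[0]:
            best = (parent - child, g, absent, present)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    gain, g, absent, present = best
    importance[g] += gain  # accumulate impurity decrease credited to gene g
    return (g,
            grow_tree(matrix, absent, target, features, max_features, depth + 1, max_depth, rng, importance),
            grow_tree(matrix, present, target, features, max_features, depth + 1, max_depth, rng, importance))


def predict(tree, genome):
    """Follow a genome's gene presence/absence down to a leaf."""
    while isinstance(tree, tuple):
        g, absent, present = tree
        tree = present if genome[g] == 1 else absent
    return tree


def forest_for_gene(matrix, target, n_trees=50, max_depth=10, seed=0):
    """Predict gene `target` from all other genes; return (OOB accuracy, importances)."""
    rng = random.Random(seed)
    n, n_genes = len(matrix), len(matrix[0])
    features = [g for g in range(n_genes) if g != target]
    max_features = max(1, int(math.sqrt(len(features))))
    importance = [0.0] * n_genes
    votes = [[] for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of genomes
        tree = grow_tree(matrix, boot, target, features, max_features, 0, max_depth, rng, importance)
        for i in set(range(n)) - set(boot):  # genomes left out of this tree
            votes[i].append(predict(tree, matrix[i]))
    scored = [(Counter(v).most_common(1)[0][0], matrix[i][target])
              for i, v in enumerate(votes) if v]
    oob_accuracy = sum(p == t for p, t in scored) / len(scored)
    total = sum(importance) or 1.0
    return oob_accuracy, [imp / total for imp in importance]


# Toy pangenome: gene 2 co-occurs with gene 0, gene 3 avoids gene 1.
data_rng = random.Random(1)
genomes = []
for _ in range(60):
    a, b = data_rng.randint(0, 1), data_rng.randint(0, 1)
    genomes.append([a, b, a, 1 - b])
acc, imp = forest_for_gene(genomes, target=2)
print(f"OOB accuracy for gene 2: {acc:.2f}")
print("importances:", [round(x, 2) for x in imp])
```

Because each gene's forest is independent of the others, a loop over `target` could be distributed with `multiprocessing.Pool.map` to use several processors; whether PanForest parallelises in this way is not stated here. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would replace the hand-rolled trees.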

Results: A pangenome of 12,741 accessory genes in 1,000 Escherichia coli genomes was analysed in around 5 hours using 8 processors. To demonstrate PanForest’s utility, we present a case study showing that certain genes associated with resistance to antimicrobial drugs reliably predict the presence or absence of other genes associated with resistance to the same drug. We also highlight several associations between those genes and genes either not known to be associated with antimicrobial resistance (AMR) or associated with resistance to other drugs. We envisage PanForest being used to study the dynamics of gene distributions in pangenomes across disciplines ranging from biomedical science and synthetic biology to molecular ecology.
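For a rough sense of per-gene cost, and assuming (this is not stated in the abstract) that one forest is trained per accessory gene with the work spread evenly over the 8 processors, the reported run time works out to roughly 11 seconds of wall-clock time per forest:

```latex
\frac{12{,}741\ \text{forests}}{8\ \text{processors}} \approx 1{,}593\ \text{forests per processor},
\qquad
\frac{5 \times 3{,}600\ \text{s}}{1{,}593\ \text{forests}} \approx 11.3\ \text{s per forest}
```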

Item Type: Journal article
Publication Title: Bioinformatics
Creators: Beavan, A.J.S., Domingo-Sananes, M.R. and McInerney, J.O.
Publisher: Oxford University Press
Date: 9 January 2026
ISSN: 1367-4803
Identifiers:
DOI: 10.1093/bioinformatics/btag005
Other: 2558408
Rights: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Divisions: Schools > School of Science and Technology
Record created by: Jonathan Gallacher
Date Added: 20 Jan 2026 13:54
Last Modified: 20 Jan 2026 13:54
URI: https://irep.ntu.ac.uk/id/eprint/55083
