Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species

Marin, MG, Quinones-Olvera, N, Wippel, C, Behruznia, M, Jeffrey, BM, Harris, M, Mann, BC, Rosenthal, A, Jacobson, KR, Warren, RM, Li, H, Meehan, CJ (ORCID: https://orcid.org/0000-0003-0724-8343) and Farhat, MR, 2025. Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species. Bioinformatics, 41 (5): btaf219. ISSN 1367-4803

2487826_Meehan.pdf - Published version


Abstract

Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety in methods used to define and measure the pan-genome poses challenges to the interpretation and reliability of results. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.

Item Type: Journal article
Publication Title: Bioinformatics
Creators: Marin, M.G., Quinones-Olvera, N., Wippel, C., Behruznia, M., Jeffrey, B.M., Harris, M., Mann, B.C., Rosenthal, A., Jacobson, K.R., Warren, R.M., Li, H., Meehan, C.J. and Farhat, M.R.
Publisher: Oxford University Press (OUP)
Date: 8 May 2025
Volume: 41
Number: 5
ISSN: 1367-4803
Identifiers:
DOI: 10.1093/bioinformatics/btaf219
Other: 2487826
Rights: © The Author(s) 2025. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Divisions: Schools > School of Science and Technology
Record created by: Jeremy Silvester
Date Added: 22 Aug 2025 07:57
Last Modified: 22 Aug 2025 07:57
URI: https://irep.ntu.ac.uk/id/eprint/54241
