Cross-cohort genetic risk prediction for Alzheimer’s disease: a transfer learning approach using GWAS and deep learning models

Ihianle, IK (ORCID: https://orcid.org/0000-0001-7445-8573), Samarasekara, W, Brookes, K (ORCID: https://orcid.org/0000-0003-2427-2513) and Machado, P (ORCID: https://orcid.org/0000-0003-1760-3871), 2025. Cross-cohort genetic risk prediction for Alzheimer’s disease: a transfer learning approach using GWAS and deep learning models. BioData Mining, 18: 89. ISSN 1756-0381

Full text not available from this repository.

Abstract

Alzheimer’s Disease (AD) represents a growing global health challenge, driven by complex genetic factors and diverse risk contributors. Currently, an estimated 55 million people worldwide are affected by dementia, with AD responsible for 60–70% of these cases. This paper explores the application of advanced machine learning approaches to predict AD risk using Genome-Wide Association Studies data from multiple cohorts, with a particular focus on transfer learning and feature selection techniques. We evaluate the performance of Wide and Deep Neural Networks and Multi-Head Attention in assessing their ability to generalise across datasets. As part of this, we explore knowledge distillation as a strategy to enhance model efficiency through improved generalisation performance in smaller architectures by transferring knowledge from high-capacity models to lightweight ones. Furthermore, the performance of these deep learning approaches is compared with tree-based ensembles, including Random Forest and XGBoost. Our experiments evaluate the generalisability, transferability, and efficiency of these models across different transfer learning scenarios. Findings indicate that aggregating multi-cohort training data significantly enhances predictive performance, highlighting the importance of data diversity in improving AD risk assessment. The proposed knowledge distillation approach enables the transfer of knowledge from a complex teacher model to a simpler student model, significantly improving performance. To enhance interpretability, we apply SHAP (SHapley Additive exPlanations) to the student models, revealing cohort-specific differences in SNP importance and highlighting variants in genes such as ABI3BP and SYN3, both of which are linked to immune and synaptic functions in AD. 
The integration of SHAP enables transparent interpretation of model decisions and supports the identification of transferable genetic markers, reinforcing the clinical relevance of our framework in AD risk prediction.

Item Type: Journal article
Publication Title: BioData Mining
Creators: Ihianle, I.K., Samarasekara, W., Brookes, K. and Machado, P.
Publisher: Springer
Date: 22 December 2025
Volume: 18
ISSN: 1756-0381
Identifiers:
DOI: 10.1186/s13040-025-00506-0
Other: 2552020
Rights: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Divisions: Schools > School of Science and Technology
Record created by: Jonathan Gallacher
Date Added: 07 Jan 2026 11:46
Last Modified: 07 Jan 2026 11:49
URI: https://irep.ntu.ac.uk/id/eprint/54952
