Evolutionary computation-based feature selection for finding a stable set of features in high-dimensional data

Salesi Mousaabadi, S., 2019. Evolutionary computation-based feature selection for finding a stable set of features in high-dimensional data. PhD, Nottingham Trent University.

Sadegh Salesi Mousaabadi 2020.2.pdf - Published version


Abstract

Evolutionary Computation (EC) algorithms have proved to work well for feature selection because they are powerful search techniques and can produce multiple good solutions. However, they suffer from some limitations in real-world applications. Firstly, EC algorithms require high computation time because they evaluate many solutions at each iteration. Secondly, a classifier is usually used as their fitness function, which causes the selected subset to perform well only on the utilised classifier (i.e. classifier bias). Lastly, as stochastic search methods, EC algorithms return a different final subset in different runs, which makes it difficult to find a stable set of features (i.e. the stability issue). To address the computation time and classifier-bias limitations, this thesis proposes a new two-stage selection approach called filter/filter, in which two filter feature selection algorithms are combined. In the first stage, a ranking algorithm forms a reduced dataset by selecting the most informative features from the original dataset. In the second stage, the reduced dataset is fed to a novel EC algorithm to select the final feature subset. This new EC algorithm, called TAGA, is a Tabu search hybridised with an Asexual Genetic Algorithm. TAGA benefits from new search components and a new solution representation, which can effectively reduce computation time. To select a classifier-unbiased final subset, a statistical criterion is used as the fitness function, which evaluates the subset independently of any classifier. Experiments show that the proposed filter/filter approach requires an acceptable computation time and selects more classifier-unbiased features than state-of-the-art methods. To find a stable set of features, a novel Generalisation Power Index (GPI) is proposed to analyse the generalisation power of the final subsets produced by an EC algorithm over several runs. Generalisation power refers to the performance capability of a subset over a wide range of classifiers. Computational results confirm that GPI is able to find a stable set of features which achieves near-optimal accuracy when used to train various classifiers. To examine the suitability of the proposed methods for real-world applications, the filter/filter approach and GPI are integrated to select a stable set of features for the METABRIC breast cancer subtype classification problem. Experimental results show that this integration not only addresses the limitations of EC algorithms for a real-world biomedical feature selection problem but also performs better than alternative methods.
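
The two-stage filter/filter idea described in the abstract can be illustrated with a minimal Python sketch. This is not TAGA and not the thesis's implementation: the abstract does not specify the ranking algorithm, the statistical criterion, or the search components, so everything concrete here is an assumption made for illustration. Stage 1 ranks features by absolute Pearson correlation with the label; stage 2 runs a plain tabu search that uses only single-feature flips (a mutation-only, "asexual" move) and scores subsets with a relevance-minus-redundancy criterion that involves no classifier. The names `abs_corr`, `rank_features`, `merit`, and `tabu_mutation_search` are hypothetical, and the data is synthetic.

```python
import random
from statistics import mean, pstdev


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def rank_features(columns, labels, keep):
    """Stage 1 (filter): indices of the `keep` features most correlated with the label."""
    scores = [abs_corr(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:keep]


def merit(subset, columns, labels):
    """Classifier-independent score (assumed): mean relevance minus mean pairwise redundancy."""
    relevance = mean([abs_corr(columns[j], labels) for j in subset])
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    if not pairs:
        return relevance
    return relevance - mean([abs_corr(columns[a], columns[b]) for a, b in pairs])


def tabu_mutation_search(candidates, columns, labels, iterations=20, tabu_size=10):
    """Stage 2 (filter): single-flip mutations only, with a tabu list of recent subsets."""
    current = tuple(sorted(candidates))  # start from the full reduced set
    best, best_score = current, merit(current, columns, labels)
    tabu = [current]
    for _ in range(iterations):
        # Neighbours: flip one candidate feature in or out of the current subset.
        neighbours = [tuple(sorted(set(current) ^ {j})) for j in candidates]
        allowed = [s for s in neighbours if s and s not in tabu]
        if not allowed:
            break
        # Move to the best non-tabu neighbour, even if it is worse than `current`.
        current = max(allowed, key=lambda s: merit(s, columns, labels))
        tabu = (tabu + [current])[-tabu_size:]
        score = merit(current, columns, labels)
        if score > best_score:
            best, best_score = current, score
    return best, best_score


# Synthetic data (an assumption): feature 0 tracks the label, the rest are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
columns = [[y + rng.gauss(0, 0.1) for y in labels]]
columns += [[rng.gauss(0, 1) for _ in labels] for _ in range(19)]

reduced = rank_features(columns, labels, keep=5)
best, best_score = tabu_mutation_search(reduced, columns, labels)
print("reduced:", reduced)
print("selected:", best, "score:", round(best_score, 3))
```

Because the second stage only ever sees the features kept by the first, the number of subsets it has to evaluate shrinks sharply, which is the computation-time argument made in the abstract. Scoring with a statistic rather than a trained model is what keeps the result independent of any single classifier in this sketch.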

Item Type: Thesis
Creators: Salesi Mousaabadi, S.
Date: September 2019
Divisions: Schools > School of Science and Technology
Depositing User: Linda Sullivan
Date Added: 27 May 2020 13:26
Last Modified: 29 May 2020 07:57
URI: http://irep.ntu.ac.uk/id/eprint/39901
