Hybrid genetic algorithm-neural network: Feature extraction for unpreprocessed microarray data

Authors:
Dong Ling Tong;Amanda C. Schierz
Affiliations:
The John van Geest Cancer Research Centre, School of Science and Technology, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK;School of Design, Engineering and Computing, Bournemouth University, Poole House, Talbot Campus, Fern Barrow, Poole, Dorset, BH12 5BB, UK
Venue:
Artificial Intelligence in Medicine
Year:
2011

Abstract

Objective: Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data.
Method: The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated using two benchmark microarray datasets with different array platforms and differing numbers of classes: a 2-class oligonucleotide microarray dataset for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for small round blue cell tumours (SRBCTs). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time.
Results: The GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models, and for both datasets the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but can also explore genes that are differentially expressed in multiple cancer types.
Conclusions: The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data, as well as known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading. Although the GANN's set of extra genes proved to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
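
The Method description can be made concrete with a minimal sketch, which is an illustration under stated assumptions and not the authors' implementation. It assumes each chromosome holds both a small set of gene indices and the weights of a one-hidden-layer feedforward network, so that gene selection and ANN weights evolve together, and it takes the fitness to be the number of samples the network labels correctly, as the abstract states. The network size, the number of genes per chromosome, truncation selection with elitism, uniform crossover, the mutation rates, and the toy two-class data are all hypothetical choices made for illustration.

```python
import math
import random

N_HIDDEN = 2  # hypothetical hidden-layer size
N_GENES = 3   # hypothetical number of genes encoded per chromosome


def n_weights(n_in, n_hidden):
    # Hidden weights and biases, then output weights and one output bias.
    return n_hidden * (n_in + 1) + n_hidden + 1


def ann_predict(weights, x, n_hidden=N_HIDDEN):
    """Forward pass of a one-hidden-layer network; returns class 0 or 1."""
    n_in = len(x)
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[pos + n_in]  # bias of this hidden unit
        for j in range(n_in):
            s += weights[pos + j] * x[j]
        hidden.append(math.tanh(s))
        pos += n_in + 1
    out = weights[pos + n_hidden]  # output bias
    for h in range(n_hidden):
        out += weights[pos + h] * hidden[h]
    return 1 if out > 0 else 0


def fitness(chrom, X, y):
    """Number of samples correctly labelled by the ANN on the selected genes."""
    genes, weights = chrom
    return sum(ann_predict(weights, [row[g] for g in genes]) == label
               for row, label in zip(X, y))


def random_chrom(rng, n_features):
    genes = rng.sample(range(n_features), N_GENES)
    weights = [rng.gauss(0, 1) for _ in range(n_weights(N_GENES, N_HIDDEN))]
    return genes, weights


def crossover(rng, a, b):
    # Uniform crossover applied to the gene indices and the weights alike.
    genes = [rng.choice(pair) for pair in zip(a[0], b[0])]
    weights = [rng.choice(pair) for pair in zip(a[1], b[1])]
    return genes, weights


def mutate(rng, chrom, n_features, rate=0.1):
    # A mutated gene is replaced by a random index (duplicates are possible);
    # a mutated weight receives Gaussian noise.
    genes = [rng.randrange(n_features) if rng.random() < rate else g
             for g in chrom[0]]
    weights = [w + rng.gauss(0, 0.5) if rng.random() < rate else w
               for w in chrom[1]]
    return genes, weights


def run_gann(X, y, generations=30, pop_size=30, seed=0):
    """Evolve (genes, weights) pairs; returns the best one and its history."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [random_chrom(rng, n_features) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y), reverse=True)
        history.append(fitness(pop[0], X, y))
        parents = pop[:pop_size // 2]  # truncation selection
        children = [pop[0]]            # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, a, b), n_features))
        pop = children
    best = max(pop, key=lambda c: fitness(c, X, y))
    history.append(fitness(best, X, y))
    return best, history


def make_toy_data(n_samples=40, n_features=20, seed=1):
    """Made-up, unscaled two-class data; feature 3 is informative by design."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.uniform(0, 1000) for _ in range(n_features)]
        row[3] += 800 * label
        X.append(row)
        y.append(label)
    return X, y
```

Calling `run_gann(*make_toy_data())` returns the best chromosome found and the best fitness per generation. Because the elite individual is carried over unchanged, that history never decreases. Whether the informative feature is actually recovered depends on the seed and the settings, so the sketch makes no claim about accuracy.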