Feature selection and classification of high dimensional mass spectrometry data: a genetic programming approach

Authors:
Soha Ahmed; Mengjie Zhang; Lifeng Peng
Affiliations:
School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand; School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand; School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
Venue:
EvoBIO'13 Proceedings of the 11th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2013

Abstract

Biomarker discovery from mass spectrometry (MS) data is valuable for disease detection and drug discovery. Because MS data typically contain a very large number of features (e.g. thousands) but comparatively few samples, biomarker discovery must start with feature selection. In this study, we propose the use of genetic programming (GP) for automatic feature selection and classification of MS data. The GP-based approach takes the features selected by two feature selection metrics, information gain (IG) and Relief-F (REFS-F), as its terminal set. Its feature selection performance is compared with that of IG and REFS-F alone on five MS data sets with different numbers of features and instances. Naive Bayes (NB), support vector machines (SVMs) and J48 decision trees (J48) are used to evaluate the classification accuracy of the selected features. GP is also used as a classifier in its own right, and its performance is compared with that of NB, SVMs and J48. The results show that GP as a feature selection method selects fewer features than IG and REFS-F while achieving better classification performance with NB, SVMs and J48. As a classification method, GP also outperforms NB and J48 and achieves comparable or slightly better performance than SVMs on these data sets.
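
As a rough illustration of the pre-selection step described in the abstract, the following Python sketch ranks features by information gain and by a simplified Relief score, then merges the top-ranked features into a candidate GP terminal set. It is not the authors' implementation. The median-split discretisation, the single-neighbour two-class Relief (rather than full Relief-F), the union rule for combining the two rankings, the toy data, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one feature after a binary split at its median (assumed discretisation)."""
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def relief_scores(rows, labels):
    """Basic Relief with one nearest hit and one nearest miss per instance.

    Simplification: Relief-F averages over k neighbours and handles several
    classes; this sketch assumes two classes and features scaled to [0, 1].
    """
    n_features = len(rows[0])
    scores = [0.0] * n_features

    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        others = [j for j in range(len(rows)) if j != i]
        hit = min((j for j in others if labels[j] == label),
                  key=lambda j: distance(row, rows[j]))
        miss = min((j for j in others if labels[j] != label),
                   key=lambda j: distance(row, rows[j]))
        for f in range(n_features):
            # Reward features that differ on the miss and agree on the hit.
            scores[f] += abs(row[f] - rows[miss][f]) - abs(row[f] - rows[hit][f])
    return [s / len(rows) for s in scores]


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]


def build_terminal_set(rows, labels, k):
    """Union of the top-k features under each metric (the combination rule is assumed)."""
    columns = list(zip(*rows))
    ig = [information_gain(list(col), labels) for col in columns]
    relief = relief_scores(rows, labels)
    return sorted(set(top_k(ig, k)) | set(top_k(relief, k)))


# Toy data (made up): feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    [0.10, 0.50, 0.90],
    [0.20, 0.40, 0.10],
    [0.15, 0.60, 0.50],
    [0.90, 0.50, 0.80],
    [0.80, 0.45, 0.20],
    [0.85, 0.55, 0.40],
]
labels = [0, 0, 0, 1, 1, 1]
terminal_set = build_terminal_set(rows, labels, k=1)
print(terminal_set)  # prints [0]
```

In a GP system, the returned indices would name the input variables available at the leaves of each evolved tree, so the search works on a reduced set of m/z features rather than on all of them.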