Differential prioritization in feature selection and classifier aggregation for multiclass microarray datasets

Authors:
Chia Huey Ooi;Madhu Chetty;Shyh Wei Teng
Affiliations:
Gippsland School of Information Technology, Monash University, Churchill, Australia 3842;Gippsland School of Information Technology, Monash University, Churchill, Australia 3842;Gippsland School of Information Technology, Monash University, Churchill, Australia 3842
Venue:
Data Mining and Knowledge Discovery
Year:
2007

Abstract

The high dimensionality of microarray datasets makes multiclass tissue classification difficult. The main challenge is selecting features that are both relevant and non-redundant to form the predictor set for classifier training. It is also important to vary the emphasis placed on relevance versus redundancy during the search for the predictor set, which is done through the degree of differential prioritization (DDP). Furthermore, the feature selection (FS) problem can be decomposed in several ways: all-classes-at-once, one-vs.-all (OVA), or pairwise (PW). In multiclass problems, the type of classifier aggregation must also be considered: non-aggregated (a single machine) or aggregated (OVA or PW). We first propose a systematic approach to combining the distinct problems of FS and classification. Then, using eight well-known multiclass microarray datasets, we empirically demonstrate the effectiveness of the DDP across various combinations of FS decomposition type and classifier aggregation method. Aided by the variable DDP, feature selection yields classification performance better than that of rank-based or equal-priorities scoring methods, and accuracies higher than previously reported for benchmark datasets with a large number of classes. Finally, based on several criteria, we make general recommendations on the optimal combination of FS decomposition type and classifier aggregation method for multiclass microarray datasets.
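To make the OVA and PW decompositions named in the abstract concrete, the following is a minimal sketch of how a K-class label vector can be split into binary subproblems. It is an illustration only, not the authors' implementation: the function names `ova_subproblems` and `pw_subproblems` and the toy labels are assumptions introduced here, and no feature scoring or DDP weighting is shown.

```python
from itertools import combinations


def ova_subproblems(labels):
    """One-vs.-all: one binary target vector per class (class = 1, rest = 0)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def pw_subproblems(labels):
    """Pairwise: one binary subproblem per unordered pair of classes.

    Each entry maps (a, b) to (sample indices belonging to a or b,
    binary targets for those samples with a = 1 and b = 0).
    """
    classes = sorted(set(labels))
    subproblems = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        targets = [1 if labels[i] == a else 0 for i in idx]
        subproblems[(a, b)] = (idx, targets)
    return subproblems


# Toy labels for a hypothetical 4-class problem (not from the paper).
labels = ["A", "B", "C", "D", "A", "B", "C", "A"]

ova = ova_subproblems(labels)
pw = pw_subproblems(labels)

print(len(ova))  # K subproblems for OVA
print(len(pw))   # K * (K - 1) / 2 subproblems for PW
```

With K classes, OVA produces K subproblems that each use every sample, while PW produces K(K-1)/2 subproblems that each use only the samples of two classes. The same split can in principle be applied either to feature selection or to classifier aggregation, which is why the abstract treats the two choices as separate dimensions to combine.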