Identifying a small set of marker genes using minimum expected cost of misclassification

Authors:
Samuel H. Huang;Dengyao Mo;Jarek Meller;Michael Wagner
Affiliations:
School of Dynamic Systems, University of Cincinnati, 2600 Clifton Ave., Cincinnati, OH 45221, United States;School of Dynamic Systems, University of Cincinnati, 2600 Clifton Ave., Cincinnati, OH 45221, United States;Department of Environmental Health, University of Cincinnati, 2600 Clifton Ave., Cincinnati, OH 45267, United States;Division of Biomedical Informatics, Cincinnati Children's Hospital, 3333 Burnet Ave., Cincinnati, OH 45229, United States
Venue:
Artificial Intelligence in Medicine
Year:
2012

Abstract

Objectives: This paper presents a model-independent feature selection approach for identifying a small subset of marker genes. Methods and material: An evaluation measure, the minimum expected cost of misclassification (MECM), is used to estimate the discriminative power of a feature subset without building a model. The MECM measure is combined with sequential forward search for feature selection. The approach was applied to a breast cancer profiling problem, with the goal of identifying a small number of marker genes whose expression can be used to predict a cancer molecular subtype (p53 gene status). The method was also applied to find a small set of single-nucleotide polymorphisms (SNPs) that can be used to predict a molecular phenotype of a different type, namely alleles (genetic variants) of human leukocyte antigen genes, which play an important role in autoimmunity. Results: Two marker genes were identified based on p53 status; they achieved a p-value of 7.53 × 10^-5 (vs. 6 × 10^-4 with 32 genes identified by previous research) in survival analysis. Six SNP loci were identified that achieved a leave-one-out cross-validation accuracy of 92.8% (vs. 90.6% and 89.5% with 18 SNPs selected using χ² statistics and information gain, respectively). Conclusion: The MECM-based feature selection approach is capable of identifying a smaller subset of marker genes with comparable or even better performance than that obtained using conventional filter methods.
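
The abstract does not define how the evaluation measure is computed, so the following is only a minimal sketch of the general idea: sequential forward search driven by a score computed directly from the data rather than from a trained classifier. The score used here is a stand-in and not the paper's measure. It is the Bayes error of two Gaussian classes under the assumptions of a shared covariance matrix, equal priors and equal misclassification costs. The names `gaussian_cost` and `forward_search`, the `ridge` parameter, the stopping rule and the synthetic data are all hypothetical and chosen for illustration.

```python
import math

import numpy as np


def gaussian_cost(X, y, subset, ridge=1e-6):
    """Stand-in score (an assumption, NOT the paper's measure).

    Bayes error of two Gaussian classes with a shared covariance matrix,
    equal priors and equal costs: Phi(-delta / 2), where delta is the
    Mahalanobis distance between the two class means.
    """
    Xs = X[:, subset]
    a, b = Xs[y == 0], Xs[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-class covariance; np.cov returns a 0-d array for a
    # single feature, hence atleast_2d. The ridge keeps it invertible.
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    pooled = np.atleast_2d(pooled) + ridge * np.eye(len(subset))
    delta = math.sqrt(diff @ np.linalg.solve(pooled, diff))
    # Phi(-delta / 2) written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


def forward_search(X, y, score, max_features=2):
    """Greedy sequential forward search that minimizes `score`."""
    selected, best = [], float("inf")
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Try adding each remaining feature and keep the cheapest subset.
        cost, j = min((score(X, y, selected + [j]), j) for j in remaining)
        if cost >= best:  # illustrative stopping rule: no improvement
            break
        selected.append(j)
        remaining.remove(j)
        best = cost
    return selected, best


# Synthetic demo data (hypothetical): only features 3 and 0 carry signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[:, 3] += 3.0 * y
X[:, 0] += 1.5 * y

selected, best = forward_search(X, y, gaussian_cost, max_features=2)
print(selected, round(best, 4))
```

Because the score is passed in as a function, a different subset evaluation measure can be substituted without changing the search loop.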