Redundancy based feature selection for microarray data

Authors:
Lei Yu;Huan Liu
Affiliations:
Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem is particularly challenging because the number of features (genes) is large and the sample size is small. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power, without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from the selected set can better represent the characteristics of the targeted phenotypes and improve classification accuracy. Hence, in this paper we study the relationship between feature relevance and redundancy and propose an efficient method that effectively removes redundant genes. An empirical study on public microarray data sets demonstrates the efficiency and effectiveness of our method in comparison with representative methods.
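
The contrast drawn in the abstract between top-ranked selection and redundancy removal can be illustrated with a small sketch. This is not the method proposed in the paper, whose details the abstract does not give. It assumes absolute Pearson correlation as a stand-in for both the relevance measure and the redundancy measure. The function names, the two thresholds, and the toy expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_by_relevance(genes, labels):
    """Rank gene names by |correlation with the class label|, highest first."""
    relevance = {name: abs(pearson(values, labels)) for name, values in genes.items()}
    ranked = sorted(genes, key=lambda name: relevance[name], reverse=True)
    return ranked, relevance


def select_nonredundant(genes, labels, min_relevance=0.5, max_redundancy=0.9):
    """Greedy filter (assumed, illustrative): walk genes in relevance order and
    keep a gene only if it is relevant enough and not strongly correlated with
    any gene already kept. Both thresholds are hypothetical defaults."""
    ranked, relevance = rank_by_relevance(genes, labels)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every remaining gene is even less relevant
        redundant = any(
            abs(pearson(genes[name], genes[kept])) >= max_redundancy
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (invented for illustration): 8 samples, 2 classes, 4 genes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "gene_a": [1, 2, 1, 2, 5, 6, 5, 6],  # strongly relevant
    "gene_b": [1, 3, 1, 3, 5, 7, 5, 7],  # relevant, but nearly a copy of gene_a
    "gene_c": [0, 4, 1, 5, 3, 7, 4, 8],  # moderately relevant, less redundant
    "gene_d": [3, 1, 4, 2, 2, 4, 1, 3],  # uncorrelated with the label
}

top_two, _ = rank_by_relevance(genes, labels)
print(top_two[:2])                         # ['gene_a', 'gene_b']
print(select_nonredundant(genes, labels))  # ['gene_a', 'gene_c']
```

On this toy data, taking the two top-ranked genes returns gene_a and gene_b, which carry nearly the same information, while the redundancy-aware filter keeps gene_a and gene_c. Any two genes that both track a binary label closely will also correlate with each other, so the redundancy threshold in a sketch like this has to be chosen with care.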