A New Unsupervised Feature Ranking Method for Gene Expression Data Based on Consensus Affinity

Authors:
Shaohong Zhang; Hau-San Wong; Ying Shen; Dongqing Xie
Affiliations:
Guangzhou University, Guangzhou; City University of Hong Kong, Hong Kong; City University of Hong Kong, Hong Kong; Guangzhou University, Guangzhou
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Feature selection is widely established as one of the fundamental computational techniques for mining microarray data. Because class labels are often unavailable in practice, unsupervised feature selection is of greater practical importance but is correspondingly more difficult. Motivated by cluster ensemble techniques, which combine multiple clustering solutions into a consensus solution of higher accuracy and stability, recent work in unsupervised feature selection has proposed using these consensus solutions as oracles. However, such methods depend on both the particular cluster ensemble algorithm used and knowledge of the true number of clusters, so they are unsuitable when the true number of clusters is unavailable, which is common in practice. To address these problems, we propose a new unsupervised feature ranking method that evaluates the importance of features based on consensus affinity. Unlike previous work, our method compares the affinity that each feature induces between a pair of instances against the consensus matrix of clustering solutions. As a result, it alleviates both the need to know the true number of clusters and the dependence on a particular cluster ensemble approach. Experiments on real gene expression data sets demonstrate significant improvement in the feature ranking results compared with several state-of-the-art techniques.
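
The idea of scoring features against a consensus matrix can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the affinity measure or the scoring rule, so the per-feature affinity (one minus the range-normalized absolute difference) and the score (one minus the mean absolute disagreement with the consensus matrix) are assumptions chosen for illustration, and the function names `consensus_matrix`, `feature_affinity`, and `rank_features` are hypothetical. The partitions are passed in as plain label lists and may use different numbers of clusters, which reflects the point that no true cluster number is required.

```python
from itertools import combinations


def consensus_matrix(partitions):
    """Fraction of partitions in which each pair of instances shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[c / total for c in row] for row in counts]


def feature_affinity(column):
    """Assumed affinity: 1 - |x_i - x_j| / range, so close values score near 1."""
    n = len(column)
    spread = max(column) - min(column)
    if spread == 0:
        # A constant feature treats every pair of instances as identical.
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - abs(column[i] - column[j]) / spread for j in range(n)]
            for i in range(n)]


def rank_features(data, partitions):
    """Rank features by agreement between their affinity and the consensus.

    Assumed score: 1 - mean absolute difference over all instance pairs.
    Returns (feature indices sorted best first, list of scores).
    """
    consensus = consensus_matrix(partitions)
    pairs = list(combinations(range(len(data)), 2))
    scores = []
    for f in range(len(data[0])):
        affinity = feature_affinity([row[f] for row in data])
        disagreement = sum(abs(affinity[i][j] - consensus[i][j])
                           for i, j in pairs) / len(pairs)
        scores.append(1.0 - disagreement)
    order = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return order, scores


# Toy data: 4 instances x 2 features. Feature 0 separates instances {0, 1}
# from {2, 3}; feature 1 is unrelated to that grouping.
data = [
    [0.0, 0.5],
    [0.1, 0.0],
    [1.0, 0.6],
    [0.9, 1.0],
]
# Three clustering solutions with differing numbers of clusters.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [1, 1, 0, 0],
]
order, scores = rank_features(data, partitions)
print("ranking:", order)
print("scores:", scores)
```

In this toy example the feature whose pairwise affinities track the consensus matrix receives the higher score. In practice the partitions would come from repeated runs of a clustering algorithm on the expression data, and the choice of affinity measure would need to follow the paper's definition rather than the placeholder used here.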