Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
Journal of Computational and Applied Mathematics
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Visualization and interactive feature selection for unsupervised data
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Feature selection in unsupervised learning via evolutionary search
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Knowledge Acquisition Via Incremental Conceptual Clustering
Machine Learning
Efficient Feature Selection in Conceptual Clustering
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Model Selection in Unsupervised Learning with Applications To Document Clustering
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Center CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Redundancy based feature selection for microarray data
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Two way focused classification
DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery
Online feature selection for mining big data
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Spatial distance join based feature selection
Engineering Applications of Artificial Intelligence
Hi-index | 0.00 |
In microarray data, clustering is the fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. Recently, with innovative gene expression microarray data technologies, thousands of expression levels of genes (features) can be measured simultaneously in a single experiment. The large number of genes with a lot of noise causes high complexity for cluster analysis. This challenge has raised the demand for feature selection - an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The suggested method, called ClosestFS, is based on a distance measure. For each feature, the distance is evaluated by computing its impact on the histogram for the whole data. Our experimental results show that the quality of clustering results (evaluated by several widely used measures) of K-means algorithm using ClosestFS as the pre-processing step is significantly better than that of the pure K-means.