Distance based feature selection for clustering microarray data

Authors:
Manoranjan Dash;Vivekanand Gopalkrishnan
Affiliations:
Nanyang Technological University, Singapore;Nanyang Technological University, Singapore
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008

Cites (12):
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics.
Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Fast algorithms for projected clustering. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
Visualization and interactive feature selection for unsupervised data. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Feature selection in unsupervised learning via evolutionary search. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning.
Efficient Feature Selection in Conceptual Clustering. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Model Selection in Unsupervised Learning with Applications To Document Clustering. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology.
Redundancy based feature selection for microarray data. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles. Bioinformatics.
Two way focused classification. DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery.

Cited by (2):
Online feature selection for mining big data. Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications.
Spatial distance join based feature selection. Engineering Applications of Artificial Intelligence.

Abstract

In microarray data analysis, clustering is a fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. Recent gene expression microarray technologies can measure the expression levels of thousands of genes (features) simultaneously in a single experiment. The large number of genes, many of them noisy, makes cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The proposed method, called ClosestFS, is based on a distance measure. For each feature, the distance is evaluated by computing the feature's impact on the histogram of the whole data. Our experimental results show that the clustering quality (evaluated by several widely used measures) of the K-means algorithm with ClosestFS as a pre-processing step is significantly better than that of K-means alone.
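
The abstract does not say which histogram or distance ClosestFS uses, so the following is only a minimal sketch of the general idea of scoring a feature by its impact on a histogram of the whole data. It is not the authors' algorithm. The assumptions are: the histogram is taken over normalised pairwise Euclidean distances between samples, the impact of a feature is the L1 difference between the histogram with and without that feature, and features with the largest impact are kept. The function names, the `bins` parameter, and the selection rule are all hypothetical.

```python
import math
from itertools import combinations


def pairwise_distance_histogram(rows, features, bins=10):
    """Normalised histogram of pairwise Euclidean distances over `features`."""
    dists = [
        math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
        for a, b in combinations(rows, 2)
    ]
    top = max(dists) if dists else 0.0
    hist = [0] * bins
    for d in dists:
        # Scale to [0, 1]; if every distance is zero, everything lands in bin 0.
        scaled = d / top if top > 0 else 0.0
        hist[min(int(scaled * bins), bins - 1)] += 1
    total = len(dists) or 1
    return [count / total for count in hist]


def histogram_impact_scores(rows, bins=10):
    """Score each feature by how much removing it changes the histogram.

    Assumption (not from the paper): impact is the L1 difference between the
    histogram over all features and the histogram with one feature left out.
    """
    all_features = list(range(len(rows[0])))
    full = pairwise_distance_histogram(rows, all_features, bins)
    scores = []
    for f in all_features:
        rest = [g for g in all_features if g != f]
        reduced = pairwise_distance_histogram(rows, rest, bins)
        scores.append(sum(abs(p - q) for p, q in zip(full, reduced)))
    return scores


def select_top_features(rows, k, bins=10):
    """Return the indices of the k highest-impact features (hypothetical rule)."""
    scores = histogram_impact_scores(rows, bins)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates two groups, feature 1 is constant,
# feature 2 is low-amplitude noise.
rows = [
    [0.0, 5.0, 0.10],
    [0.2, 5.0, 0.30],
    [0.1, 5.0, 0.20],
    [9.8, 5.0, 0.25],
    [10.0, 5.0, 0.15],
    [9.9, 5.0, 0.05],
]
scores = histogram_impact_scores(rows)
selected = select_top_features(rows, k=1)
```

In a filter setting like the one the abstract describes, the columns in `selected` would be passed to K-means in place of the full feature set. The leave-one-out loop recomputes all pairwise distances for each feature, which is fine for a toy example but would need a more economical formulation for thousands of genes.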