Feature Selection for Clustering

Authors:
Manoranjan Dash; Huan Liu
Venue:
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
Year:
2000

References: 7
Cited by: 18

Algorithms for clustering data
Wrappers for performance enhancement and oblivious decision graphs
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
CACTUS—clustering categorical data using summaries

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Mining Data from a Knowledge Management Perspective: An Application to Outcome Prediction in Patients with Resectable Hepatocellular Carcinoma

AIME '01 Proceedings of the 8th Conference on AI in Medicine in Europe: Artificial Intelligence Medicine
A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples

Journal of Biomedical Informatics
Categorization and analysis of text in computer mediated communication archives using visualization

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Scalable Feature Selection for Multi-class Problems

ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Feature Selection Using Non Linear Feature Relation Index

PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
Spectral clustering with eigenvector selection based on entropy ranking

Neurocomputing
Optimizing reservoir features in oil exploration management based on fusion of soft computing

Applied Soft Computing
A new wrapper feature selection approach using neural network

Neurocomputing
Applying electromagnetism-like mechanism for feature selection

Information Sciences: an International Journal
Measures for unsupervised fuzzy-rough feature selection

International Journal of Hybrid Intelligent Systems - Advances in Intelligent Agent Systems
Visual interactive evolutionary algorithm for high dimensional outlier detection and data clustering problems

International Journal of Bio-Inspired Computation
A filter feature selection method for clustering

ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
An evaluation of filter and wrapper methods for feature selection in categorical clustering

IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
An eye-hand data fusion framework for pervasive sensing of surgical activities

Pattern Recognition
A novel approach for finding alternative clusterings using feature selection

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Unsupervised feature selection in digital mammogram image using rough set theory

International Journal of Bioinformatics Research and Applications
An evolutionary approach for high dimensional attribute selection

International Journal of Intelligent Information and Database Systems
On fuzzy-rough attribute selection: Criteria of Max-Dependency, Max-Relevance, Min-Redundancy, and Max-Significance

Applied Soft Computing

Abstract

Clustering is an important data mining task. Data mining often involves large, high-dimensional data, but most clustering algorithms in the literature are sensitive to data size, high dimensionality, or both. Different features affect clusters differently: some are important for forming clusters, while others may hinder the clustering task. An efficient way to handle this is to select a subset of important features. Doing so helps in finding clusters efficiently, understanding the data better, and reducing data size for efficient storage, collection, and processing. The task of finding the important original features for unsupervised data is largely untouched. Traditional feature selection algorithms work only for supervised data, where class information is available. For unsupervised data, which lack class information, principal components (PCs) are often used, but PCs still require all features and may be difficult to interpret. Our approach first ranks features according to their importance for clustering and then selects a subset of important features. For large data sets, we use a scalable method based on sampling. Empirical evaluation shows the effectiveness and scalability of our approach on benchmark and synthetic data sets.
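
The rank-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the importance measure, so the code below assumes one: an entropy computed over pairwise similarities, where a feature counts as important if removing it makes the data look more disordered. The function names (`similarity_entropy`, `rank_features`, `select_features`), the similarity formula, the min-max scaling, and the uniform random sampling step are all hypothetical choices made for illustration, not the authors' published procedure.

```python
import numpy as np


def similarity_entropy(X):
    """Entropy of pairwise similarities (an assumed importance criterion)."""
    # Pairwise Euclidean distances over all distinct pairs of rows.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    d = dist[np.triu_indices(len(X), k=1)]
    if d.size == 0 or d.mean() == 0:
        return 0.0  # fewer than two points, or all points coincide
    # Assumed similarity: exp(-alpha * d), equal to 0.5 at the mean distance.
    s = np.exp(-np.log(2.0) / d.mean() * d)
    s = np.clip(s, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return float(-(s * np.log(s) + (1.0 - s) * np.log(1.0 - s)).sum())


def rank_features(X, sample_size=None, seed=0):
    """Return feature indices ordered from most to least important."""
    X = np.asarray(X, dtype=float)
    # For large data, score features on a uniform random sample of rows.
    if sample_size is not None and sample_size < len(X):
        rng = np.random.default_rng(seed)
        X = X[rng.choice(len(X), size=sample_size, replace=False)]
    # Min-max scale each feature so distances are comparable across features.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    X = (X - X.min(axis=0)) / span
    # Score each feature by the entropy of the data with that feature removed.
    scores = [similarity_entropy(np.delete(X, j, axis=1))
              for j in range(X.shape[1])]
    # A higher entropy after removal means the feature mattered more.
    return np.argsort(scores)[::-1]


def select_features(X, k, sample_size=None, seed=0):
    """Keep the k top-ranked feature indices."""
    return rank_features(X, sample_size=sample_size, seed=seed)[:k]
```

On data where two features carry well-separated clusters and a third is uniform noise, `select_features(X, 2)` would be expected to keep the two clustered features. Because the pairwise distance computation grows quadratically with the number of rows, passing a `sample_size` keeps the cost bounded on large data, at the price of a noisier ranking.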