Feature Selection for Clustering - A Filter Solution

Authors:
Manoranjan Dash;Kiseok Choi;Peter Scheuermann;Huan Liu
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Cited 60

Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Efficient Feature Selection via Analysis of Relevance and Redundancy

The Journal of Machine Learning Research
Toward Integrating Feature Selection Algorithms for Classification and Clustering

IEEE Transactions on Knowledge and Data Engineering
A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Evolving Feature Selection

IEEE Intelligent Systems
Feature selection in predicting the activity of cyclooxygenase-2 inhibitors

AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
Learning word senses with feature selection and order identification capabilities

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Localized feature selection for clustering

Pattern Recognition Letters
Consensus unsupervised feature ranking from multiple views

Pattern Recognition Letters
Hierarchical fuzzy filter method for unsupervised feature selection

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology
Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm

Computational Statistics & Data Analysis
Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm

Pattern Recognition
Incremental clustering of mixed data based on distance hierarchy

Expert Systems with Applications: An International Journal
A new feature selection method for Gaussian mixture clustering

Pattern Recognition
Feature Selection for Clustering on High Dimensional Data

PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Computational accounting in determining Chart of Accounts using nominal data analysis and concept of entropy

Expert Systems with Applications: An International Journal
Feature Selection for Local Learning Based Clustering

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Using support vector machine with a hybrid feature selection method to the stock trend prediction

Expert Systems with Applications: An International Journal
An Iterative Hybrid Filter-Wrapper Approach to Feature Selection for Document Clustering

Canadian AI '09 Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence
Feature subset selection in large dimensionality domains

Pattern Recognition
A Delphi-based rough sets fusion model for extracting payment rules of vehicle license tax in the government sector

Expert Systems with Applications: An International Journal
A maximum weighted likelihood approach to simultaneous model selection and feature weighting in Gaussian mixture

ICANN'07 Proceedings of the 17th international conference on Artificial neural networks
Conditional mutual information based feature selection for classification task

CIARP'07 Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications
Spectral clustering with eigenvector selection based on entropy ranking

Neurocomputing
An efficient feature selection approach for clustering: using a Gaussian mixture model of data dissimilarity

ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part I
Optimizing reservoir features in oil exploration management based on fusion of soft computing

Applied Soft Computing
A graph based framework for clustering and characterization of SOM

ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part III
Nearest-neighbor guided evaluation of data reliability and its applications

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Scaling up feature selection by means of democratization

IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Adapt the mRMR criterion for unsupervised feature selection

ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
A hybrid feature selection scheme and self-organizing map model for machine health assessment

Applied Soft Computing
Correntropy based feature selection using binary projection

Pattern Recognition
Simultaneous model selection and feature selection via BYY harmony learning

ISNN'11 Proceedings of the 8th international conference on Advances in neural networks - Volume Part II
Projected Gustafson-Kessel clustering algorithm and its convergence

Transactions on rough sets XIV
Toward lightweight intrusion detection system through simultaneous intrinsic model identification

ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Flexible-Hybrid sequential floating search in statistical feature selection

SSPR'06/SPR'06 Proceedings of the 2006 joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
Modified adaptive resonance theory network for mixed data based on distance hierarchy

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
Classifying credit ratings for Asian banks using integrating feature selection and the CPDA-based rough sets approach

Knowledge-Based Systems
Fuzzy criteria for feature selection

Fuzzy Sets and Systems
Immune multiobjective optimization algorithm for unsupervised feature selection

EuroGP'06 Proceedings of the 2006 international conference on Applications of Evolutionary Computing
Effective feature preprocessing for time series forecasting

ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
The application of adaptive partitioned random search in feature selection problem

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
A filter feature selection method for clustering

ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
An evaluation of filter and wrapper methods for feature selection in categorical clustering

IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
Feature selection via joint embedding learning and sparse regression

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
An Empirical Investigation of Filter Attribute Selection Techniques for High-Speed Network Traffic Flow Classification

Wireless Personal Communications: An International Journal
Neighborhood effective information ratio for hybrid feature subset evaluation and selection

Neurocomputing
Quantitative intrusion intensity assessment for intrusion detection systems

Security and Communication Networks
Fuzzy classifier based feature reduction for better gene selection

DaWaK'07 Proceedings of the 9th international conference on Data Warehousing and Knowledge Discovery
Feature selection based on cluster and variability analyses for ordinal multi-class classification problems

Knowledge-Based Systems
Massively parallel feature selection: an approach based on variance preservation

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Assisted descriptor selection based on visual comparative data analysis

EuroVis'11 Proceedings of the 13th Eurographics / IEEE - VGTC conference on Visualization
A survey on enhanced subspace clustering

Data Mining and Knowledge Discovery
A scalable approach to simultaneous evolutionary instance and feature selection

Information Sciences: an International Journal
Comparative assessment of feature selection and classification techniques for visual inspection of pot plant seedlings

Computers and Electronics in Agriculture
Modeling hybrid rough set-based classification procedures to identify hemodialysis adequacy for end-stage renal disease patients

Computers in Biology and Medicine
Facing the classification of binary problems with a hybrid system based on quantum-inspired binary gravitational search algorithm and K-NN method

Engineering Applications of Artificial Intelligence
Automatic feature selection for named entity recognition using genetic algorithm

Proceedings of the Fourth Symposium on Information and Communication Technology
Robust feature selection based on regularized brownboost loss

Knowledge-Based Systems
Feature selection for ordinal text classification

Neural Computation


Abstract

Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method to remove noisy features. In the literature there are only a few methods proposed for feature selection for clustering, and almost all of those methods are 'wrapper' techniques that require a clustering algorithm to evaluate the candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications due to its heavy reliance on clustering algorithms that require parameters such as the number of clusters, and due to the lack of suitable clustering criteria to evaluate clustering in different subspaces. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram than data without clusters. Using this we propose an entropy measure that is low if data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant with the number of dimensions, and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
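
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so the measure below is an assumption chosen only to show the behaviour described: pairwise distances are scaled to [0, 1] and the mean binary entropy of those distances is taken. The function name `distance_entropy`, the scaling by the maximum distance, and the toy datasets are all hypothetical and are not taken from the paper.

```python
import numpy as np


def distance_entropy(X, eps=1e-12):
    """Mean binary entropy of normalized pairwise distances.

    Illustrative stand-in (an assumption, not the paper's exact measure).
    Distances near 0 or 1 contribute little entropy; distances near 0.5
    contribute the most. Expects at least two rows in X.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    d = dist[np.triu_indices(len(X), k=1)]
    # Scale to [0, 1] by the largest distance, then avoid log(0).
    d = d / max(d.max(), eps)
    d = np.clip(d, eps, 1 - eps)
    h = -(d * np.log2(d) + (1 - d) * np.log2(1 - d))
    return float(h.mean())


rng = np.random.default_rng(0)
# Two tight, well-separated clusters in two dimensions.
clustered = np.vstack([
    rng.normal(0.0, 0.01, size=(50, 2)),
    rng.normal(1.0, 0.01, size=(50, 2)),
])
# Points with no cluster structure.
uniform = rng.uniform(0.0, 1.0, size=(100, 2))
# The clustered data plus one irrelevant, uniformly random feature.
noisy = np.hstack([clustered, rng.uniform(0.0, 1.0, size=(100, 1))])

print("clustered:", distance_entropy(clustered))
print("uniform:  ", distance_entropy(uniform))
print("noisy:    ", distance_entropy(noisy))
```

Under these assumptions the clustered data scores lower than the uniform data, and appending the irrelevant feature raises the score. A filter in this spirit could rank candidate feature subsets by such a score without ever running a clustering algorithm.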