Clustering is an important data mining task. Data mining often involves large, high-dimensional data, but most clustering algorithms in the literature are sensitive to data size, to high dimensionality, or to both. Features affect clusters differently: some are important for the clusters, while others may hinder the clustering task. An efficient way to handle this is to select a subset of important features. Doing so helps in finding clusters efficiently, understanding the data better, and reducing data size for efficient storage, collection, and processing. The task of finding the important original features for unsupervised data is largely untouched. Traditional feature selection algorithms work only for supervised data, where class information is available. For unsupervised data, which lack class information, principal components (PCs) are often used, but PCs still require all features and may be difficult to understand. Our approach first ranks features according to their importance for clustering and then selects a subset of the important features. For large data, we use a scalable method based on sampling. Empirical evaluation shows the effectiveness and scalability of our approach on benchmark and synthetic data sets.
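The rank-then-select idea, with sampling for large data, can be illustrated with a minimal sketch. The abstract does not state how importance is measured, so everything specific here is an assumption made for illustration: the scoring rule (mean binary entropy of pairwise similarities along each feature, where lower entropy is taken to mean more cluster structure), the function names `similarity_entropy`, `rank_features`, and `select_top`, and the parameters `sample_size`, `seed`, and `k`. It should not be read as the method evaluated in the paper.

```python
import math
import random


def normalize(column):
    """Rescale one feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def similarity_entropy(column):
    """Mean binary entropy (in nats) of pairwise similarities along one feature.

    Assumed score, chosen for illustration: values that form tight,
    well-separated groups give a lower score than evenly spread values.
    """
    n = len(column)
    dists = [abs(column[i] - column[j]) for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    if mean_dist == 0:
        # Constant (or single-value) feature: treat as least informative.
        return math.log(2)
    alpha = math.log(2) / mean_dist  # similarity equals 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        for p in (s, 1.0 - s):
            if p > 0.0:  # skip zero terms to avoid log(0)
                total -= p * math.log(p)
    return total / len(dists)


def rank_features(rows, sample_size=200, seed=0):
    """Return feature indices ordered from most to least important.

    Scores are computed on a random sample of rows so that the quadratic
    pairwise step stays affordable on large data.
    """
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    n_features = len(sample[0])
    scores = []
    for f in range(n_features):
        column = normalize([row[f] for row in sample])
        scores.append(similarity_entropy(column))
    # Lower entropy is ranked first under the assumed scoring rule.
    return sorted(range(n_features), key=lambda f: scores[f])


def select_top(rows, k, **kwargs):
    """Keep only the k highest-ranked features of every row."""
    keep = sorted(rank_features(rows, **kwargs)[:k])
    return keep, [[row[f] for f in keep] for row in rows]
```

Calling `select_top(rows, k=2, sample_size=200)` on a list of numeric rows returns the indices of the retained features together with the reduced rows. Scoring each feature independently keeps the sketch short; it ignores interactions between features, which a fuller ranking method would need to account for.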