In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.
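To make the opening claim concrete, the following is a minimal sketch, not the algorithm proposed in the paper, of how a cluster that is tight in only a few dimensions can be nearly invisible to full-space distances. All names (`make_data`, `mean_distance`, `relevant_dimensions`), the synthetic data layout, and the `max_ratio=0.1` cutoff are assumptions made for illustration. The fixed cutoff is itself an example of the kind of user-supplied parameter the abstract describes as hard to set in practice.

```python
import math
import random
import statistics


def make_data(n_cluster=50, n_noise=50, dims=20, relevant=(3, 7), seed=0):
    """Synthetic data: cluster members are tight only on the `relevant` dims."""
    rng = random.Random(seed)
    cluster = []
    for _ in range(n_cluster):
        point = [rng.uniform(0, 100) for _ in range(dims)]
        for d in relevant:
            point[d] = rng.gauss(50, 1)  # small spread in relevant dimensions only
        cluster.append(point)
    noise = [[rng.uniform(0, 100) for _ in range(dims)] for _ in range(n_noise)]
    return cluster, noise


def mean_distance(group_a, group_b):
    """Mean full-space Euclidean distance over pairs of distinct points."""
    dists = [math.dist(a, b) for a in group_a for b in group_b if a is not b]
    return statistics.fmean(dists)


def relevant_dimensions(members, everyone, max_ratio=0.1):
    """Dimensions where the members' variance is far below the global variance.

    `max_ratio` is an illustrative, hand-picked threshold (an assumption).
    """
    result = []
    for d in range(len(members[0])):
        local_var = statistics.pvariance([p[d] for p in members])
        global_var = statistics.pvariance([p[d] for p in everyone])
        if local_var < max_ratio * global_var:
            result.append(d)
    return result


cluster, noise = make_data()
within = mean_distance(cluster, cluster)   # cluster member to cluster member
across = mean_distance(cluster, noise)     # cluster member to noise point
print(f"full-space within/across distance ratio: {within / across:.2f}")
print("dimensions flagged as relevant:", relevant_dimensions(cluster, cluster + noise))
```

In this setup the full-space distances between cluster members are close to their distances to unrelated points, while a per-dimension variance comparison singles out the two dimensions in which the cluster actually exists. The variance comparison here is given the true cluster membership, which a real projected clustering method would have to discover together with the relevant dimensions.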