Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters whose dimensionality is lower than 10% of the total number of dimensions, even though such clusters are commonly found in some real datasets, such as gene expression profiles. In this paper we propose a new algorithm that can accurately identify projected clusters whose relevant dimensions make up as little as 5% of the total number of dimensions. It uses a robust objective function that combines object clustering and dimension selection into a single optimization problem. The algorithm can also use domain knowledge, in the form of labeled objects and labeled dimensions, to improve its clustering accuracy. We believe this is the first semi-supervised projected clustering algorithm. Both theoretical analysis and experimental results show that, given a small amount of input knowledge that may cover only a portion of the underlying classes, the new algorithm can accurately detect clusters in which only 1% of the dimensions are relevant. The algorithm is also useful for obtaining a target set of clusters when there are multiple possible groupings of the objects.
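The abstract does not give the objective function or the steps of the proposed algorithm, so the following is only a toy sketch of the general idea of alternating object assignment with per-cluster dimension selection, seeded by a few labeled objects. It is not the paper's method. The function names, the variance-ratio selection rule, the `ratio` threshold, and the example data are all assumptions made for illustration.

```python
from statistics import mean, pvariance

def select_dims(members, global_var, ratio=0.5):
    """Keep dimensions where the cluster's variance is well below the dataset's.
    (Assumed selection rule, not taken from the paper.)"""
    all_dims = list(range(len(global_var)))
    if len(members) < 2:
        return all_dims  # too few objects to estimate a variance
    dims = [j for j in all_dims
            if pvariance([row[j] for row in members]) < ratio * global_var[j]]
    return dims or all_dims  # never return an empty subspace

def projected_dist(x, center, dims, global_var):
    """Mean squared distance over the selected dimensions, scaled by global variance."""
    return mean((x[j] - center[j]) ** 2 / max(global_var[j], 1e-12) for j in dims)

def projected_cluster(data, seeds, n_iter=20, ratio=0.5):
    """Alternate dimension selection and object assignment.
    `seeds` holds, per cluster, the indices of its labeled objects."""
    n_dims = len(data[0])
    global_var = [pvariance([row[j] for row in data]) for j in range(n_dims)]
    groups = [[data[i] for i in idx] for idx in seeds]
    labels = None
    for _ in range(n_iter):
        centers = [[mean(row[j] for row in g) for j in range(n_dims)] for g in groups]
        dims = [select_dims(g, global_var, ratio) for g in groups]
        # Assign each object to the cluster that is closest in that cluster's own subspace.
        new_labels = [
            min(range(len(groups)),
                key=lambda c: projected_dist(x, centers[c], dims[c], global_var))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Keep the previous members if a cluster would otherwise become empty.
        groups = [[x for x, lab in zip(data, labels) if lab == c] or groups[c]
                  for c in range(len(groups))]
    return labels, dims

# Hypothetical data: the first four objects agree on dimensions 0 and 1,
# the last four agree on dimensions 2 and 3; all other values are spread out.
data = [
    [1.0, 1.0, 0.0, 9.0], [1.1, 0.9, 3.0, 6.0], [0.9, 1.1, 6.0, 3.0], [1.0, 1.0, 9.0, 0.0],
    [0.0, 9.0, 8.0, 8.0], [3.0, 6.0, 8.1, 7.9], [6.0, 3.0, 7.9, 8.1], [9.0, 0.0, 8.0, 8.0],
]
labels, dims = projected_cluster(data, seeds=[[0, 3], [4, 7]])
```

Here two labeled objects per cluster are enough for the sketch to recover both the grouping and each cluster's relevant dimensions. In a realistic setting, where relevant dimensions are a tiny fraction of the total, a simple variance threshold like this one would likely be far less reliable than the robust objective the abstract refers to.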