Projected and subspace clustering algorithms search for clusters of points in subsets of attributes. Projected clustering computes several disjoint clusters, plus outliers, so that each cluster exists in its own subset of attributes. Subspace clustering enumerates clusters of points in all subsets of attributes, typically producing many overlapping clusters. One problem with existing approaches is that their objectives are not stated independently of the particular algorithm proposed to detect such clusters. A second problem is that cluster density is defined through user-defined parameters, which makes it hard to assess whether the reported clusters are an artifact of the algorithm or whether they actually stand out in the data in a statistical sense. We propose a novel problem formulation that aims at extracting axis-parallel regions that stand out in the data in a statistical sense. The set of axis-parallel, statistically significant regions that exist in a given data set is typically highly redundant. Therefore, we formulate the problem of representing this set through a reduced, non-redundant set of axis-parallel, statistically significant regions as an optimization problem. Because exhaustive search is computationally infeasible, we propose the approximation algorithm STATPC. Our comprehensive experimental evaluation shows that STATPC significantly outperforms existing projected and subspace clustering algorithms in terms of accuracy.
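
To make the notion of an axis-parallel region that "stands out in a statistical sense" more concrete, the following is a minimal illustrative sketch, not the STATPC algorithm itself. It assumes data normalized to the unit hypercube, a uniform null hypothesis, and a one-sided binomial tail test; the abstract does not specify the test, and the function names and the `alpha` threshold are hypothetical choices made for this example.

```python
import math

def binomial_upper_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p), by direct summation (fine for small n)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def region_volume(bounds):
    """Volume of an axis-parallel box inside the unit hypercube.

    bounds maps attribute index -> (low, high); attributes that are not
    listed are unconstrained and span the full [0, 1] range.
    """
    volume = 1.0
    for low, high in bounds.values():
        volume *= high - low
    return volume

def count_inside(points, bounds):
    """Number of points that fall inside the box on every constrained attribute."""
    return sum(
        all(low <= point[attr] <= high for attr, (low, high) in bounds.items())
        for point in points
    )

def is_significant(points, bounds, alpha=0.001):
    """Assumed test: does the box hold more points than a uniform null explains?

    Under the uniform null, each point lands in the box with probability equal
    to the box volume, so the count is Binomial(n, volume). Returns a pair
    (significant, p_value).
    """
    n = len(points)
    k = count_inside(points, bounds)
    p_value = binomial_upper_tail(n, k, region_volume(bounds))
    return p_value < alpha, p_value
```

A region such as `{0: (0.4, 0.5), 1: (0.4, 0.5)}` constrains only attributes 0 and 1, which is what makes it a subspace region. Many overlapping boxes around the same dense area would pass such a test, which illustrates why the set of significant regions tends to be highly redundant and why a reduced representation is sought.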