SCHISM: a new approach to interesting subspace mining

Authors:
Karlton Sequeira;Mohammed Zaki
Affiliations:
Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA; Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA
Venue:
International Journal of Business Intelligence and Data Mining
Year:
2005

Abstract

High-dimensional data pose challenges to traditional clustering algorithms for two reasons: the data are inherently sparse, and they tend to cluster in different, possibly overlapping, subspaces of the full feature space. Finding such subspaces is called subspace mining. We present SCHISM, a new algorithm for mining interesting subspaces that uses the notions of support and Chernoff-Hoeffding bounds. We use a vertical representation of the dataset and a depth-first search with backtracking to find maximal interesting subspaces. We evaluate the algorithm's effectiveness on a number of high-dimensional synthetic and real datasets.
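
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: coordinates lie in [0, 1), each dimension is cut into `xi` equal-width intervals, and a grid cell constraining `p` dimensions counts as interesting when its support fraction exceeds the fraction expected under a uniform, independent null model, `(1/xi)**p`, by the Hoeffding slack `sqrt(ln(1/tau) / (2n))`. The names `hoeffding_threshold`, `vertical_items`, `mine_maximal`, `xi`, and `tau` are hypothetical. Because this threshold shrinks as `p` grows, pruning on it is a heuristic, and the result is maximal only among the subspaces the search actually reaches.

```python
import math


def hoeffding_threshold(n, xi, p, tau):
    """Assumed minimum support fraction for a cell constraining p dimensions."""
    expected = (1.0 / xi) ** p  # fraction expected under uniform independence
    slack = math.sqrt(math.log(1.0 / tau) / (2.0 * n))  # Hoeffding deviation
    return expected + slack


def vertical_items(points, xi):
    """Vertical representation: (dimension, interval) -> set of row ids.

    Assumes every coordinate lies in [0, 1).
    """
    tidsets = {}
    for row, point in enumerate(points):
        for dim, value in enumerate(point):
            interval = min(int(value * xi), xi - 1)
            tidsets.setdefault((dim, interval), set()).add(row)
    return tidsets


def mine_maximal(points, xi=4, tau=0.01):
    """Depth-first search with backtracking over (dimension, interval) items."""
    n = len(points)
    tidsets = vertical_items(points, xi)
    items = sorted(tidsets)
    maximal = []

    def dfs(current, rows, start):
        extended = False
        for i in range(start, len(items)):
            dim = items[i][0]
            if any(dim == d for d, _ in current):
                continue  # at most one interval per dimension
            # Support of the extension is a tidset intersection.
            new_rows = rows & tidsets[items[i]] if current else tidsets[items[i]]
            p = len(current) + 1
            if len(new_rows) / n >= hoeffding_threshold(n, xi, p, tau):
                extended = True
                dfs(current + [items[i]], new_rows, i + 1)
        # Backtrack: keep the subspace if nothing extended it and it is not
        # contained in a subspace that was already reported.
        if current and not extended:
            candidate = set(current)
            if not any(candidate <= found for found in maximal):
                maximal.append(candidate)

    dfs([], set(), 0)
    return maximal
```

Calling `mine_maximal(points, xi=4, tau=0.01)` on a list of equal-length coordinate tuples returns a list of sets of `(dimension, interval)` pairs. Storing row-id sets per item means the support of any candidate is obtained by set intersection rather than by rescanning the data, which is the usual motivation for a vertical layout.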