Discovering pattern-based subspace clusters by pattern tree

Authors:
Jihong Guan; Yanglan Gan; Hao Wang
Affiliations:
Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; Department of Computer Science and Technology, Hefei University of Technology, Hefei 230009, China
Venue:
Knowledge-Based Systems
Year:
2009

Abstract

Traditional clustering models based on distance similarity are not always effective at capturing correlations among data objects, whereas pattern-based clustering does well at identifying correlations hidden among them. However, state-of-the-art pattern-based clustering methods are inefficient and provide no metric for measuring clustering quality. This paper presents a new pattern-based subspace clustering method that addresses both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces with a single scan of the database, which makes the approach efficient on large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method lets users flexibly set appropriate quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms existing methods in both efficiency and effectiveness.
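The itemset/subspace-cluster analogy can be made concrete with a minimal Python sketch. It is not the authors' algorithm. The itemization is an assumption: each attribute j >= 1 becomes an item `(j, bin)` holding the bin of its shift `x_j - x_0` from a reference attribute 0, with bin size `width`. The function names `itemize`, `build_tree`, `mine`, and `subspace_clusters` and the parameters `min_objects` and `min_dims` are hypothetical. Objects are inserted into a prefix tree in one pass using a fixed item order. FP-growth-style recursion over conditional pattern bases then enumerates frequent itemsets. Each frequent itemset names a subspace (attribute 0 plus the attributes in its items) together with the objects whose shifts agree within a bin there. The paper's quality evaluation model is not reproduced. The sketch only drops clusters dominated by one with more dimensions and at least the same objects.

```python
import math
from collections import defaultdict


def itemize(row, width):
    """Hypothetical itemization: attribute j -> bin of its shift from attribute 0."""
    return [(j, math.floor((row[j] - row[0]) / width)) for j in range(1, len(row))]


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.ids = set()       # objects whose item path passes through this node
        self.children = {}


def build_tree(transactions):
    """Insert every transaction once, in a fixed (sorted) item order."""
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, ids in transactions:
        node = root
        for item in sorted(items):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.ids |= ids
            node = child
    return header


def mine(transactions, min_objects, suffix=()):
    """FP-growth-style recursion; returns (itemset, member ids) pairs."""
    results = []
    for item, nodes in build_tree(transactions).items():
        members = set().union(*(n.ids for n in nodes))
        if len(members) < min_objects:
            continue  # no superset of this itemset can be frequent either
        pattern = (item,) + suffix
        results.append((pattern, members))
        # Conditional pattern base: the prefix path above each node for `item`
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, frozenset(n.ids)))
        results.extend(mine(base, min_objects, pattern))
    return results


def subspace_clusters(data, width, min_objects, min_dims):
    # One pass over the objects: each becomes a transaction tagged with its index
    transactions = [(itemize(row, width), frozenset([i])) for i, row in enumerate(data)]
    found = []
    for pattern, members in mine(transactions, min_objects):
        dims = (0,) + tuple(j for j, _ in pattern)  # reference attribute 0 is implicit
        if len(dims) >= min_dims:
            found.append((dims, sorted(members)))
    # Keep clusters not dominated by one with more dimensions and the same or more objects
    return [c for c in found
            if not any(set(c[0]) < set(o[0]) and set(c[1]) <= set(o[1]) for o in found)]


if __name__ == "__main__":
    data = [[1.0, 3.0, 6.0, 0.0], [2.0, 4.0, 7.0, 9.0],
            [5.0, 7.0, 10.0, 2.0], [0.0, 9.0, 1.0, 4.0]]
    print(subspace_clusters(data, width=1.0, min_objects=3, min_dims=3))
    # → [((0, 1, 2), [0, 1, 2])]
```

Inserting items in a fixed order is what makes a single pass possible. The classic FP-tree orders items by frequency and therefore needs a preliminary counting scan. The trade-off is a possibly less compact tree. Discretizing shifts into fixed bins can also split nearly identical shifts across a bin boundary, so a practical method would need a more careful coherence test.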