Traditional clustering models based on distance similarity are not always effective at capturing correlation among data objects, whereas pattern-based clustering can identify correlations hidden among them. However, existing pattern-based clustering methods are inefficient and provide no metric for measuring clustering quality. This paper presents a new pattern-based subspace clustering method that addresses both problems. Observing the analogy between mining frequent itemsets and discovering subspace clusters, we apply the pattern tree, a structure used in frequent itemset mining, to determine the target subspaces with a single scan of the database, which is efficient for large datasets. Furthermore, we introduce a general clustering quality evaluation model to guide the identification of meaningful clusters. The proposed method lets users flexibly set quality-control parameters to meet different needs. Experimental results on synthetic and real datasets show that our method outperforms existing methods in both efficiency and effectiveness.
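To make the single-scan idea concrete, the following is a minimal sketch of a generic prefix tree with counts, of the kind used in frequent itemset mining. It is not the paper's algorithm: the abstract does not describe how objects and dimensions are mapped onto the tree, so the names `Node`, `build_pattern_tree`, and `prefix_count`, the lexicographic item order, and the toy transactions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: how many transactions passed through it, plus its children."""
    count: int = 0
    children: dict = field(default_factory=dict)


def build_pattern_tree(transactions):
    """Build the tree in one pass; each transaction is read exactly once."""
    root = Node()
    for transaction in transactions:
        node = root
        # A fixed (here: lexicographic) item order lets transactions share prefixes.
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def prefix_count(root, items):
    """Number of transactions whose sorted item list begins with `items`."""
    node = root
    for item in sorted(set(items)):
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


# Hypothetical toy data, not taken from the paper.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
tree = build_pattern_tree(transactions)
```

With this toy data, `prefix_count(tree, ["a", "b"])` counts the two transactions that share the prefix `a, b`. Note that a prefix count is not the full support of an itemset; real frequent-pattern miners recover support by combining counts across branches.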