MaPle: A Fast Algorithm for Maximal Pattern-based Clustering

Authors:
Jian Pei; Xiaoling Zhang; Moonjung Cho; Haixun Wang; Philip S. Yu
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there can be a huge number of clusters and many of them can be redundant and thus make the pattern-based clustering ineffective. On the other hand, the previously proposed methods may not be efficient or scalable in mining large databases. In this paper, we study the problem of maximal pattern-based clustering. Redundant clusters are avoided completely by mining only the maximal pattern-based clusters. MaPle, an efficient and scalable mining algorithm, is developed. It conducts a depth-first, divide-and-conquer search and prunes unnecessary branches smartly. Our extensive performance study on both synthetic data sets and real data sets shows that maximal pattern-based clustering is effective. It reduces the number of clusters substantially. Moreover, MaPle is more efficient and scalable than the previously proposed pattern-based clustering methods in mining large databases.
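
The abstract does not define a pattern-based cluster or maximality, so the following is only a minimal sketch under stated assumptions. It uses the δ-pCluster definition from the pattern-based clustering literature: a set of rows and columns forms a cluster when every 2x2 submatrix has pScore |(x_a - y_a) - (x_b - y_b)| at most δ. A cluster is treated as maximal when no other cluster contains both its rows and its columns. The function names and the toy matrix are hypothetical, and the brute-force checks shown here are not MaPle's depth-first search or its pruning.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by rows x, y and columns a, b."""
    return abs((x[a] - y[a]) - (x[b] - y[b]))


def is_delta_pcluster(data, rows, cols, delta):
    """True if every 2x2 submatrix over (rows, cols) has pScore <= delta.

    Brute-force check for illustration only (assumed definition, see lead-in).
    """
    return all(
        p_score(data[r1], data[r2], c1, c2) <= delta
        for r1, r2 in combinations(rows, 2)
        for c1, c2 in combinations(cols, 2)
    )


def is_maximal(cluster, clusters):
    """True if no other cluster contains both the rows and columns of `cluster`.

    Each cluster is a (frozenset_of_rows, frozenset_of_cols) pair.
    """
    rows, cols = cluster
    return not any(
        other != cluster and rows <= other[0] and cols <= other[1]
        for other in clusters
    )


# Hypothetical toy data: rows 0-2 follow the same shifted pattern on
# columns 0-2, while column 3 breaks the pattern.
data = [
    [1, 4, 2, 9],
    [3, 6, 4, 0],
    [6, 9, 7, 5],
]

big = (frozenset({0, 1, 2}), frozenset({0, 1, 2}))
small = (frozenset({0, 1}), frozenset({0, 1}))
found = [big, small]

coherent = is_delta_pcluster(data, [0, 1, 2], [0, 1, 2], delta=1)
with_noise = is_delta_pcluster(data, [0, 1, 2], [0, 1, 2, 3], delta=1)
maximal_only = [c for c in found if is_maximal(c, found)]
```

In this toy example `small` is a sub-cluster of `big`, so reporting only maximal clusters drops it. This is the kind of redundancy the abstract says is avoided by mining only the maximal pattern-based clusters.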