A Fast Algorithm for Subspace Clustering by Pattern Similarity

Authors:
Haixun Wang;Fang Chu;Wei Fan;Philip S. Yu;Jian Pei
Affiliations:
IBM T.J. Watson Research Center;Univ. of California, Los Angeles;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;SUNY Buffalo
Venue:
SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Year:
2004

Cited by 8:

A novel approach to revealing positive and negative co-regulated genes
Journal of Computer Science and Technology

Continuous subspace clustering in streaming time series
Information Systems

Maximal Subspace Coregulated Gene Clustering
IEEE Transactions on Knowledge and Data Engineering

On mining micro-array data by Order-Preserving Submatrix
International Journal of Bioinformatics Research and Applications

Discovering pattern-based subspace clusters by pattern tree
Knowledge-Based Systems

Efficiently mining local conserved clusters from gene expression data
Neurocomputing

CoBi: Pattern Based Co-Regulated Biclustering of Gene Expression Data
Pattern Recognition Letters

GPUMAFIA: efficient subspace clustering with MAFIA on GPUs
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing


Abstract

Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including large-scale scientific data analysis, target marketing, and web usage analysis. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides their huge volume, many datasets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. In this paper, we present a novel algorithm that offers this capability. Experimental results from both real-life and synthetic datasets demonstrate its effectiveness and efficiency.
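
To make the distinction between value similarity and pattern similarity concrete, here is a minimal sketch. It is not the algorithm presented in the paper, which the abstract does not describe. It assumes the pairwise coherence measure commonly associated with the pCluster model: for two objects x and y and two dimensions a and b, the score is |(x_a - x_b) - (y_a - y_b)|, and a set of dimensions is treated as coherent when every pair of dimensions scores at most a threshold. The function names, the threshold `delta`, and the data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pscore(x, y, a, b):
    """Score of the 2x2 submatrix formed by objects x, y on dimensions a, b.

    Assumed definition (pCluster-style): 0 means the two objects rise and
    fall by exactly the same amount between dimensions a and b.
    """
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_coherent(x, y, dims, delta):
    """True if every pair of dimensions in dims scores at most delta."""
    return all(pscore(x, y, a, b) <= delta for a, b in combinations(dims, 2))


# Hypothetical data: y equals x shifted up by 10 on dimensions 0, 1 and 2,
# while dimension 3 breaks the shared pattern.
x = [1.0, 5.0, 3.0, 8.0]
y = [11.0, 15.0, 13.0, 2.0]

# The values are far apart, so a distance-based method sees no similarity.
euclid = sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

# The rise-and-fall pattern matches in the subspace {0, 1, 2} only.
coherent_sub = is_coherent(x, y, [0, 1, 2], delta=0.5)
coherent_full = is_coherent(x, y, [0, 1, 2, 3], delta=0.5)
```

In this sketch the two objects are distant in value yet coherent in the subspace {0, 1, 2}, and the coherence disappears once dimension 3 is included. Checking every pair of dimensions for every pair of objects is the kind of cost that limits such methods to small datasets, which is the scalability problem the abstract raises.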