Clustering by pattern similarity in large data sets

Authors:
Haixun Wang;Wei Wang;Jiong Yang;Philip S. Yu
Affiliations:
IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002


Abstract

Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, most models base similarity on distance, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a subset of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovering such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. We introduce an effective algorithm to detect such clusters, and tests on several real and synthetic data sets show its effectiveness.
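
The contrast between distance-based similarity and pattern similarity can be illustrated with a small sketch. The abstract does not state a formal definition, so the following is only one plausible reading: two objects are treated as coherent on a subset of dimensions if the change in value between any two of those dimensions differs by at most a threshold between the objects. The function names, the threshold `delta`, and the sample values are all hypothetical, and this brute-force pairwise check is not the detection algorithm the paper introduces.

```python
from itertools import combinations
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def max_pattern_deviation(x, y, dims):
    """Largest |(x[a] - x[b]) - (y[a] - y[b])| over all pairs of dims.

    A value of 0 means y is exactly x shifted by a constant on `dims`.
    (Illustrative measure; an assumption, not taken from the abstract.)
    """
    return max(
        abs((x[a] - x[b]) - (y[a] - y[b]))
        for a, b in combinations(dims, 2)
    )


def coherent(x, y, dims, delta):
    """True if x and y follow the same pattern on `dims`, within delta."""
    return max_pattern_deviation(x, y, dims) <= delta


# Hypothetical expression levels of two genes under four conditions.
gene_1 = [10.0, 40.0, 20.0, 55.0]
gene_2 = [110.0, 141.0, 119.0, 90.0]

# The genes are far apart in magnitude...
print(euclidean(gene_1, gene_2))

# ...yet they rise and fall together on conditions 0, 1, and 2.
print(max_pattern_deviation(gene_1, gene_2, [0, 1, 2]))
print(coherent(gene_1, gene_2, [0, 1, 2], delta=2.0))

# Including condition 3 breaks the shared pattern.
print(coherent(gene_1, gene_2, [0, 1, 2, 3], delta=2.0))
```

In this toy data the two genes would never be grouped by a distance threshold, yet they form a coherent pair once attention is restricted to the first three conditions, which is the kind of subspace pattern the abstract describes.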