Deriving quantitative models for correlation clusters

Authors:
Elke Achtert;Christian Böhm;Hans-Peter Kriegel;Peer Kröger;Arthur Zimek
Affiliations:
Ludwig-Maximilians-Universität München, Munich, Germany (all authors)
Venue:
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Year:
2006

Abstract

Correlation clustering aims to partition a data set into correlation clusters such that the objects in each cluster exhibit a certain density and are all associated with a common, arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all of these algorithms compute only the partitioning of the data into clusters. This is only the first step in the pipeline of advanced data analysis and system modelling. The second (post-clustering) step, deriving a quantitative model for each correlation cluster, has not been addressed so far. In this paper, we describe an original approach to this second step. We introduce a general method that extracts quantitative information on the linear dependencies within each correlation cluster. Our concepts are independent of the clustering model and can thus be applied as a post-processing step to any correlation clustering algorithm. Furthermore, we show how these quantitative models can be used to predict, for a given object, the probability that it was generated by each of the models. Our broad experimental evaluation demonstrates the beneficial impact of our method on several applications of significant practical importance.
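The two ideas in the abstract, a quantitative linear model per cluster and a probability that an object belongs to each model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `fit_hyperplane`, `distance`, and `membership_probabilities`, the restriction to clusters lying on a hyperplane of dimensionality d-1 (a single linear equation), the use of power iteration to find the direction of least variance, and the Gaussian weighting of orthogonal distances with a user-chosen `sigma` and equal priors.

```python
import math

def fit_hyperplane(points, iterations=1000):
    """Fit n . (x - mean) = 0 to a cluster; returns (mean, unit normal n).

    Assumption: the cluster lies near a single (d-1)-dimensional hyperplane.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    # Covariance matrix of the cluster.
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    if trace == 0.0:
        raise ValueError("cluster has no variance; no hyperplane can be fitted")
    # Power iteration on (trace * I - cov) converges to the eigenvector of cov
    # with the smallest eigenvalue, i.e. the direction of least variance.
    shifted = [[(trace if a == b else 0.0) - cov[a][b] for b in range(d)]
               for a in range(d)]
    v = [1.0 + 0.37 * j for j in range(d)]  # arbitrary asymmetric start vector
    for _ in range(iterations):
        w = [sum(shifted[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def distance(point, model):
    """Orthogonal distance of a point to the model hyperplane."""
    mean, normal = model
    return abs(sum(nj * (xj - mj) for nj, xj, mj in zip(normal, point, mean)))

def membership_probabilities(point, models, sigma=1.0):
    """Normalized Gaussian weights of the distances (equal priors assumed)."""
    weights = [math.exp(-0.5 * (distance(point, m) / sigma) ** 2) for m in models]
    total = sum(weights)
    return [w / total for w in weights]

# Two synthetic clusters, each lying exactly on a plane in 3-D.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
cluster_a = [(x, y, 2 * x + 3 * y - 5) for x, y in grid]   # z = 2x + 3y - 5
cluster_b = [(x, y, -x + y + 20) for x, y in grid]         # z = -x + y + 20

model_a = fit_hyperplane(cluster_a)
model_b = fit_hyperplane(cluster_b)

# Rescale the normal so the z coefficient is -1, giving a readable equation.
mean_a, normal_a = model_a
coeffs = [c / -normal_a[2] for c in normal_a]
rhs = sum(c * m for c, m in zip(coeffs, mean_a))
print("model A: %.3f*x + %.3f*y + %.3f*z = %.3f" % (*coeffs, rhs))

# An object on plane A should be attributed almost entirely to model A.
probs = membership_probabilities((1.0, 1.0, 0.0), [model_a, model_b])
print("membership probabilities:", probs)
```

A cluster on a lower-dimensional hyperplane would need several such equations, one per direction of negligible variance, which is usually obtained from a full eigendecomposition of the covariance matrix rather than from a single power iteration.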