CURLER: finding and visualizing nonlinear correlation clusters

Authors:
Anthony K. H. Tung;Xin Xu;Beng Chin Ooi
Affiliations:
National University of Singapore;National University of Singapore;National University of Singapore
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

While much work has been done on finding linear correlation among subsets of features in high-dimensional data, the detection of nonlinear correlation has been left largely untouched. In this paper, we present an algorithm for finding and visualizing nonlinear correlation clusters in subspaces of high-dimensional databases. Unlike linear correlation clusters, each of which has a single orientation, a nonlinear correlation cluster varies in orientation, so finding one requires merging clusters of possibly very different orientations. Because spatial proximity must also be judged on a subset of features that is not known in advance, deciding which clusters to merge during the clustering process becomes a challenge. To address this problem, we propose a novel concept called co-sharing level, which captures both spatial proximity and cluster orientation when judging the similarity between clusters. Based on this concept, we develop an algorithm that not only detects nonlinear correlation clusters but also provides a way to visualize them. Experiments on both synthetic and real-life datasets show the effectiveness of our method.
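
The abstract names the co-sharing level but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: points belong to several small clusters with soft (fuzzy) membership weights, and two clusters share more when the same points carry weight in both. The function name `co_sharing`, the membership matrix, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def co_sharing(memberships: List[List[float]], i: int, j: int) -> float:
    """Assumed measure: sum over all points of the product of a point's
    membership weights in cluster i and cluster j."""
    return sum(row[i] * row[j] for row in memberships)


# Toy soft assignment (hypothetical): 4 points, 3 clusters.
# Each row holds one point's membership weights and sums to 1.
memberships = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

# Score every unordered pair of clusters.
pairs: Dict[Tuple[int, int], float] = {
    (i, j): co_sharing(memberships, i, j)
    for i in range(3)
    for j in range(i + 1, 3)
}

# A merging step could prefer the pair that shares the most membership mass.
best_pair = max(pairs, key=pairs.get)
print(pairs, best_pair)
```

Under this assumption, clusters that overlap heavily in membership score high regardless of how their orientations differ, which is one way a single score could stand in for both proximity and orientation when choosing which clusters to merge.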