Finding multiple global linear correlations in sparse and noisy data sets

Authors:
Shunzhi Zhu; Liang Tang; Tao Li
Venue:
Knowledge-Based Systems
Year:
2013

Abstract

Finding linear correlations is an important research problem with many real-world applications. In real-world data sets, a linear correlation may not hold across the entire data set; some correlations are visible only in certain subsets of the data. On the one hand, many local correlation clustering algorithms assume that the data points of a linear correlation are locally dense, so they may miss global correlations whose points are sparsely distributed. On the other hand, existing global correlation clustering methods may fail when the data set contains a large number of non-correlated points or when the actual correlations are coarse. This paper proposes DCSearch, a simple and fast algorithm for finding multiple global linear correlations in a data set. The algorithm can find coarse, global linear correlations in noisy and sparse data sets. Using the classical divide-and-conquer strategy, it first divides the data set into subsets to reduce the search space, and then recursively searches for and prunes candidate correlations from the subsets. Empirical studies show that DCSearch efficiently reduces the number of candidate correlations in each iteration. Experimental results on both synthetic and real data sets demonstrate that DCSearch is effective and efficient at finding global linear correlations in sparse and noisy data sets.
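
The abstract gives only the outline of the method, so the following is a minimal illustrative sketch of the general divide, search, and prune idea, not the DCSearch algorithm itself. It assumes 2-D points, candidate lines built from point pairs, and a support-fraction pruning rule; the function names (`line_through`, `support`, `dedupe`, `search`) and the parameters `eps`, `min_frac`, and `leaf_size` are hypothetical choices made for this example.

```python
import itertools
import math


def line_through(p, q):
    """Return the line through p and q as (a, b, c) with a*x + b*y + c = 0.

    The normal (a, b) has unit length and a fixed sign, so the same line
    always gets (nearly) the same triple. Returns None if p == q.
    """
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None
    a, b = a / norm, b / norm
    if a < 0 or (a == 0 and b < 0):
        a, b = -a, -b
    return (a, b, -(a * x1 + b * y1))


def support(line, points, eps):
    """Count the points lying within distance eps of the line."""
    a, b, c = line
    return sum(1 for x, y in points if abs(a * x + b * y + c) <= eps)


def dedupe(lines, digits=6):
    """Keep one representative per line, comparing rounded coefficients."""
    seen = {}
    for line in lines:
        key = tuple(round(v, digits) for v in line)
        seen.setdefault(key, line)
    return list(seen.values())


def search(points, eps=1e-6, min_frac=0.3, leaf_size=8):
    """Divide the points, enumerate candidates in the leaves, prune on merge.

    Assumption of this sketch: a candidate survives in a subset only if at
    least min_frac of that subset lies within eps of it.
    """
    if len(points) <= leaf_size:
        # Leaf: every pair of points proposes one candidate line.
        pairs = itertools.combinations(points, 2)
        candidates = [line_through(p, q) for p, q in pairs]
        candidates = [c for c in candidates if c is not None]
    else:
        # Divide: search each half independently, then pool the survivors.
        mid = len(points) // 2
        candidates = (search(points[:mid], eps, min_frac, leaf_size)
                      + search(points[mid:], eps, min_frac, leaf_size))
    # Prune: re-check each pooled candidate against the whole current subset.
    threshold = min_frac * len(points)
    kept = [c for c in candidates if support(c, points, eps) >= threshold]
    return dedupe(kept)
```

In this sketch the candidate set is re-checked at every merge, so lines that are only supported by a small local subset are discarded before they reach the top level. The input order matters, because the split is positional; shuffling the points first is one way to make each subset a rough sample of the whole data set.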