Less is More: Sparse Graph Mining with Compact Matrix Decomposition

Authors:
Jimeng Sun; Yinglian Xie; Hui Zhang; Christos Faloutsos
Affiliations:
Carnegie Mellon University, USA (all authors)
Venue:
Statistical Analysis and Data Mining
Year:
2008

Citing: 0
Cited by: 12

Efficient aggregation for graph summarization
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

The Boolean column and column-row matrix decompositions
Data Mining and Knowledge Discovery

JCCM: Joint Cluster Communities on Attribute and Relationship Data in Social Networks
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning
IEEE Transactions on Neural Networks

A Randomized Algorithm for Principal Component Analysis
SIAM Journal on Matrix Analysis and Applications

Fast Algorithms for Approximating the Singular Value Decomposition
ACM Transactions on Knowledge Discovery from Data (TKDD)

A survey of multilinear subspace learning for tensor data
Pattern Recognition

Efficient topological OLAP on information networks
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I

Larger residuals, less work: active document scheduling for latent dirichlet allocation
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
SIAM Review

On the efficiency of an order-based representation in the clique covering problem
Proceedings of the 14th annual conference on Genetic and evolutionary computation

Multi-level Low-rank Approximation-based Spectral Clustering for image segmentation
Pattern Recognition Letters

Abstract

Given a large sparse graph, how can we find patterns and anomalies? Several important applications can be modeled as large sparse graphs, e.g., network traffic monitoring, research citation network analysis, social network analysis, and financial transactions. Low-rank decompositions, such as singular value decomposition (SVD) and CUR, are powerful techniques for revealing latent (hidden) variables and associated patterns in high-dimensional data. However, those methods often ignore the sparsity of the graph, and hence usually incur memory and computational costs that are too high to be practical. We propose a novel method, the Compact Matrix Decomposition (CMD), to compute sparse low-rank approximations. CMD dramatically reduces both the computation cost and the space requirements relative to the existing decomposition methods SVD and CUR. Using CMD as the key building block, we further propose procedures to efficiently construct and analyze dynamic graphs from real-time application data. We provide a theoretical guarantee for our methods, and present results on two large real datasets, one of network flow data (a 100 GB trace of 22K hosts over one month) and one from DBLP (200 MB over 25 years). We show that CMD is often an order of magnitude more efficient than the state of the art (SVD and CUR): it is over 10X faster and requires less than 1/10 of the space for the same reconstruction accuracy. Finally, we demonstrate how CMD is used for detecting anomalies and monitoring time-evolving graphs, in which it successfully detects worm-like hierarchical scanning patterns in real network data.

Copyright © 2007 Wiley Periodicals, Inc., A Wiley Company Statistical Analy Data Mining 1: 000-000, 2007
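
The abstract's point that SVD ignores sparsity can be illustrated with a small sketch. The example below is not the CMD algorithm, whose details are not given in the abstract. It only compares the density of a truncated SVD factor with that of columns sampled directly from the matrix, as CUR-style methods do. The function names, the matrix size, the density, and the sampling rule (probability proportional to squared column norm) are assumptions chosen for illustration.

```python
import numpy as np


def nnz_fraction(M, tol=1e-12):
    """Fraction of entries whose magnitude exceeds tol."""
    return float(np.mean(np.abs(M) > tol))


def compare_factor_density(n=200, density=0.02, k=10, seed=0):
    """Return the nonzero fractions of (A, rank-k SVD factor, sampled columns).

    A is a hypothetical random 0/1 matrix standing in for a sparse graph's
    adjacency matrix; it is not data from the paper.
    """
    rng = np.random.default_rng(seed)
    A = (rng.random((n, n)) < density).astype(float)

    # Truncated SVD: keep the k leading left singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Column sampling (assumed rule): draw k columns with replacement,
    # with probability proportional to each column's squared norm.
    col_norms = (A ** 2).sum(axis=0)
    probs = col_norms / col_norms.sum()
    idx = rng.choice(n, size=k, replace=True, p=probs)
    C = A[:, idx]  # actual columns of A, so they stay sparse

    return nnz_fraction(A), nnz_fraction(U_k), nnz_fraction(C)
```

Calling `compare_factor_density()` returns three fractions. Under these assumptions the singular-vector factor is expected to be far denser than the sampled columns, which keep roughly the sparsity of the original matrix. That gap is the storage overhead the abstract attributes to methods that ignore sparsity.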